mirror of https://github.com/openai/whisper.git
synced 2025-11-24 06:26:03 +00:00

Merge branch 'main' into main

This commit is contained in: commit 91b2355c9a
.github/workflows/python-publish.yml (vendored, 4 changes)

@@ -17,11 +17,11 @@ jobs:
       - name: Set up Python
         uses: actions/setup-python@v5
         with:
-          python-version: '3.8'
+          python-version: '3.12'
       - name: Install dependencies
         run: |
           python -m pip install --upgrade pip
-          pip install setuptools wheel twine
+          pip install setuptools wheel twine build
       - name: Release
         if: ${{ steps.regex-match.outputs.match != '' }}
         uses: softprops/action-gh-release@v2
CHANGELOG.md (21 changes)

@@ -1,5 +1,26 @@
 # CHANGELOG

+## [v20250625](https://github.com/openai/whisper/releases/tag/v20250625)
+
+* Fix: Update torch.load to use weights_only=True to prevent security w… ([#2451](https://github.com/openai/whisper/pull/2451))
+* Fix: Ensure DTW cost tensor is on the same device as input tensor ([#2561](https://github.com/openai/whisper/pull/2561))
+* docs: updated README to specify translation model limitation ([#2547](https://github.com/openai/whisper/pull/2547))
+* Fixed triton kernel update to support latest triton versions ([#2588](https://github.com/openai/whisper/pull/2588))
+* Fix: GitHub display errors for Jupyter notebooks ([#2589](https://github.com/openai/whisper/pull/2589))
+* Bump the github-actions group with 3 updates ([#2592](https://github.com/openai/whisper/pull/2592))
+* Keep GitHub Actions up to date with GitHub's Dependabot ([#2486](https://github.com/openai/whisper/pull/2486))
+* pre-commit: Upgrade black v25.1.0 and isort v6.0.0 ([#2514](https://github.com/openai/whisper/pull/2514))
+* GitHub Actions: Add Python 3.13 to the testing ([#2487](https://github.com/openai/whisper/pull/2487))
+* PEP 621: Migrate from setup.py to pyproject.toml ([#2435](https://github.com/openai/whisper/pull/2435))
+* pre-commit autoupdate && pre-commit run --all-files ([#2484](https://github.com/openai/whisper/pull/2484))
+* Upgrade GitHub Actions ([#2430](https://github.com/openai/whisper/pull/2430))
+* Bugfix: Illogical "Avoid computing higher temperatures on no_speech" ([#1903](https://github.com/openai/whisper/pull/1903))
+* Updating README and doc strings to reflect that n_mels can now be 128 ([#2049](https://github.com/openai/whisper/pull/2049))
+* fix typo data/README.md ([#2433](https://github.com/openai/whisper/pull/2433))
+* Update README.md ([#2379](https://github.com/openai/whisper/pull/2379))
+* Add option to carry initial_prompt with the sliding window ([#2343](https://github.com/openai/whisper/pull/2343))
+* more pytorch versions in tests ([#2408](https://github.com/openai/whisper/pull/2408))
+
 ## [v20240930](https://github.com/openai/whisper/releases/tag/v20240930)

 * allowing numpy 2 in tests ([#2362](https://github.com/openai/whisper/pull/2362))
README.md (26 changes)

@@ -77,25 +77,35 @@ Whisper's performance varies widely depending on the language. The figure below
 

 ## Command-line usage

 The following command will transcribe speech in audio files, using the `turbo` model:

-    whisper audio.flac audio.mp3 audio.wav --model turbo
+```bash
+whisper audio.flac audio.mp3 audio.wav --model turbo
+```

-The default setting (which selects the `turbo` model) works well for transcribing English. To transcribe an audio file containing non-English speech, you can specify the language using the `--language` option:
+The default setting (which selects the `turbo` model) works well for transcribing English. However, **the `turbo` model is not trained for translation tasks**. If you need to **translate non-English speech into English**, use one of the **multilingual models** (`tiny`, `base`, `small`, `medium`, `large`) instead of `turbo`.

-    whisper japanese.wav --language Japanese
+For example, to transcribe an audio file containing non-English speech, you can specify the language:

-Adding `--task translate` will translate the speech into English:
+```bash
+whisper japanese.wav --language Japanese
+```

-    whisper japanese.wav --language Japanese --task translate
+To **translate** speech into English, use:
+
+```bash
+whisper japanese.wav --model medium --language Japanese --task translate
+```
+
+> **Note:** The `turbo` model will return the original language even if `--task translate` is specified. Use `medium` or `large` for the best translation results.

 Run the following to view all available options:

-    whisper --help
+```bash
+whisper --help
+```

 See [tokenizer.py](https://github.com/openai/whisper/blob/main/whisper/tokenizer.py) for the list of all available languages.
notebooks/LibriSpeech.ipynb (generated, 3 changes)

@@ -949,7 +949,8 @@
       "style": "IPY_MODEL_039b53f2702c4179af7e0548018d0588",
       "value": " 164/164 [05:08<00:00, 1.86s/it]"
      }
-    }
+    },
+    "state": {}
   }
  }
 },
notebooks/Multilingual_ASR.ipynb (generated, 3 changes)

@@ -4219,7 +4219,8 @@
       "_view_name": "StyleView",
       "description_width": ""
      }
-    }
+    },
+    "state": {}
   }
  }
 },
whisper/__init__.py

@@ -147,7 +147,8 @@ def load_model(
     with (
         io.BytesIO(checkpoint_file) if in_memory else open(checkpoint_file, "rb")
     ) as fp:
-        checkpoint = torch.load(fp, map_location=device)
+        kwargs = {"weights_only": True} if torch.__version__ >= "1.13" else {}
+        checkpoint = torch.load(fp, map_location=device, **kwargs)
     del checkpoint_file

     dims = ModelDimensions(**checkpoint["dims"])
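The new guard keys `weights_only` off a plain string comparison of `torch.__version__`, which holds for the versions whisper targets but is lexicographic: `"1.9" >= "1.13"` is `True`. As a minimal sketch of a numeric-tuple variant (the helper names here are mine, not whisper's):

```python
def parse_version(v: str) -> tuple:
    # "1.13.1+cu117" -> (1, 13, 1); stop at the first non-numeric component
    parts = []
    for piece in v.split("+")[0].split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def torch_load_kwargs(torch_version: str) -> dict:
    # torch.load gained the `weights_only` argument in PyTorch 1.13
    if parse_version(torch_version) >= (1, 13):
        return {"weights_only": True}
    return {}


print(torch_load_kwargs("2.1.0"))   # {'weights_only': True}
print(torch_load_kwargs("1.12.1"))  # {}
```

Unlike the string comparison, this maps `"1.9.0"` to `(1, 9, 0)`, which correctly sorts below `(1, 13)`.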
whisper/triton_ops.py

@@ -117,7 +117,7 @@ def dtw_cuda(x, BLOCK_SIZE=1024):
     x_skew = x_skew.T.contiguous()
     cost = torch.ones(N + M + 2, M + 2) * np.inf
     cost[0, 0] = 0
-    cost = cost.cuda()
+    cost = cost.to(x.device)
     trace = torch.zeros_like(cost, dtype=torch.int32)

     dtw_kernel[(1,)](
@@ -60,7 +60,7 @@ def median_kernel(filter_width: int):
     tl.store(y_ptr + offsets, MIDDLE_ROW_HERE, mask=mask)  # noqa: F821

     kernel = triton.JITFunction(kernel.fn)
-    kernel.src = kernel.src.replace(
+    new_kernel = kernel.src.replace(
         " LOAD_ALL_ROWS_HERE",
         "\n".join(
             [
@@ -69,7 +69,8 @@ def median_kernel(filter_width: int):
             ]
         ),
     )
-    kernel.src = kernel.src.replace(
+
+    new_kernel = new_kernel.replace(
         " BUBBLESORT_HERE",
         "\n\n".join(
             [
@@ -90,7 +91,14 @@ def median_kernel(filter_width: int):
             ]
         ),
     )
-    kernel.src = kernel.src.replace("MIDDLE_ROW_HERE", f"row{filter_width // 2}")
+
+    new_kernel = new_kernel.replace("MIDDLE_ROW_HERE", f"row{filter_width // 2}")
+
+    if hasattr(kernel, "_unsafe_update_src") is True:
+        kernel._unsafe_update_src(new_kernel)
+        kernel.hash = None
+    else:
+        kernel.src = new_kernel

     return kernel
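The `hasattr` branch exists because recent Triton versions no longer allow assigning `JITFunction.src` directly and expose `_unsafe_update_src` instead. The compatibility pattern can be sketched against two hypothetical stand-ins (neither class is Triton's real implementation):

```python
class LegacyKernel:
    # stand-in for older Triton: src is a plain writable attribute
    def __init__(self, src):
        self.src = src


class ModernKernel:
    # stand-in for newer Triton: src is read-only, updated via a method
    def __init__(self, src):
        self._src = src
        self.hash = "stale"

    @property
    def src(self):
        return self._src

    def _unsafe_update_src(self, new_src):
        self._src = new_src


def set_kernel_src(kernel, new_src):
    # Same shape as the diff: prefer the update method where it exists,
    # invalidate the cached hash, and fall back to plain assignment.
    if hasattr(kernel, "_unsafe_update_src"):
        kernel._unsafe_update_src(new_src)
        kernel.hash = None
    else:
        kernel.src = new_src
    return kernel
```

Building the replacement source in a local (`new_kernel` in the diff) and committing it once at the end keeps the two code paths to a single branch instead of three.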
whisper/version.py

@@ -1 +1 @@
-__version__ = "20240930"
+__version__ = "20250625"