mirror of https://github.com/openai/whisper.git
synced 2025-11-24 06:26:03 +00:00

Merge branch 'main' into main

This commit is contained in: commit 91b2355c9a
.github/workflows/python-publish.yml (vendored, 4 changes)

@@ -17,11 +17,11 @@ jobs:
       - name: Set up Python
         uses: actions/setup-python@v5
         with:
-          python-version: '3.8'
+          python-version: '3.12'
       - name: Install dependencies
         run: |
           python -m pip install --upgrade pip
-          pip install setuptools wheel twine
+          pip install setuptools wheel twine build
       - name: Release
         if: ${{ steps.regex-match.outputs.match != '' }}
         uses: softprops/action-gh-release@v2
CHANGELOG.md (21 changes)

@@ -1,5 +1,26 @@
 # CHANGELOG

+## [v20250625](https://github.com/openai/whisper/releases/tag/v20250625)
+
+* Fix: Update torch.load to use weights_only=True to prevent security w… ([#2451](https://github.com/openai/whisper/pull/2451))
+* Fix: Ensure DTW cost tensor is on the same device as input tensor ([#2561](https://github.com/openai/whisper/pull/2561))
+* docs: updated README to specify translation model limitation ([#2547](https://github.com/openai/whisper/pull/2547))
+* Fixed triton kernel update to support latest triton versions ([#2588](https://github.com/openai/whisper/pull/2588))
+* Fix: GitHub display errors for Jupyter notebooks ([#2589](https://github.com/openai/whisper/pull/2589))
+* Bump the github-actions group with 3 updates ([#2592](https://github.com/openai/whisper/pull/2592))
+* Keep GitHub Actions up to date with GitHub's Dependabot ([#2486](https://github.com/openai/whisper/pull/2486))
+* pre-commit: Upgrade black v25.1.0 and isort v6.0.0 ([#2514](https://github.com/openai/whisper/pull/2514))
+* GitHub Actions: Add Python 3.13 to the testing ([#2487](https://github.com/openai/whisper/pull/2487))
+* PEP 621: Migrate from setup.py to pyproject.toml ([#2435](https://github.com/openai/whisper/pull/2435))
+* pre-commit autoupdate && pre-commit run --all-files ([#2484](https://github.com/openai/whisper/pull/2484))
+* Upgrade GitHub Actions ([#2430](https://github.com/openai/whisper/pull/2430))
+* Bugfix: Illogical "Avoid computing higher temperatures on no_speech" ([#1903](https://github.com/openai/whisper/pull/1903))
+* Updating README and doc strings to reflect that n_mels can now be 128 ([#2049](https://github.com/openai/whisper/pull/2049))
+* fix typo data/README.md ([#2433](https://github.com/openai/whisper/pull/2433))
+* Update README.md ([#2379](https://github.com/openai/whisper/pull/2379))
+* Add option to carry initial_prompt with the sliding window ([#2343](https://github.com/openai/whisper/pull/2343))
+* more pytorch versions in tests ([#2408](https://github.com/openai/whisper/pull/2408))
+
 ## [v20240930](https://github.com/openai/whisper/releases/tag/v20240930)

 * allowing numpy 2 in tests ([#2362](https://github.com/openai/whisper/pull/2362))
README.md (26 changes)

@@ -77,25 +77,35 @@ Whisper's performance varies widely depending on the language. The figure below
 

 ## Command-line usage

 The following command will transcribe speech in audio files, using the `turbo` model:

-    whisper audio.flac audio.mp3 audio.wav --model turbo
+```bash
+whisper audio.flac audio.mp3 audio.wav --model turbo
+```

-The default setting (which selects the `turbo` model) works well for transcribing English. To transcribe an audio file containing non-English speech, you can specify the language using the `--language` option:
+The default setting (which selects the `turbo` model) works well for transcribing English. However, **the `turbo` model is not trained for translation tasks**. If you need to **translate non-English speech into English**, use one of the **multilingual models** (`tiny`, `base`, `small`, `medium`, `large`) instead of `turbo`.

-    whisper japanese.wav --language Japanese
+For example, to transcribe an audio file containing non-English speech, you can specify the language:

-Adding `--task translate` will translate the speech into English:
+```bash
+whisper japanese.wav --language Japanese
+```

-    whisper japanese.wav --language Japanese --task translate
+To **translate** speech into English, use:
+
+```bash
+whisper japanese.wav --model medium --language Japanese --task translate
+```
+
+> **Note:** The `turbo` model will return the original language even if `--task translate` is specified. Use `medium` or `large` for the best translation results.

 Run the following to view all available options:

-    whisper --help
+```bash
+whisper --help
+```

 See [tokenizer.py](https://github.com/openai/whisper/blob/main/whisper/tokenizer.py) for the list of all available languages.
notebooks/LibriSpeech.ipynb (generated, 3 changes)

@@ -949,7 +949,8 @@
       "style": "IPY_MODEL_039b53f2702c4179af7e0548018d0588",
       "value": " 164/164 [05:08<00:00, 1.86s/it]"
      }
-    }
+    },
+    "state": {}
   }
  }
 },
notebooks/Multilingual_ASR.ipynb (generated, 3 changes)

@@ -4219,7 +4219,8 @@
       "_view_name": "StyleView",
       "description_width": ""
      }
-    }
+    },
+    "state": {}
   }
  }
 },
whisper/__init__.py

@@ -147,7 +147,8 @@ def load_model(
     with (
         io.BytesIO(checkpoint_file) if in_memory else open(checkpoint_file, "rb")
     ) as fp:
-        checkpoint = torch.load(fp, map_location=device)
+        kwargs = {"weights_only": True} if torch.__version__ >= "1.13" else {}
+        checkpoint = torch.load(fp, map_location=device, **kwargs)
     del checkpoint_file

     dims = ModelDimensions(**checkpoint["dims"])
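The new guard keys `weights_only` off a plain string comparison of `torch.__version__`, which holds for the versions whisper targets but is lexicographic: `"1.9" >= "1.13"` is `True`. As a minimal sketch of a numeric-tuple variant (the helper names here are mine, not whisper's):

```python
def parse_version(v: str) -> tuple:
    # "1.13.1+cu117" -> (1, 13, 1); stop at the first non-numeric component
    parts = []
    for piece in v.split("+")[0].split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def torch_load_kwargs(torch_version: str) -> dict:
    # torch.load gained the `weights_only` argument in PyTorch 1.13
    if parse_version(torch_version) >= (1, 13):
        return {"weights_only": True}
    return {}


print(torch_load_kwargs("2.1.0"))   # {'weights_only': True}
print(torch_load_kwargs("1.12.1"))  # {}
```

Unlike the string comparison, this maps `"1.9.0"` to `(1, 9, 0)`, which correctly sorts below `(1, 13)`.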
whisper/triton_ops.py

@@ -117,7 +117,7 @@ def dtw_cuda(x, BLOCK_SIZE=1024):
     x_skew = x_skew.T.contiguous()
     cost = torch.ones(N + M + 2, M + 2) * np.inf
     cost[0, 0] = 0
-    cost = cost.cuda()
+    cost = cost.to(x.device)
     trace = torch.zeros_like(cost, dtype=torch.int32)

     dtw_kernel[(1,)](
@@ -60,7 +60,7 @@ def median_kernel(filter_width: int):
     tl.store(y_ptr + offsets, MIDDLE_ROW_HERE, mask=mask)  # noqa: F821

     kernel = triton.JITFunction(kernel.fn)
-    kernel.src = kernel.src.replace(
+    new_kernel = kernel.src.replace(
         " LOAD_ALL_ROWS_HERE",
         "\n".join(
             [
@@ -69,7 +69,8 @@ def median_kernel(filter_width: int):
             ]
         ),
     )
-    kernel.src = kernel.src.replace(
+
+    new_kernel = new_kernel.replace(
         " BUBBLESORT_HERE",
         "\n\n".join(
             [
@@ -90,7 +91,14 @@ def median_kernel(filter_width: int):
             ]
         ),
     )
-    kernel.src = kernel.src.replace("MIDDLE_ROW_HERE", f"row{filter_width // 2}")
+
+    new_kernel = new_kernel.replace("MIDDLE_ROW_HERE", f"row{filter_width // 2}")
+
+    if hasattr(kernel, "_unsafe_update_src") is True:
+        kernel._unsafe_update_src(new_kernel)
+        kernel.hash = None
+    else:
+        kernel.src = new_kernel

     return kernel
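The `hasattr` branch exists because recent Triton versions no longer allow assigning `JITFunction.src` directly and expose `_unsafe_update_src` instead. The compatibility pattern can be sketched against two hypothetical stand-ins (neither class is Triton's real implementation):

```python
class LegacyKernel:
    # stand-in for older Triton: src is a plain writable attribute
    def __init__(self, src):
        self.src = src


class ModernKernel:
    # stand-in for newer Triton: src is read-only, updated via a method
    def __init__(self, src):
        self._src = src
        self.hash = "stale"

    @property
    def src(self):
        return self._src

    def _unsafe_update_src(self, new_src):
        self._src = new_src


def set_kernel_src(kernel, new_src):
    # Same shape as the diff: prefer the update method where it exists,
    # invalidate the cached hash, and fall back to plain assignment.
    if hasattr(kernel, "_unsafe_update_src"):
        kernel._unsafe_update_src(new_src)
        kernel.hash = None
    else:
        kernel.src = new_src
    return kernel
```

Building the replacement source in a local (`new_kernel` in the diff) and committing it once at the end keeps the two code paths to a single branch instead of three.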
whisper/version.py

@@ -1 +1 @@
-__version__ = "20240930"
+__version__ = "20250625"