* Bugfix: Illogical "Avoid computing higher temperatures on no_speech"
Bugfix for https://github.com/openai/whisper/pull/1279
It's "silence" when decoding has failed due to `compression_ratio_threshold` too, when further down the code it's not "silence" anymore.
"Silence" should be only when decoding has failed due to `logprob_threshold`.
As described here:
8bc8860694/whisper/transcribe.py (L421)
And in the code here:
8bc8860694/whisper/transcribe.py (L243-L251)
* Fix the case where `logprob_threshold=None`
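A minimal sketch of the corrected condition, written as a hypothetical helper mirroring the fallback loop in `whisper/transcribe.py` (illustrative, with the `None` guards from the fix above):

```python
def is_silence(decode_result, no_speech_threshold, logprob_threshold) -> bool:
    # Silence only when the *logprob* check failed (not compression_ratio),
    # guarding against thresholds set to None.
    return (
        no_speech_threshold is not None
        and decode_result.no_speech_prob > no_speech_threshold
        and logprob_threshold is not None
        and decode_result.avg_logprob < logprob_threshold
    )
```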
---------
Co-authored-by: Jong Wook Kim <jongwook@openai.com>
* Add option to carry initial_prompt with the sliding window
Add an option `carry_initial_prompt = False` to `whisper.transcribe()`.
When set to `True`, `initial_prompt` is prepended to each internal `decode()` call's `prompt`.
If there is not enough context space at the start of the prompt, the prompt is left-sliced to make space.
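A minimal sketch of the prompt construction under this option, loosely following `whisper/transcribe.py` and assuming `initial_prompt` fits within the context window (names are illustrative, not the exact implementation):

```python
def build_prompt(all_tokens, initial_prompt_tokens, prompt_reset_since,
                 max_prompt_len, carry_initial_prompt=False):
    if carry_initial_prompt:
        # Skip history already covered by initial_prompt (avoids redundant
        # tokens), then left-slice so the combined prompt still fits.
        nignored = max(len(initial_prompt_tokens), prompt_reset_since)
        remaining = max_prompt_len - len(initial_prompt_tokens)
        return initial_prompt_tokens + all_tokens[nignored:][-remaining:]
    return all_tokens[prompt_reset_since:]
```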
* Prevent redundant initial_prompt_tokens
* Revert unnecessary .gitignore change
---------
Co-authored-by: Kittsil <kittsil@gmail.com>
Co-authored-by: Jong Wook Kim <jongwook@openai.com>
* Relax triton requirements for compatibility with pytorch 2.4 and newer
Similar to https://github.com/openai/whisper/pull/1802, but now that PyTorch has upgraded to 2.4, it requires triton==3.0.0. I am not sure whether it makes sense to remove the upper-bound version constraints entirely.
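For illustration, the relaxation amounts to something like this in `requirements.txt` (a before/after sketch; the actual file may carry platform markers and different bounds):

```
# before: triton>=2.0.0,<3
# after:
triton>=2.0.0
```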
* Update requirements.txt
* Update audio.py
The `mel_filters` function uses `np.load` to load a pre-computed mel filterbank matrix. `np.load` is not thread-safe: calling it from multiple threads at the same time may corrupt the data.
To fix this, use the thread-safe `torch.load` instead.
* Update audio.py
updated the docstring
* allow_pickle=False
* newline
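Taken together with the follow-up commits above (`allow_pickle=False`, the docstring update), the loader ends up along these lines; a minimal sketch assuming the `mel_filters.npz` asset used by `whisper/audio.py`, not necessarily the exact merged code:

```python
import os

import numpy as np
import torch


def mel_filters(device, n_mels: int) -> torch.Tensor:
    """Load the pre-computed mel filterbank matrix for `n_mels` bins."""
    filters_path = os.path.join(os.path.dirname(__file__), "assets", "mel_filters.npz")
    # allow_pickle=False plus a per-call context manager keeps concurrent
    # loads from sharing a half-initialized file object
    with np.load(filters_path, allow_pickle=False) as f:
        return torch.from_numpy(f[f"mel_{n_mels}"]).to(device)
```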
---------
Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>
Co-authored-by: Jong Wook Kim <jongwook@openai.com>
* ADD parser for new argument --max_words_count
* ADD max_words_count in words_options
ADD warning for max_line_width compatibility
* ADD logic for max_words_count
* rename to max_words_per_line
* make them kwargs
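A rough sketch of the resulting behavior (illustrative; the real logic lives in the subtitle writers in `whisper/utils.py` and requires word-level timestamps from `word_timestamps=True`):

```python
def chunk_words(words: list, max_words_per_line: int) -> list:
    # Split a segment's word-level timestamps into subtitle lines of at most
    # max_words_per_line words; incompatible with max_line_width, hence the
    # compatibility warning added above.
    return [
        words[i : i + max_words_per_line]
        for i in range(0, len(words), max_words_per_line)
    ]
```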
* allow specifying file path by --model
* black formatting
---------
Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>
* avoid rearranging all kv_caches
* avoid calculating the same kv_cache from cross attn
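A sketch of the cross-attention half of this optimization (illustrative, not the exact hooks in `whisper/model.py`): the encoder output is fixed during decoding, so its keys/values can be computed once and reused, and beam-search reordering only needs to rearrange the self-attention caches.

```python
import torch


def cached_cross_kv(cache: dict, module, xa: torch.Tensor, k_proj, v_proj):
    # xa (the audio features) never changes between decoding steps, so the
    # cross-attention keys/values are computed on the first step only
    if module not in cache:
        cache[module] = (k_proj(xa), v_proj(xa))
    return cache[module]
```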
* Update decoding.py
* linter fix
---------
Co-authored-by: Jong Wook Kim <jongwook@openai.com>
* Updated README.md to provide more insight on BLEU and specific appendices in the research paper
* Update README.md
---------
Co-authored-by: Jong Wook Kim <jongwook@openai.com>
* Avoid computing higher temperatures on no_speech
In decode_with_fallback, we compute higher temperatures when compression_ratio is too high or avg_logprob is too low.
But since the computation of no_speech_prob doesn't depend on sampling, we can avoid computing higher temperatures if the first one already shows that the no_speech condition is fulfilled.
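A minimal sketch of this early exit, as originally proposed (see the bugfix earlier in this log for the corrected silence condition); `decode_at` is a hypothetical callable running one decode attempt at a given temperature:

```python
def decode_with_fallback_sketch(decode_at, temperatures,
                                compression_ratio_threshold,
                                logprob_threshold, no_speech_threshold):
    for t in temperatures:
        result = decode_at(t)
        needs_fallback = (
            result.compression_ratio > compression_ratio_threshold
            or result.avg_logprob < logprob_threshold
        )
        # no_speech_prob doesn't depend on the sampling temperature, so a
        # detected silence makes retrying at higher temperatures pointless
        if result.no_speech_prob > no_speech_threshold:
            needs_fallback = False
        if not needs_fallback:
            break
    return result
```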
* Update transcribe.py
---------
Co-authored-by: Jong Wook Kim <jongwook@openai.com>
* Drop ffmpeg-python dependency and call ffmpeg directly.
The last ffmpeg-python module release was in 2019[1], upstream seems to be
unavailable[2], and the project's development seems to have stagnated[3]. As
the features it provides are trivial to replace using Python's native
subprocess module, drop the dependency.
[1] <URL: https://github.com/kkroening/ffmpeg-python/tags >
[2] <URL: https://github.com/kkroening/ffmpeg-python/issues/760 >
[3] <URL: https://openhub.net/p/ffmpeg-python >
* Rewrote to use subprocess.run() instead of subprocess.Popen().
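A sketch of what the direct invocation looks like with `subprocess.run()`, in the spirit of `load_audio` in `whisper/audio.py` (the exact flags shown are illustrative):

```python
import subprocess

import numpy as np


def load_audio(file: str, sr: int = 16000) -> np.ndarray:
    # Decode via the ffmpeg CLI to mono 16-bit PCM at the target sample rate,
    # written to stdout, with no ffmpeg-python wrapper involved.
    cmd = [
        "ffmpeg", "-nostdin",
        "-threads", "0",
        "-i", file,
        "-f", "s16le",
        "-ac", "1",
        "-acodec", "pcm_s16le",
        "-ar", str(sr),
        "-",
    ]
    try:
        out = subprocess.run(cmd, capture_output=True, check=True).stdout
    except subprocess.CalledProcessError as e:
        raise RuntimeError(f"Failed to load audio: {e.stderr.decode()}") from e
    return np.frombuffer(out, np.int16).flatten().astype(np.float32) / 32768.0
```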
* formatting changes
* formatting update
* isort fix
* Error checking
* isort 🤦🏻
* flake8 fix
* minor spelling changes
---------
Co-authored-by: Jong Wook Kim <jongwook@openai.com>