whisper

mirror of https://github.com/openai/whisper.git synced 2025-11-23 22:15:58 +00:00

Author	SHA1	Message	Date
Jong Wook Kim	f6f01c561c	Release 20231105 v20231105	2023-11-06 03:08:56 -08:00
Jong Wook Kim	746aaaeafa	remove tiktoken pin (#1759 )	2023-11-06 03:05:21 -08:00
Philippe Hebert	b9f17e1f2d	docs: Disambiguation of the term "relative speed" in the README (#1751 ) * docs: defines relative speed in README * combined paragraphs --------- Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>	2023-11-06 02:43:07 -08:00
Mohamad Zamini	7dfcd56304	allow_pickle=False while loading of mel matrix IN audio.py (#1511 ) * Update audio.py The `mel_filters` function is using a `np.load` function to load a pre-computed mel filterbank matrix. This function is not thread-safe, which means that if it is called from multiple threads at the same time, it may corrupt the data. To fix this, you can use the `torch.load` function instead. This function is thread-safe, so it will not corrupt the data if it is called from multiple threads at the same time. * Update audio.py updated the docstring * allow_pickle=False * newline --------- Co-authored-by: Jong Wook Kim <jongwook@nyu.edu> Co-authored-by: Jong Wook Kim <jongwook@openai.com>	2023-11-06 02:28:51 -08:00
Marco Zucconelli	b7d277acd5	handling transcribe exceptions. (#1682 ) * handling transcribe() exceptions. * printing stacktrace --------- Co-authored-by: invalid <invalid@email.com> Co-authored-by: Jong Wook Kim <jongwook@nyu.edu> Co-authored-by: Jong Wook Kim <jongwook@openai.com>	2023-11-06 02:06:19 -08:00
amosal	6ed314fe41	Add new option to generate subtitles by a specific number of words (#1729 ) * ADD parser for new argument --max_words_count * ADD max_words_count in words_options ADD warning for max_line_width compatibility * ADD logic for max_words_count * rename to max_words_per_line * make them kwargs * allow specifying file path by --model * black formatting --------- Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>	2023-11-06 01:49:33 -08:00
Jordi Mas	b38a1f20f4	Fix exception when an audio file with no speech is provided (#1396 ) Co-authored-by: Jong Wook Kim <jongwook@openai.com>	2023-10-10 10:01:01 -07:00
Jong Wook Kim	0a60fcaa9b	Release 20230918 v20230918	2023-09-18 17:13:19 -07:00
Jong Wook Kim	5f957da5ca	Update test.yml	2023-09-18 16:38:17 -07:00
Arthur Kim	8b330df096	Add .pre-commit-config.yaml (#1528 ) * Add .pre-commit-config.yaml Co-authored-by: arthur <arthur@rtzr.ai> * flake8 E741 --------- Co-authored-by: Jong Wook Kim <jongwook@openai.com>	2023-09-18 16:15:33 -07:00
sqhao	21010ef454	fix doc of TextDecoder (#1526 ) Signed-off-by: haoshengqiang <haoshengqiang@xiaohongshu.com> Co-authored-by: haoshengqiang <haoshengqiang@xiaohongshu.com>	2023-09-18 16:09:59 -07:00
Nino Risteski	29b7df6231	Update model-card.md (#1643 ) fixed a few typos	2023-09-18 15:59:49 -07:00
taylorchu	e8622f9afc	word timing tweaks (#1559 ) * word timing tweaks * comment on eot * clearer comments	2023-08-08 06:48:56 +09:00
WangChou Lu	b91c907694	Avoid rearranging all caches (#1483 ) * avoid rearranging all kv_caches * avoid calculating the same kv_cache from cross attn * Update decoding.py * linter fix --------- Co-authored-by: Jong Wook Kim <jongwook@openai.com>	2023-07-06 12:48:08 -07:00
ryanheise	f572f2161b	Improve timestamp heuristics. (#1461 ) * Improve timestamp heuristics. * Track pauses with last_speech_timestamp	2023-06-29 16:51:24 -07:00
Valentin Berkes	248b6cb124	fix condition_on_previous_text (#1224 ) prompt_reset_since is set before all_tokens is extended hence does not have the expected effect.	2023-05-05 00:31:35 -07:00
Paul Willot	7ca9fbea86	Fix numba deprecation notice (#1233 ) From numba 0.57, a warning is raised if `nopython` is not supplied: https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit	2023-05-04 23:48:06 -07:00
Brett Balquist	b1c0815c79	Updated README.md to provide more insight on BLEU and specific appendices (#1236 ) * Updated README.md to provide more insight on BLEU and specific appendices in the research paper * Update README.md --------- Co-authored-by: Jong Wook Kim <jongwook@openai.com>	2023-05-04 23:47:45 -07:00
Théo BOYER	e334ff141d	Avoid computing higher temperatures on no_speech segments (#1279 ) * Avoid computing higher temperatures on no_speech In decode_with_fallback, we compute higher temperatures in the case where compression_ratio is too high or avg_logprob is too low. But as the computation of no_speech_prob doesn't depend on sampling, we can avoid computing higher temperatures if we detect in the first one that the no_speech condition is fulfilled * Update transcribe.py --------- Co-authored-by: Jong Wook Kim <jongwook@openai.com>	2023-05-04 17:02:36 -07:00
petterreinholdtsen	5523722842	Dropped unused execute bit from mel_filters.npz. (#1254 )	2023-05-04 10:58:56 -07:00
petterreinholdtsen	8035e9ef48	Drop ffmpeg-python dependency and call ffmpeg directly. (#1242 ) * Drop ffmpeg-python dependency and call ffmpeg directly. The last ffmpeg-python module release was in 2019[1], upstream seems to be unavailable[2] and the project development seems to have stagnated[3]. As the features it provides are trivial to replace using the Python native subprocess module, drop the dependency. [1] <URL: https://github.com/kkroening/ffmpeg-python/tags > [2] <URL: https://github.com/kkroening/ffmpeg-python/issues/760 > [3] <URL: https://openhub.net/p/ffmpeg-python > * Rewrote to use subprocess.run() instead of subprocess.Popen(). * formatting changes * formatting update * isort fix * Error checking * isort 🤦🏻 * flake8 fix * minor spelling changes --------- Co-authored-by: Jong Wook Kim <jongwook@openai.com>	2023-05-04 10:53:59 -07:00
Johnny	e69930cb9c	Python 3.11 (#1171 ) * python 3.11 * python 3.11 * fix * fix * fix * revert changes * Update requirements.txt * Trying pip3 install instead * Excluding cp39 - torch 1.10.2 * Removing 1.10.2 from test --------- Co-authored-by: Jong Wook Kim <jongwook@openai.com>	2023-05-04 10:42:09 -07:00
Jong Wook Kim	c09a7ae299	Update decoding.py (#1219 )	2023-04-11 15:13:13 -07:00
Fernando O. Gallego	b0022b3283	Update decoding.py (#1155 ) * Update decoding.py Following the suggestions of @Jeronymous in https://github.com/openai/whisper/pull/914 and https://github.com/openai/whisper/discussions/924, it solves the problem of endless loop. * Removed blank line and whitespaces in empty lines. * Suggested changes according to the linter --------- Co-authored-by: Jong Wook Kim <jongwook@openai.com>	2023-04-11 15:06:03 -07:00
Arseniy Bushyn	76c901ab8d	Update README.md to reference tiktoken (#1105 ) Co-authored-by: Jong Wook Kim <jongwook@openai.com>	2023-04-10 17:39:17 -07:00
ryanheise	43940fc978	Implement max line width and max line count, and make word highlighting optional (#1184 ) * Add highlight_words, max_line_width, max_line_count * Refactor subtitle generator --------- Co-authored-by: Jong Wook Kim <jongwook@openai.com>	2023-04-10 17:28:35 -07:00
ryanheise	255887f219	Squash long words at window and sentence boundaries. (#1114 ) * Squash long words at window and sentence boundaries. * Formatting requirements. * Fix squashing logic to point to correct words. --------- Co-authored-by: Jong Wook Kim <jongwook@openai.com>	2023-04-10 17:23:53 -07:00
K.B.Dharun Krishna	a151816b6b	python-publish.yml: bump actions version to fix node warning (#1211 )	2023-04-10 13:54:09 -07:00
Jong Wook Kim	b5851c6c40	Update tokenizer.py (#1163 )	2023-03-29 13:12:36 -07:00
Jong Wook Kim	6dea21fd7f	Release 20230314 v20230314	2023-03-15 00:39:19 -07:00
Jong Wook Kim	79c43e4859	abort find_alignment on empty input (#1090 )	2023-03-14 12:47:58 -07:00
Guillaume Klein	5f9ac653b7	Fix truncated words list when the replacement character is decoded (#1089 )	2023-03-14 09:32:41 -07:00
Akash Mahajan	ba88b8e1b3	fix github language stats getting dominated by jupyter notebook (#1076 ) Co-authored-by: Akash Mahajan <akash.mahajan@microsoft.com> Co-authored-by: Jong Wook Kim <jongwook@openai.com>	2023-03-14 00:07:09 -07:00
Guillaume Klein	671ac5a4ce	Fix alignment between the segments and the list of words (#1087 ) * Fix alignment between the segments and the list of words * Ensure the word index does not overflow	2023-03-13 16:34:09 -07:00
Jong Wook Kim	839639a223	Use tiktoken (#1044 ) * use tiktoken==0.3.0 * formatting * tuple should be safer * Update whisper/tokenizer.py Co-authored-by: Ruhollah Majdoddin <r.majdodin@gmail.com> * use tiktoken 0.3.1 * reflecting suggestions * cleanup * bypassing load_tiktoken_bpe to avoid blobfile dep --------- Co-authored-by: Ruhollah Majdoddin <r.majdodin@gmail.com>	2023-03-13 02:34:16 -07:00
Jong Wook Kim	ad3250a846	Release 20230308 v20230308	2023-03-08 15:48:57 -08:00
Jong Wook Kim	c4b50c0824	kwargs in decode() for convenience (#1061 ) * kwargs in decode() for convenience * formatting fix	2023-03-08 15:46:38 -08:00
Jong Wook Kim	38f2f4d99d	fix all_tokens handling that caused more repetitions and discrepancy in JSON (#1060 )	2023-03-08 15:34:07 -08:00
Jong Wook Kim	aac47c9834	fix typo	2023-03-07 20:43:49 -08:00
Jong Wook Kim	26807ec6d3	Release 20230307 v20230307	2023-03-07 20:36:29 -08:00
Jong Wook Kim	919a713499	attempt to fix the repetition/hallucination issue identified in #1046 (#1052 ) * attempt to fix the repetition/hallucination issue identified in #1046 * zero-pad the audio instead of spectrogram * formatting fix * delete debug print	2023-03-07 20:08:45 -08:00
Jong Wook Kim	38e990d853	Use triton==2.0.0 (#1053 )	2023-03-07 16:56:31 -08:00
Jong Wook Kim	924e1f8e06	Try installing triton only if linux & x86_64 (#1051 )	2023-03-07 11:31:40 -08:00
Jong Wook Kim	4b0d5e58d0	Update setup.py	2023-03-07 04:47:46 -08:00
Jong Wook Kim	8180fde939	Release 20230306 v20230306	2023-03-06 18:53:04 -08:00
Local State	c6e4e5efb3	remove auxiliary audio extension (#1021 ) Co-authored-by: Jong Wook Kim <jongwook@openai.com>	2023-03-06 17:48:14 -08:00
Jong Wook Kim	b80bcf610d	apply formatting with `black` (#1038 ) * applying black (with the default 88-column limit) * add flake8 * add isort * fix isort	2023-03-06 15:50:37 -08:00
Jong Wook Kim	500d0fe966	word-level timestamps in `transcribe()` (#869 ) * word-level timestamps in `transcribe()` * moving to `timing.py` * numba implementation for dtw, replacing dtw-python * triton implementation for dtw * add test for dtw implementations * triton implementation of median_filter * a simple word-level timestamps test * add scipy as dev dependency * installs an older version of Triton if CUDA < 11.4 * fix broken merge * loosen nvcc version match regex * find_alignment() function * miscellaneous improvements * skip median filtering when the input is too small * Expose punctuation options in cli and transcribe() (#973) * fix merge error * fix merge error 2 * annotating that word_timestamps is experimental --------- Co-authored-by: ryanheise <ryan@ryanheise.com>	2023-03-06 14:00:49 -08:00
Jong Wook Kim	eab8d920ed	Decoding improvements (#1033 ) * suppress task tokens (transcribe/translate) * not ignoring the last segment ending with one timestamp	2023-03-06 11:32:32 -08:00
Roman Vasilenko	3e1780fd37	Update README.md (#894 ) Fixed a few typos and made general improvements for clarity. Co-authored-by: Jong Wook Kim <jongwook@openai.com>	2023-03-03 16:41:59 -08:00
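
Note on commit e334ff141d above: its message describes skipping the higher-temperature retries once a segment is judged to contain no speech. The following is a minimal sketch of that idea, not the code from transcribe.py. The `DecodeResult` container, the `decode` callable, and the default threshold values are assumptions made for illustration; only the names `decode_with_fallback`, `compression_ratio`, `avg_logprob`, and `no_speech_prob` come from the commit message.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class DecodeResult:
    # Hypothetical container; field names follow the terms in the commit message.
    text: str
    avg_logprob: float
    compression_ratio: float
    no_speech_prob: float


def decode_with_fallback(
    decode: Callable[[float], DecodeResult],  # hypothetical: decodes one segment at a temperature
    temperatures: Sequence[float] = (0.0, 0.2, 0.4, 0.6, 0.8, 1.0),
    compression_ratio_threshold: Optional[float] = 2.4,  # illustrative default
    logprob_threshold: Optional[float] = -1.0,  # illustrative default
    no_speech_threshold: Optional[float] = 0.6,  # illustrative default
) -> DecodeResult:
    """Retry decoding at increasing temperatures, unless the segment looks silent."""
    if not temperatures:
        raise ValueError("at least one temperature is required")

    result = None
    for temperature in temperatures:
        result = decode(temperature)

        needs_fallback = False
        if (
            compression_ratio_threshold is not None
            and result.compression_ratio > compression_ratio_threshold
        ):
            needs_fallback = True  # output is too repetitive
        if logprob_threshold is not None and result.avg_logprob < logprob_threshold:
            needs_fallback = True  # average log probability is too low
        if (
            no_speech_threshold is not None
            and result.no_speech_prob > no_speech_threshold
        ):
            # no_speech_prob does not depend on sampling, so a retry at a
            # higher temperature cannot change it: stop here.
            needs_fallback = False

        if not needs_fallback:
            break

    return result
```

With a `decode` callable that always reports a high `no_speech_prob`, the loop stops after the first temperature even when `avg_logprob` is below the threshold, which is the saving the commit message describes.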

130 Commits