139 Commits

Author SHA1 Message Date
Jong Wook Kim
423492dda7 Release 20240927 v20240927 2024-09-27 16:43:58 -07:00
Jong Wook Kim
279133e310
pinning numpy<2 in tests (#2332)
* pinning numpy<2 in tests

* pip install together

* pip install together
2024-09-10 10:43:21 -07:00
Jianan Xing
32d55d5d76
Relax triton requirements for compatibility with pytorch 2.4 and newer (#2307)
* Relax triton requirements for compatibility with pytorch 2.4 and newer

Similar to https://github.com/openai/whisper/pull/1802, but now that pytorch has upgraded to 2.4, it requires triton==3.0.0. I am not sure if it makes sense to remove the upper-bound version constraints.

* Update requirements.txt
2024-09-10 09:53:08 -07:00
ryanheise
ba3f3cd54b
Skip silence around hallucinations (#1838)
* Add clip_timestamps option

* Add hallucination_silence_threshold option

* Fix typing for python < 3.9

---------

Co-authored-by: Jong Wook Kim <jongwook@openai.com>
2023-12-18 12:11:16 -08:00
Bob Lin
8bc8860694
Fix triton env marker (#1887) 2023-12-11 10:39:08 -05:00
Jong Wook Kim
e58f288045 Release 20231117 v20231117 2023-11-17 11:59:28 -08:00
Eugene Indenbom
1cea435768
Relax triton requirements for compatibility with pytorch 2.1 and newer (#1802) 2023-11-13 09:43:42 -08:00
Jong Wook Kim
fcfeaf1b61 Release 20231106 v20231106 2023-11-06 10:14:04 -08:00
Jong Wook Kim
c5d4256076
large-v3 (#1761)
* mel_filters() loads 128 mel bins

* can load 100-language models

* large-v3 checkpoint and evals

* add mandarin alias

* remove unused path

* flake8 fix

* formatting fix
2023-11-06 10:10:30 -08:00
Jong Wook Kim
f6f01c561c Release 20231105 v20231105 2023-11-06 03:08:56 -08:00
Jong Wook Kim
746aaaeafa
remove tiktoken pin (#1759) 2023-11-06 03:05:21 -08:00
Philippe Hebert
b9f17e1f2d
docs: Disambiguation of the term "relative speed" in the README (#1751)
* docs: defines relative speed in README

* combined paragraphs

---------

Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>
2023-11-06 02:43:07 -08:00
Mohamad Zamini
7dfcd56304
allow_pickle=False while loading the mel matrix in audio.py (#1511)
* Update audio.py

The `mel_filters` function uses `np.load` to load a pre-computed mel filterbank matrix. This function is not thread-safe, which means that if it is called from multiple threads at the same time, it may corrupt the data.

To fix this, you can use the `torch.load` function instead. This function is thread-safe, so it will not corrupt the data if it is called from multiple threads at the same time.

* Update audio.py

updated the docstring

* allow_pickle=False

* newline

---------

Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>
Co-authored-by: Jong Wook Kim <jongwook@openai.com>
2023-11-06 02:28:51 -08:00
Marco Zucconelli
b7d277acd5
handling transcribe exceptions. (#1682)
* handling transcribe() exceptions.

* printing stacktrace

---------

Co-authored-by: invalid <invalid@email.com>
Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>
Co-authored-by: Jong Wook Kim <jongwook@openai.com>
2023-11-06 02:06:19 -08:00
amosal
6ed314fe41
Add new option to generate subtitles by a specific number of words (#1729)
* ADD parser for new argument --max_words_count

* ADD max_words_count in words_options
ADD warning for max_line_width compatibility

* ADD logic for max_words_count

* rename to max_words_per_line

* make them kwargs

* allow specifying file path by --model

* black formatting

---------

Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>
2023-11-06 01:49:33 -08:00
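As a rough illustration of the `max_words_per_line` option named in the commit above, the sketch below groups a list of words into lines holding at most a fixed number of words. The function name `split_by_word_count` is hypothetical and not taken from the repository; the actual subtitle writer presumably works on timed word entries rather than plain strings.

```python
from typing import List


def split_by_word_count(words: List[str], max_words_per_line: int) -> List[str]:
    """Group words into lines holding at most max_words_per_line words each."""
    if max_words_per_line < 1:
        raise ValueError("max_words_per_line must be at least 1")
    lines = []
    # Step through the word list in chunks of max_words_per_line.
    for start in range(0, len(words), max_words_per_line):
        lines.append(" ".join(words[start:start + max_words_per_line]))
    return lines


subtitle_lines = split_by_word_count("the quick brown fox jumps".split(), 2)
```

The last line may be shorter than the limit when the word count is not an exact multiple.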
Jordi Mas
b38a1f20f4
Fix exception when an audio file with no speech is provided (#1396)
Co-authored-by: Jong Wook Kim <jongwook@openai.com>
2023-10-10 10:01:01 -07:00
Jong Wook Kim
0a60fcaa9b Release 20230918 v20230918 2023-09-18 17:13:19 -07:00
Jong Wook Kim
5f957da5ca
Update test.yml 2023-09-18 16:38:17 -07:00
Arthur Kim
8b330df096
Add .pre-commit-config.yaml (#1528)
* Add .pre-commit-config.yaml

Co-authored-by: arthur <arthur@rtzr.ai>

* flake8 E741

---------

Co-authored-by: Jong Wook Kim <jongwook@openai.com>
2023-09-18 16:15:33 -07:00
sqhao
21010ef454
fix doc of TextDecoder (#1526)
Signed-off-by: haoshengqiang <haoshengqiang@xiaohongshu.com>
Co-authored-by: haoshengqiang <haoshengqiang@xiaohongshu.com>
2023-09-18 16:09:59 -07:00
Nino Risteski
29b7df6231
Update model-card.md (#1643)
fixed a few typos
2023-09-18 15:59:49 -07:00
taylorchu
e8622f9afc
word timing tweaks (#1559)
* word timing tweaks

* comment on eot

* clearer comments
2023-08-08 06:48:56 +09:00
WangChou Lu
b91c907694
Avoid rearranging all caches (#1483)
* avoid rearranging all kv_caches

* avoid calculating the same kv_cache from cross attn

* Update decoding.py

* linter fix

---------

Co-authored-by: Jong Wook Kim <jongwook@openai.com>
2023-07-06 12:48:08 -07:00
ryanheise
f572f2161b
Improve timestamp heuristics. (#1461)
* Improve timestamp heuristics.

* Track pauses with last_speech_timestamp
2023-06-29 16:51:24 -07:00
Valentin Berkes
248b6cb124
fix condition_on_previous_text (#1224)
prompt_reset_since is set before all_tokens is extended, hence it does not have the expected effect.
2023-05-05 00:31:35 -07:00
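The ordering issue described in the commit above can be sketched as follows. Only the names `prompt_reset_since`, `all_tokens` and `condition_on_previous_text` come from the commit message; the helper `record_segment` and its structure are assumptions made for illustration, not the repository's code.

```python
from typing import List, Tuple


def record_segment(
    all_tokens: List[int],
    new_tokens: List[int],
    condition_on_previous_text: bool,
) -> Tuple[List[int], int]:
    """Append new_tokens and return (all_tokens, prompt_reset_since)."""
    prompt_reset_since = 0
    # Extend first, so that len(all_tokens) already counts the new segment.
    all_tokens.extend(new_tokens)
    if not condition_on_previous_text:
        # Resetting after the extend keeps the new tokens out of the next prompt.
        prompt_reset_since = len(all_tokens)
    return all_tokens, prompt_reset_since


tokens, reset = record_segment([1, 2, 3], [4, 5], condition_on_previous_text=False)
next_prompt = tokens[reset:]  # tokens that would condition the next window
```

If the reset index were computed before the extend, it would point at the start of the new segment, and those tokens would still be carried into the next prompt.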
Paul Willot
7ca9fbea86
Fix numba deprecation notice (#1233)
Starting with numba 0.57, a warning is raised if `nopython` is not supplied:
https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit
2023-05-04 23:48:06 -07:00
Brett Balquist
b1c0815c79
Updated README.md to provide more insight on BLEU and specific appendices (#1236)
* Updated README.md to provide more insight on BLEU and specific appendices in the research paper

* Update README.md

---------

Co-authored-by: Jong Wook Kim <jongwook@openai.com>
2023-05-04 23:47:45 -07:00
Théo BOYER
e334ff141d
Avoid computing higher temperatures on no_speech segments (#1279)
* Avoid computing higher temperatures on no_speech

In decode_with_fallback, we compute higher temperatures in the case where compression_ratio is too high or avg_logprob is too low.
But as the computation of no_speech_prob doesn't depend on sampling, we can avoid computing higher temperatures if we detect in the first pass that the no_speech condition is fulfilled.

* Update transcribe.py

---------

Co-authored-by: Jong Wook Kim <jongwook@openai.com>
2023-05-04 17:02:36 -07:00
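A minimal sketch of the fallback loop described in the commit above, assuming a `decode` callable that returns the three statistics the message mentions. The `Result` class, the default thresholds and the temperature schedule are illustrative assumptions, not values confirmed by this log.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Result:
    # Hypothetical stand-in for a decoding result.
    temperature: float
    compression_ratio: float
    avg_logprob: float
    no_speech_prob: float


def decode_with_fallback(
    decode: Callable[[float], Result],
    temperatures: Sequence[float] = (0.0, 0.2, 0.4, 0.6, 0.8, 1.0),
    compression_ratio_threshold: float = 2.4,  # assumed default
    logprob_threshold: float = -1.0,  # assumed default
    no_speech_threshold: float = 0.6,  # assumed default
) -> Result:
    """Retry decoding at higher temperatures unless the segment looks silent."""
    if not temperatures:
        raise ValueError("at least one temperature is required")
    for t in temperatures:
        result = decode(t)
        needs_fallback = (
            result.compression_ratio > compression_ratio_threshold
            or result.avg_logprob < logprob_threshold
        )
        # no_speech_prob does not depend on sampling, so a retry cannot change it.
        if result.no_speech_prob > no_speech_threshold:
            needs_fallback = False
        if not needs_fallback:
            break
    return result


calls: List[float] = []


def fake_decode(t: float) -> Result:
    # Simulates a silent segment whose other statistics would trigger fallback.
    calls.append(t)
    return Result(t, compression_ratio=3.0, avg_logprob=-2.0, no_speech_prob=0.9)


silent = decode_with_fallback(fake_decode)
```

With the silent segment above, only the first temperature is tried, which is the saving the commit describes.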
petterreinholdtsen
5523722842
Dropped unused execute bit from mel_filters.npz. (#1254) 2023-05-04 10:58:56 -07:00
petterreinholdtsen
8035e9ef48
Drop ffmpeg-python dependency and call ffmpeg directly. (#1242)
* Drop ffmpeg-python dependency and call ffmpeg directly.

The last ffmpeg-python module release was in 2019[1], upstream seems to be
unavailable[2] and the project development seems to have stagnated[3]. As
the features it provides are trivial to replace using the Python native
subprocess module, drop the dependency.

 [1] <URL: https://github.com/kkroening/ffmpeg-python/tags >
 [2] <URL: https://github.com/kkroening/ffmpeg-python/issues/760 >
 [3] <URL: https://openhub.net/p/ffmpeg-python >

* Rewrote to use subprocess.run() instead of subprocess.Popen().

* formatting changes

* formatting update

* isort fix

* Error checking

* isort 🤦🏻

* flake8 fix

* minor spelling changes

---------

Co-authored-by: Jong Wook Kim <jongwook@openai.com>
2023-05-04 10:53:59 -07:00
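The approach in the commit above, calling ffmpeg through `subprocess.run()` with error checking, might look roughly like the sketch below. The function names `build_ffmpeg_command` and `load_pcm`, the default sample rate and the exact flag selection are assumptions for illustration; the flags themselves are standard ffmpeg options.

```python
import subprocess
from typing import List


def build_ffmpeg_command(path: str, sample_rate: int = 16000) -> List[str]:
    """Build an ffmpeg command that decodes `path` to mono 16-bit PCM on stdout."""
    return [
        "ffmpeg",
        "-nostdin",              # never wait for interactive input
        "-i", path,
        "-f", "s16le",           # raw signed 16-bit little-endian output
        "-ac", "1",              # downmix to one channel
        "-acodec", "pcm_s16le",
        "-ar", str(sample_rate),
        "-",                     # write to stdout
    ]


def load_pcm(path: str, sample_rate: int = 16000) -> bytes:
    """Run ffmpeg and return raw PCM bytes; requires ffmpeg on PATH."""
    try:
        completed = subprocess.run(
            build_ffmpeg_command(path, sample_rate),
            capture_output=True,
            check=True,  # raise CalledProcessError on a non-zero exit code
        )
    except subprocess.CalledProcessError as e:
        raise RuntimeError(f"Failed to load audio: {e.stderr.decode()}") from e
    return completed.stdout


command = build_ffmpeg_command("speech.wav", 8000)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.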
Johnny
e69930cb9c
Python 3.11 (#1171)
* python 3.11

* python 3.11

* fix

* fix

* fix

* revert changes

* Update requirements.txt

* Trying pip3 install instead

* Excluding cp39 - torch 1.10.2

* Removing 1.10.2 from test

---------

Co-authored-by: Jong Wook Kim <jongwook@openai.com>
2023-05-04 10:42:09 -07:00
Jong Wook Kim
c09a7ae299
Update decoding.py (#1219) 2023-04-11 15:13:13 -07:00
Fernando O. Gallego
b0022b3283
Update decoding.py (#1155)
* Update decoding.py

Following the suggestions of @Jeronymous in https://github.com/openai/whisper/pull/914 and https://github.com/openai/whisper/discussions/924, this solves the endless-loop problem.

* Removed blank line and whitespaces in empty lines.

* Suggested changes according to the linter

---------

Co-authored-by: Jong Wook Kim <jongwook@openai.com>
2023-04-11 15:06:03 -07:00
Arseniy Bushyn
76c901ab8d
Update README.md to reference tiktoken (#1105)
Co-authored-by: Jong Wook Kim <jongwook@openai.com>
2023-04-10 17:39:17 -07:00
ryanheise
43940fc978
Implement max line width and max line count, and make word highlighting optional (#1184)
* Add highlight_words, max_line_width, max_line_count

* Refactor subtitle generator

---------

Co-authored-by: Jong Wook Kim <jongwook@openai.com>
2023-04-10 17:28:35 -07:00
ryanheise
255887f219
Squash long words at window and sentence boundaries. (#1114)
* Squash long words at window and sentence boundaries.

* Formatting requirements.

* Fix squashing logic to point to correct words.

---------

Co-authored-by: Jong Wook Kim <jongwook@openai.com>
2023-04-10 17:23:53 -07:00
K.B.Dharun Krishna
a151816b6b
python-publish.yml: bump actions version to fix node warning (#1211) 2023-04-10 13:54:09 -07:00
Jong Wook Kim
b5851c6c40
Update tokenizer.py (#1163) 2023-03-29 13:12:36 -07:00
Jong Wook Kim
6dea21fd7f Release 20230314 v20230314 2023-03-15 00:39:19 -07:00
Jong Wook Kim
79c43e4859
abort find_alignment on empty input (#1090) 2023-03-14 12:47:58 -07:00
Guillaume Klein
5f9ac653b7
Fix truncated words list when the replacement character is decoded (#1089) 2023-03-14 09:32:41 -07:00
Akash Mahajan
ba88b8e1b3
fix github language stats getting dominated by jupyter notebook (#1076)
Co-authored-by: Akash Mahajan <akash.mahajan@microsoft.com>
Co-authored-by: Jong Wook Kim <jongwook@openai.com>
2023-03-14 00:07:09 -07:00
Guillaume Klein
671ac5a4ce
Fix alignment between the segments and the list of words (#1087)
* Fix alignment between the segments and the list of words

* Ensure the word index does not overflow
2023-03-13 16:34:09 -07:00
Jong Wook Kim
839639a223
Use tiktoken (#1044)
* use tiktoken==0.3.0

* formatting

* tuple should be safer

* Update whisper/tokenizer.py

Co-authored-by: Ruhollah Majdoddin <r.majdodin@gmail.com>

* use tiktoken 0.3.1

* reflecting suggestions

* cleanup

* bypassing load_tiktoken_bpe to avoid blobfile dep

---------

Co-authored-by: Ruhollah Majdoddin <r.majdodin@gmail.com>
2023-03-13 02:34:16 -07:00
Jong Wook Kim
ad3250a846 Release 20230308 v20230308 2023-03-08 15:48:57 -08:00
Jong Wook Kim
c4b50c0824
kwargs in decode() for convenience (#1061)
* kwargs in decode() for convenience

* formatting fix
2023-03-08 15:46:38 -08:00
Jong Wook Kim
38f2f4d99d
fix all_tokens handling that caused more repetitions and discrepancy in JSON (#1060) 2023-03-08 15:34:07 -08:00
Jong Wook Kim
aac47c9834 fix typo 2023-03-07 20:43:49 -08:00
Jong Wook Kim
26807ec6d3 Release 20230307 v20230307 2023-03-07 20:36:29 -08:00
Jong Wook Kim
919a713499
attempt to fix the repetition/hallucination issue identified in #1046 (#1052)
* attempt to fix the repetition/hallucination issue identified in #1046

* zero-pad the audio instead of spectrogram

* formatting fix

* delete debug print
2023-03-07 20:08:45 -08:00