* Relax triton requirements for compatibility with pytorch 2.4 and newer
Similar to https://github.com/openai/whisper/pull/1802, but now that pytorch 2.4 requires triton==3.0.0. I am not sure whether it makes sense to remove the upper-bound version constraint entirely.
* Update requirements.txt
* Update audio.py
The `mel_filters` function uses `np.load` to load a precomputed mel filterbank matrix. That call is not thread-safe: if it runs from multiple threads at the same time, it may corrupt the data.
To fix this, use `torch.load` instead, which is thread-safe and will not corrupt the data under concurrent calls.
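A minimal sketch of loading a precomputed filterbank safely, combining a context-managed load with the `allow_pickle=False` change noted below and caching the result. The file name and array key (`mel_filters.npz`, `mel_80`) are illustrative, not necessarily the exact ones in `audio.py`, and the real function additionally converts the array to a `torch.Tensor` on a given device:

```python
from functools import lru_cache

import numpy as np


@lru_cache(maxsize=None)
def mel_filters(filters_path: str, n_mels: int = 80) -> np.ndarray:
    """Load a precomputed mel filterbank matrix from an .npz archive.

    Each call opens its own file handle and closes it on exit, so no
    half-read file object is shared between threads, and the cached
    result means the file is only read once per (path, n_mels) pair.
    """
    with np.load(filters_path, allow_pickle=False) as f:
        return f[f"mel_{n_mels}"]
```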
* Update audio.py
updated the docstring
* allow_pickle=False
* newline
---------
Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>
Co-authored-by: Jong Wook Kim <jongwook@openai.com>
* ADD parser for new argument --max_words_count
* ADD max_words_count in words_options
ADD warning for max_line_width compatibility
* ADD logic for max_words_count
* rename to max_words_per_line
* make them kwargs
* allow specifying file path by --model
* black formatting
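The `--max_words_per_line` option above can be sketched as splitting a list of timed words into subtitle lines of at most N words each. The word-dict shape (`"word"`, `"start"`, `"end"`) follows whisper's word timestamps, but this is an illustration, not the code in the PR:

```python
def split_lines(words: list, max_words_per_line: int) -> list:
    """Group timed words into lines of at most max_words_per_line words."""
    lines = []
    for i in range(0, len(words), max_words_per_line):
        chunk = words[i : i + max_words_per_line]
        lines.append({
            "start": chunk[0]["start"],   # line starts with its first word
            "end": chunk[-1]["end"],      # and ends with its last word
            "text": "".join(w["word"] for w in chunk),
        })
    return lines
```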
---------
Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>
* avoid rearranging all kv_caches
* avoid calculating the same kv_cache from cross attn
* Update decoding.py
* linter fix
---------
Co-authored-by: Jong Wook Kim <jongwook@openai.com>
* Updated README.md to provide more insight into BLEU and specific appendices in the research paper
* Update README.md
---------
Co-authored-by: Jong Wook Kim <jongwook@openai.com>
* Avoid computing higher temperatures on no_speech
In decode_with_fallback, we retry at higher temperatures when compression_ratio is too high or avg_logprob is too low.
But since the computation of no_speech_prob doesn't depend on sampling, we can skip the higher temperatures entirely if the first decoding already fulfills the no_speech condition.
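A simplified sketch of that early exit: if the first (lowest-temperature) decoding already shows `no_speech_prob` above the threshold, the higher-temperature retries can be skipped, since `no_speech_prob` does not depend on the sampling temperature. `decode` here is a stand-in callable, not whisper's actual API, and the thresholds mirror whisper's defaults:

```python
def decode_with_fallback(decode, temperatures,
                         no_speech_threshold: float = 0.6,
                         logprob_threshold: float = -1.0,
                         compression_threshold: float = 2.4):
    """Try temperatures in order, stopping early on silence or success."""
    result = None
    for i, t in enumerate(temperatures):
        result = decode(t)
        if i == 0 and result["no_speech_prob"] > no_speech_threshold:
            return result  # silence: higher temperatures won't change this
        if (result["compression_ratio"] <= compression_threshold
                and result["avg_logprob"] >= logprob_threshold):
            return result  # acceptable decoding; no fallback needed
    return result
```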
* Update transcribe.py
---------
Co-authored-by: Jong Wook Kim <jongwook@openai.com>
* Drop ffmpeg-python dependency and call ffmpeg directly.
The last ffmpeg-python module release was in 2019[1], upstream seems to be
unavailable[2], and the project's development seems to have stagnated[3]. As
the features it provides are trivial to replace using the Python native
subprocess module, drop the dependency.
[1] <URL: https://github.com/kkroening/ffmpeg-python/tags >
[2] <URL: https://github.com/kkroening/ffmpeg-python/issues/760 >
[3] <URL: https://openhub.net/p/ffmpeg-python >
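A sketch of calling ffmpeg directly with `subprocess.run`, in the spirit of the change above. The flags shown here decode to mono 16 kHz 16-bit PCM piped to stdout, which is the shape whisper consumes; the exact arguments and error handling in the repository may differ:

```python
import subprocess

import numpy as np


def ffmpeg_command(path: str, sample_rate: int = 16000) -> list:
    """Build the ffmpeg invocation that decodes `path` to raw PCM on stdout."""
    return [
        "ffmpeg", "-nostdin",
        "-i", path,
        "-f", "s16le",           # raw 16-bit little-endian PCM
        "-ac", "1",              # mono
        "-acodec", "pcm_s16le",
        "-ar", str(sample_rate), # resample
        "-",                     # write to stdout
    ]


def load_audio(path: str, sample_rate: int = 16000) -> np.ndarray:
    """Decode an audio file to a float32 waveform in [-1, 1]."""
    try:
        out = subprocess.run(
            ffmpeg_command(path, sample_rate),
            capture_output=True, check=True,
        ).stdout
    except subprocess.CalledProcessError as e:
        raise RuntimeError(f"Failed to load audio: {e.stderr.decode()}") from e
    return np.frombuffer(out, np.int16).astype(np.float32) / 32768.0
```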
* Rewrote to use subprocess.run() instead of subprocess.Popen().
* formatting changes
* formatting update
* isort fix
* Error checking
* isort 🤦🏻
* flake8 fix
* minor spelling changes
---------
Co-authored-by: Jong Wook Kim <jongwook@openai.com>
* Squash long words at window and sentence boundaries.
* Formatting requirements.
* Fix squashing logic to point to correct words.
---------
Co-authored-by: Jong Wook Kim <jongwook@openai.com>
* attempt to fix the repetition/hallucination issue identified in #1046
* zero-pad the audio instead of spectrogram
* formatting fix
* delete debug print
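A minimal sketch of the idea behind the fix above: pad (or trim) the raw audio to a fixed length before computing the spectrogram, rather than padding the spectrogram itself. `N_SAMPLES` here stands for whisper's 30-second window at 16 kHz:

```python
import numpy as np

N_SAMPLES = 16000 * 30  # 30 seconds of audio at 16 kHz


def pad_or_trim(audio: np.ndarray, length: int = N_SAMPLES) -> np.ndarray:
    """Zero-pad or truncate the waveform along its last axis to `length`."""
    if audio.shape[-1] > length:
        return audio[..., :length]
    pad = length - audio.shape[-1]
    return np.pad(audio, [(0, 0)] * (audio.ndim - 1) + [(0, pad)])
```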