* avoid rearranging all kv_caches
* avoid recomputing the same cross-attention kv_cache at every decoding step (see the sketch after this entry)
* Update decoding.py
* linter fix
---------
Co-authored-by: Jong Wook Kim <jongwook@openai.com>
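Background for the two kv_cache bullets above: the decoder's cross-attention keys and values are projections of the encoder output only, so they can be computed once per audio and reused at every decoding step, and when beam search reorders hypotheses only the self-attention cache needs to be gathered. A rough sketch of that idea, using hypothetical helper names rather than the actual decoding.py hooks:

```python
import torch
from torch import nn

class KVCache:
    """Illustrative cache: cross-attention K/V are computed once per decode;
    only the growing self-attention K/V are rearranged when beams are reordered."""

    def __init__(self):
        self.cross_kv = {}  # layer index -> (k, v), fixed for the whole decode
        self.self_kv = {}   # layer index -> (k, v), grows by one position per step

    def cross_attention_kv(self, layer: int, audio_features: torch.Tensor,
                           k_proj: nn.Linear, v_proj: nn.Linear):
        # These projections depend only on the encoder output, so compute them once
        # on the first decoding step and return the cached tensors afterwards.
        if layer not in self.cross_kv:
            self.cross_kv[layer] = (k_proj(audio_features), v_proj(audio_features))
        return self.cross_kv[layer]

    def rearrange(self, source_indices: torch.Tensor) -> None:
        # When beam search reshuffles hypotheses, only the self-attention cache
        # has to follow the new beam order.
        for layer, (k, v) in self.self_kv.items():
            self.self_kv[layer] = (k[source_indices], v[source_indices])
```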
* Updated README.md to provide more insight into BLEU and the specific appendices in the research paper
* Update README.md
---------
Co-authored-by: Jong Wook Kim <jongwook@openai.com>
* Avoid computing higher temperatures on no_speech
In decode_with_fallback, we retry decoding at higher temperatures when compression_ratio is too high or avg_logprob is too low.
But since the computation of no_speech_prob doesn't depend on sampling, we can skip the higher-temperature retries if the first attempt already shows that the no_speech condition is fulfilled (see the sketch after this entry).
* Update transcribe.py
---------
Co-authored-by: Jong Wook Kim <jongwook@openai.com>
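A minimal sketch of the early exit described above, assuming a hypothetical decode_fn and the usual transcribe() thresholds; this is not the actual transcribe.py code:

```python
def decode_with_fallback_sketch(decode_fn, temperatures, no_speech_threshold,
                                logprob_threshold, compression_ratio_threshold):
    result = None
    for i, t in enumerate(temperatures):  # temperatures in increasing order
        result = decode_fn(temperature=t)

        # no_speech_prob does not depend on the sampling temperature, so checking it
        # once on the first attempt is enough to skip all higher-temperature retries.
        if i == 0 and result.no_speech_prob > no_speech_threshold:
            return result

        needs_fallback = (
            result.compression_ratio > compression_ratio_threshold
            or result.avg_logprob < logprob_threshold
        )
        if not needs_fallback:
            return result

    return result  # all temperatures tried; return the last attempt
```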
* Drop ffmpeg-python dependency and call ffmpeg directly.
The last ffmpeg-python module release was in 2019[1], upstream seems to be
unavailable[2], and development seems to have stagnated[3]. As the features
it provides are trivial to replace using the Python native subprocess module,
drop the dependency.
[1] <URL: https://github.com/kkroening/ffmpeg-python/tags >
[2] <URL: https://github.com/kkroening/ffmpeg-python/issues/760 >
[3] <URL: https://openhub.net/p/ffmpeg-python >
* Rewrote to use subprocess.run() instead of subprocess.Popen() (see the sketch after this entry).
* formatting changes
* formatting update
* isort fix
* Error checking
* isort 🤦🏻
* flake8 fix
* minor spelling changes
---------
Co-authored-by: Jong Wook Kim <jongwook@openai.com>
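What calling ffmpeg directly through subprocess.run() can look like, sketched along the lines described above (the exact flags and helper signature here are illustrative, not necessarily the committed code):

```python
import subprocess

import numpy as np


def load_audio(file: str, sr: int = 16000) -> np.ndarray:
    """Decode an audio file to mono float32 PCM at the given sample rate via ffmpeg."""
    cmd = [
        "ffmpeg", "-nostdin",
        "-i", file,
        "-f", "s16le",           # raw 16-bit little-endian samples on stdout
        "-ac", "1",              # downmix to mono
        "-acodec", "pcm_s16le",
        "-ar", str(sr),          # resample
        "-",
    ]
    try:
        out = subprocess.run(cmd, capture_output=True, check=True).stdout
    except subprocess.CalledProcessError as e:
        raise RuntimeError(f"Failed to load audio: {e.stderr.decode()}") from e

    return np.frombuffer(out, np.int16).flatten().astype(np.float32) / 32768.0
```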
* Squash long words at window and sentence boundaries.
* Formatting requirements.
* Fix squashing logic to point to correct words.
---------
Co-authored-by: Jong Wook Kim <jongwook@openai.com>
* attempt to fix the repetition/hallucination issue identified in #1046
* zero-pad the audio instead of spectrogram
* formatting fix
* delete debug print
* word-level timestamps in `transcribe()`
* moving to `timing.py`
* numba implementation for dtw, replacing dtw-python (see the sketch after this entry)
* triton implementation for dtw
* add test for dtw implementations
* triton implementation of median_filter
* a simple word-level timestamps test
* add scipy as dev dependency
* installs an older version of Triton if CUDA < 11.4
* fix broken merge
* loosen nvcc version match regex
* find_alignment() function
* miscellaneous improvements
* skip median filtering when the input is too small
* Expose punctuation options in cli and transcribe() (#973)
* fix merge error
* fix merge error 2
* annotating that word_timestamps is experimental
---------
Co-authored-by: ryanheise <ryan@ryanheise.com>
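The numba DTW mentioned above is, at its core, a standard dynamic-programming alignment over the token-versus-audio-frame cost matrix derived from cross-attention weights. A minimal sketch of that algorithm (function names are illustrative, not the exact timing.py API):

```python
import numpy as np
from numba import njit


@njit
def dtw_cost(cost: np.ndarray):
    """Fill the accumulated-cost and backtrace matrices for an (N tokens x M frames) cost matrix."""
    N, M = cost.shape
    D = np.full((N + 1, M + 1), np.inf)
    D[0, 0] = 0.0
    trace = np.zeros((N + 1, M + 1), dtype=np.int32)

    for i in range(1, N + 1):
        for j in range(1, M + 1):
            c0 = D[i - 1, j - 1]  # diagonal: advance both token and frame
            c1 = D[i - 1, j]      # advance token only
            c2 = D[i, j - 1]      # advance frame only
            if c0 <= c1 and c0 <= c2:
                best, step = c0, 0
            elif c1 <= c2:
                best, step = c1, 1
            else:
                best, step = c2, 2
            D[i, j] = cost[i - 1, j - 1] + best
            trace[i, j] = step
    return D, trace


def dtw_path(cost: np.ndarray) -> np.ndarray:
    """Backtrack from the bottom-right corner to recover the monotonic alignment path."""
    _, trace = dtw_cost(cost)
    i, j = cost.shape
    path = []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = trace[i, j]
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return np.array(path[::-1]).T  # shape (2, path length): token indices, frame indices
```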
* Add a CSV output format for the transcript, with each line formatted as: <startTime-in-integer-milliseconds>, <endTime-in-integer-milliseconds>, <transcript-including-commas>
* For easier reading by spreadsheets importing the CSV, the third
column of the CSV file is delimited by quotes, and any quote
characters that might be in the transcript (which would interfere with
parsing the third column as a string) are converted to "''" (see the sketch after this entry).
* fix syntax error
* docstring edit
Co-authored-by: Jong Wook Kim <jongwook@openai.com>
Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>
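A minimal sketch of the writer described above, assuming transcribe()-style segment dicts with start, end, and text keys (not necessarily the committed writer); quote characters in the text are replaced with '' so the quoted third column parses cleanly:

```python
def write_csv(segments, path: str) -> None:
    """Write one line per segment: <start ms>, <end ms>, "<text>"."""
    with open(path, "w", encoding="utf-8") as f:
        for segment in segments:
            start_ms = round(1000 * segment["start"])
            end_ms = round(1000 * segment["end"])
            # Defuse embedded quotes so they cannot break the quoted third column.
            text = segment["text"].strip().replace('"', "''")
            f.write(f'{start_ms}, {end_ms}, "{text}"\n')
```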