* Update audio.py
The `mel_filters` function uses `np.load` to load a precomputed mel filterbank matrix. `np.load` is not thread-safe, so calling it from multiple threads at the same time may corrupt the data.
To fix this, use `torch.load` instead. It is thread-safe, so concurrent calls from multiple threads will not corrupt the data.
* Update audio.py
updated the docstring
* allow_pickle=False
* newline
---------
Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>
Co-authored-by: Jong Wook Kim <jongwook@openai.com>
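For the `mel_filters` change and the later `allow_pickle=False` follow-up listed above, the loading pattern might look roughly like the sketch below. It is not the code from this commit. It assumes the filterbank is stored in an `.npz` archive under keys such as `mel_80`; the helper name `load_filterbank`, the key format, and the demo array shape are all hypothetical.

```python
import os
import tempfile

import numpy as np


def load_filterbank(path: str, n_mels: int) -> np.ndarray:
    """Load one filterbank matrix from an .npz archive (hypothetical helper)."""
    # allow_pickle=False refuses pickled object arrays, so a tampered file
    # cannot run arbitrary code while it is being loaded.
    # The context manager closes the underlying file handle on exit.
    with np.load(path, allow_pickle=False) as archive:
        # Assumed key layout: "mel_<n_mels>". Indexing reads the array into memory.
        return archive[f"mel_{n_mels}"]


# Demo with a throwaway archive so the sketch runs on its own.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "mel_filters.npz")
    np.savez(demo_path, mel_80=np.zeros((80, 201), dtype=np.float32))
    filters = load_filterbank(demo_path, 80)
```

Opening the archive inside the function, rather than sharing one open handle, means each caller works with its own file object.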
* Drop ffmpeg-python dependency and call ffmpeg directly.
The last ffmpeg-python module release was in 2019[1], upstream seems to be
unavailable[2] and project development seems to have stagnated[3]. As
the features it provides are trivial to replace using Python's native
subprocess module, drop the dependency.
[1] <URL: https://github.com/kkroening/ffmpeg-python/tags >
[2] <URL: https://github.com/kkroening/ffmpeg-python/issues/760 >
[3] <URL: https://openhub.net/p/ffmpeg-python >
* Rewrote to use subprocess.run() instead of subprocess.Popen().
* formatting changes
* formatting update
* isort fix
* Error checking
* isort 🤦🏻
* flake8 fix
* minor spelling changes
---------
Co-authored-by: Jong Wook Kim <jongwook@openai.com>
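As a rough illustration of calling ffmpeg directly through `subprocess.run()`, as described above, a minimal sketch could look like this. It is not the code from this commit: the function names `build_ffmpeg_cmd` and `decode_to_pcm` are hypothetical, and the choice of flags (mono, 16-bit little-endian PCM written to stdout) is an assumption about what a speech pipeline would want.

```python
import subprocess
from typing import List


def build_ffmpeg_cmd(path: str, sample_rate: int = 16000) -> List[str]:
    """Build an ffmpeg command that decodes `path` to raw PCM on stdout."""
    return [
        "ffmpeg",
        "-nostdin",              # never wait for interactive input
        "-threads", "0",         # let ffmpeg choose the thread count
        "-i", path,              # input file
        "-f", "s16le",           # raw signed 16-bit little-endian output
        "-ac", "1",              # downmix to mono
        "-acodec", "pcm_s16le",
        "-ar", str(sample_rate), # resample to the requested rate
        "-",                     # write to stdout
    ]


def decode_to_pcm(path: str, sample_rate: int = 16000) -> bytes:
    """Run ffmpeg and return the decoded PCM bytes (hypothetical helper)."""
    try:
        # check=True raises CalledProcessError on a non-zero exit status.
        result = subprocess.run(
            build_ffmpeg_cmd(path, sample_rate),
            capture_output=True,
            check=True,
        )
    except subprocess.CalledProcessError as e:
        message = e.stderr.decode(errors="replace")
        raise RuntimeError(f"Failed to load audio: {message}") from e
    return result.stdout
```

Passing the command as a list avoids shell quoting issues with unusual file names, and `capture_output=True` keeps ffmpeg's stderr available for the error message.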
* attempt to fix the repetition/hallucination issue identified in #1046
* zero-pad the audio instead of spectrogram
* formatting fix
* delete debug print
* word-level timestamps in `transcribe()`
* moving to `timing.py`
* numba implementation for dtw, replacing dtw-python
* triton implementation for dtw
* add test for dtw implementations
* triton implementation of median_filter
* a simple word-level timestamps test
* add scipy as dev dependency
* installs an older version of Triton if CUDA < 11.4
* fix broken merge
* loosen nvcc version match regex
* find_alignment() function
* miscellaneous improvements
* skip median filtering when the input is too small
* Expose punctuation options in cli and transcribe() (#973)
* fix merge error
* fix merge error 2
* annotating that word_timestamps is experimental
---------
Co-authored-by: ryanheise <ryan@ryanheise.com>
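For readers unfamiliar with the dynamic time warping (dtw) step mentioned in the list above, here is a minimal pure-Python sketch of the textbook algorithm. It is not the numba or Triton implementation from these commits; the function name `dtw_path` and the plain nested-list cost matrix are assumptions made for illustration.

```python
import math
from typing import List, Tuple


def dtw_path(cost: List[List[float]]) -> Tuple[List[Tuple[int, int]], float]:
    """Return the minimum-cost monotonic alignment path through `cost`."""
    n, m = len(cost), len(cost[0])
    # acc[i][j] = cheapest total cost of aligning the first i rows
    # with the first j columns; row and column 0 act as a border.
    acc = [[math.inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i][j] = cost[i - 1][j - 1] + min(
                acc[i - 1][j - 1],  # advance both sequences
                acc[i - 1][j],      # advance rows only
                acc[i][j - 1],      # advance columns only
            )

    # Backtrack from the end, always stepping to the cheapest predecessor.
    i, j = n, m
    path = []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [
            (acc[i - 1][j - 1], i - 1, j - 1),
            (acc[i - 1][j], i - 1, j),
            (acc[i][j - 1], i, j - 1),
        ]
        _, i, j = min(candidates, key=lambda c: c[0])
    path.reverse()
    return path, acc[n][m]


demo_cost = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
demo_path, demo_total = dtw_path(demo_cost)
```

The double loop is quadratic in the sequence lengths, which is presumably why compiled variants are attractive for longer inputs.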
Fix bug: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper__index_select)