whisper

mirror of https://github.com/openai/whisper.git synced 2025-11-27 15:54:00 +00:00

Author	SHA1	Message	Date
Jong Wook Kim	c5d4256076	large-v3 (#1761) * mel_filters() loads 128 mel bins * can load 100-language models * large-v3 checkpoint and evals * add mandarin alias * remove unused path * flake8 fix * formatting fix	2023-11-06 10:10:30 -08:00
Mohamad Zamini	7dfcd56304	allow_pickle=False while loading of mel matrix IN audio.py (#1511) * Update audio.py The `mel_filters` function is using a `np.load` function to load a pre-computed mel filterbank matrix. This function is not thread-safe, which means that if it is called from multiple threads at the same time, it may corrupt the data. To fix this, you can use the `torch.load` function instead. This function is thread-safe, so it will not corrupt the data if it is called from multiple threads at the same time. * Update audio.py updated the docstring * allow_pickle=False * newline --------- Co-authored-by: Jong Wook Kim <jongwook@nyu.edu> Co-authored-by: Jong Wook Kim <jongwook@openai.com>	2023-11-06 02:28:51 -08:00
petterreinholdtsen	8035e9ef48	Drop ffmpeg-python dependency and call ffmpeg directly. (#1242) * Drop ffmpeg-python dependency and call ffmpeg directly. The last ffmpeg-python module release was in 2019[1], upstream seem to be unavailable[2] and the project development seem to have stagnated[3]. As the features it provide is trivial to replace using the Python native subprocess module, drop the dependency. [1] <URL: https://github.com/kkroening/ffmpeg-python/tags > [2] <URL: https://github.com/kkroening/ffmpeg-python/issues/760 > [3] <URL: https://openhub.net/p/ffmpeg-python > * Rewrote to use subprocess.run() instead of subprocess.Popen(). * formatting changes * formatting update * isort fix * Error checking * isort 🤦🏻 * flake8 fix * minor spelling changes --------- Co-authored-by: Jong Wook Kim <jongwook@openai.com>	2023-05-04 10:53:59 -07:00
Jong Wook Kim	919a713499	attempt to fix the repetition/hallucination issue identified in #1046 (#1052) * attempt to fix the repetition/hallucination issue identified in #1046 * zero-pad the audio instead of spectrogram * formatting fix * delete debug print	2023-03-07 20:08:45 -08:00
Jong Wook Kim	b80bcf610d	apply formatting with `black` (#1038) * applying black (with the default 88-column limit) * add flake8 * add isort * fix isort	2023-03-06 15:50:37 -08:00
Jong Wook Kim	500d0fe966	word-level timestamps in `transcribe()` (#869) * word-level timestamps in `transcribe()` * moving to `timing.py` * numba implementation for dtw, replacing dtw-python * triton implementation for dtw * add test for dtw implementations * triton implementation of median_filter * a simple word-level timestamps test * add scipy as dev dependency * installs an older version of Triton if CUDA < 11.4 * fix broken merge * loosen nvcc version match regex * find_alignment() function * miscellaneous improvements * skip median filtering when the input is too small * Expose punctuation options in cli and transcribe() (#973) * fix merge error * fix merge error 2 * annotating that word_timestamps is experimental --------- Co-authored-by: ryanheise <ryan@ryanheise.com>	2023-03-06 14:00:49 -08:00
Markus Hennerbichler	6df3ea1fb5	Support batch-dimension in log_mel_spectogram (#839)	2023-01-16 23:46:15 -08:00
Michael Monashev	f680570016	Fix bug (#305) Fix bug: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper__index_select)	2022-10-17 11:38:20 -07:00
sawadata	deafef05f3	Update audio.py (#178) add '-nostdin' argument	2022-09-29 12:34:04 -07:00
Ram Rachum	59f543e218	Fix exception cause in audio.py (#33)	2022-09-23 12:12:37 +09:00
Jong Wook Kim	6e3be77e1a	initial commit	2022-09-22 01:09:43 +09:00

11 Commits