whisper

mirror of https://github.com/openai/whisper.git synced 2025-05-28 09:46:38 +00:00

Author	SHA1	Message	Date
Jong Wook Kim	500d0fe966	word-level timestamps in `transcribe()` (#869) * word-level timestamps in `transcribe()` * moving to `timing.py` * numba implementation for dtw, replacing dtw-python * triton implementation for dtw * add test for dtw implementations * triton implementation of median_filter * a simple word-level timestamps test * add scipy as dev dependency * installs an older version of Triton if CUDA < 11.4 * fix broken merge * loosen nvcc version match regex * find_alignment() function * miscellaneous improvements * skip median filtering when the input is too small * Expose punctuation options in cli and transcribe() (#973) * fix merge error * fix merge error 2 * annotating that word_timestamps is experimental --------- Co-authored-by: ryanheise <ryan@ryanheise.com>	2023-03-06 14:00:49 -08:00
Jong Wook Kim	eab8d920ed	Decoding improvements (#1033) * suppress task tokens (transcribe/translate) * not ignoring the last segment ending with one timestamp	2023-03-06 11:32:32 -08:00
Roman Vasilenko	3e1780fd37	Update README.md (#894) Fixed a few typos and made general improvements for clarity. Co-authored-by: Jong Wook Kim <jongwook@openai.com>	2023-03-03 16:41:59 -08:00
Andrey Chernykh	7858aa9c08	Fix infinite loop caused by incorrect timestamp tokens prediction (#914) * Fix infinite loop caused by incorrect timestamp tokens prediction https://github.com/openai/whisper/discussions/810 * Update decoding.py --------- Co-authored-by: Jong Wook Kim <jongwook@openai.com>	2023-02-01 15:46:51 -08:00
Jong Wook Kim	5c1a8c10e7	clarify that 3.11 is not supported	2023-01-27 00:01:49 -08:00
Jong Wook Kim	4e635c6644	Update README.md about Python 3.8+ requirement	2023-01-24 14:45:56 -08:00
Jong Wook Kim	a6b36ede1f	drop python 3.7 support (#889)	2023-01-24 14:05:57 -08:00
Jong Wook Kim	55f690af79	Release 20230124 v20230124	2023-01-24 11:11:08 -08:00
Jong Wook Kim	7f1ef223ab	handle printing even if sys.stdout.buffer is not available (#887)	2023-01-24 10:12:04 -08:00
Niels Mayer	f5bfe004ec	Add TSV formatted output in transcript, using integer start/end times in milliseconds. (#228) * Add CSV format output in transcript, containing lines of characters formatted like: <startTime-in-integer-milliseconds>, <endTime-in-integer-milliseconds>, <transcript-including-commas> * for easier reading by spreadsheets importing CSV, the third column of the CSV file is delimited by quotes, and any quote characters that might be in the transcript (which would interfere with parsing the third column as a string) are converted to "''". * fix syntax error * docstring edit Co-authored-by: Jong Wook Kim <jongwook@openai.com> Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>	2023-01-22 00:27:17 -08:00
Aaryan YVS	da600abd2b	Added --output_format option (#333) * Added --output option --output option will help select the output files that will be generated. Corrected the logic, which wrongly shows progress bar when verbose is set to False * Changed output_files variable * Changed back the tqdm verbose * refactor output format handling Co-authored-by: Jong Wook Kim <jongwook@openai.com> Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>	2023-01-21 23:58:38 -08:00
zer0-x	9f7aba6099	Handle XDG_CACHE_HOME properly for download_root (#864) Co-authored-by: Jong Wook Kim <jongwook@openai.com>	2023-01-21 01:09:39 -08:00
Jong Wook Kim	12e1089462	use stdout for printing transcription progress (#867)	2023-01-20 00:54:05 -08:00
Markus Hennerbichler	ea1c266709	Fix bug where mm is mistakenly replaced with hmm in e.g. 20mm (#659) Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>	2023-01-18 10:41:11 -08:00
Jong Wook Kim	8135a7c31c	verbose outputs from pytest	2023-01-18 10:30:18 -08:00
Jong Wook Kim	9d646db9d8	print '?' if a letter can't be encoded using the system default encoding (#859)	2023-01-17 23:28:36 -08:00
Jong Wook Kim	37a4f1be6d	Release 20230117 v20230117	2023-01-17 16:08:28 -08:00
Romain Beaumont	b9f9b433ae	Add github action to automatically push to pypi on Release x.y.z commit (#681) * Add github action to automatically push to pypi on Release x.y.z commit * some housekeeping for pypi upload * add version.py Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>	2023-01-17 15:50:26 -08:00
Umar Farooqi	f0083e7eb2	Use ndimage.median_filter instead of signal.medfilter (#812) For a 30s long audio file which didn't have any silence, ndimage.median_filter took 7s where signa.medfilter took 30s. Co-authored-by: Umar Farooqi <umar@paystash.com> Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>	2023-01-17 14:43:05 -08:00
Jong Wook Kim	a84191faae	rename GitHub workflow	2023-01-17 13:54:40 -08:00
Jong Wook Kim	b1d213c0c7	allow test_transcribe to run on CPU when CUDA is not available	2023-01-17 13:43:36 -08:00
Jong Wook Kim	493dfffa37	add github action to run pytest	2023-01-17 13:38:33 -08:00
Mikko Vedru	0f39c89d92	Update README.md (#804)	2023-01-16 23:46:42 -08:00
Markus Hennerbichler	6df3ea1fb5	Support batch-dimension in log_mel_spectogram (#839)	2023-01-16 23:46:15 -08:00
adamreis	70861c7ce3	Fix tiny transcribe() docstring typo (#857) s/successfully/successively, which I believe was the intent.	2023-01-16 22:42:01 -08:00
Jong Wook Kim	f82bc59f5e	torch.concatenate -> torch.cat for compatibility	2023-01-10 10:53:18 -08:00
Jong Wook Kim	28769fcfe5	word-level timestamps in Multilingual_ASR notebook	2022-12-31 10:03:42 -07:00
Jong Wook Kim	53807677fe	MultiHeadAttention to return qk as well	2022-12-30 01:53:57 -07:00
Jong Wook Kim	9323b2526c	Revert "saving the qk matrix in the attention module for convenience" This reverts commit 68e44bd83ce6c3e352f74b266aa39d8b649af9e3.	2022-12-29 23:53:31 -07:00
Jong Wook Kim	68e44bd83c	saving the qk matrix in the attention module for convenience	2022-12-29 23:02:52 -07:00
Jong Wook Kim	0b5dcfdef7	large-v2 figure and arxiv url update	2022-12-09 00:12:39 -05:00
altryne	b9265e5796	Update Hebrew language code to he per IANA registry (#401) * Update Hebrew language code to he per IANA registry Per [IANA registry](https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry), `iw` was deprecated as the code for Hebrew in 1989 and the preferred code is `he` The correct subtag: ``` %% Type: language Subtag: he Description: Hebrew Added: 2005-10-16 Suppress-Script: Hebr %% ``` And the deprecation ``` %% Type: language Subtag: iw Description: Hebrew Added: 2005-10-16 Deprecated: 1989-01-01 Preferred-Value: he Suppress-Script: Hebr %% ``` * Update hebrew ISO code to he Per discussion, it's ok to make this change without backwards compatibility	2022-12-07 13:45:31 -05:00
Paul Harter	fd8f80c8b8	Explicitly closing model file after reading it (#630)	2022-12-06 12:07:19 -05:00
Jong Wook Kim	4179ed2475	add large-v2 model - The "large-v2" model is trained for more epochs with regularization and shows improved performance compared to the previous large. - It has the same architecture as the original large model. - When `load_model("large")` is called, the "large-v2" model will be loaded. - We will soon update the paper regarding this new model.	2022-12-05 11:07:14 -05:00
jumon	ec1b34bb90	fix compression ratio function (#561)	2022-12-04 17:27:42 -06:00
Jong Wook Kim	eff383b27b	invoking __call__ instead of forward()	2022-11-16 04:18:50 -08:00
Jong Wook Kim	02aa851a49	fix to return only the text token ids	2022-11-15 16:25:11 -08:00
jumon	76148a56c5	suppress generating non-timestamp tokens at the beginning (#532)	2022-11-15 11:44:36 -08:00
Vicki Anand	9f70a352f9	Fix attention caching to make it actually work (#370)	2022-10-19 16:44:03 -07:00
Sumana Harihareswara	7f3e408e09	Add package metadata to setup.py (#315) Add project summary, license, etc. for display with "pip show" and similar Python package distribution tools.	2022-10-17 13:51:16 -07:00
Michael Monashev	f680570016	Fix bug (#305) Fix bug: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper__index_select)	2022-10-17 11:38:20 -07:00
Jong Wook Kim	d18e9ea5dd	transcribe() on English-only model won't complain when language="en" is not given	2022-10-09 02:40:12 -07:00
David Marx	82725cea9c	infer download_root from XDG_CACHE_HOME if avail (#257)	2022-10-09 02:14:03 -07:00
eudoxos	35713c66e0	Add --threads option to transcribe (#278) * Add --threads option to transcribe Torch on CPU uses by default number_of_cores/2. This option allows to override this default. * Update transcribe.py Co-authored-by: Jong Wook Kim <ilikekjw@gmail.com>	2022-10-09 02:11:15 -07:00
Corentin Jemine	9e653bd0ea	Fixed CoW RuntimeError in DecodingTask.run() (#240)	2022-10-04 08:49:31 -07:00
Tom Stuart	02b74308ff	Fix timestamps and strip extraneous whitespace in WebVTT output (#219) * Use two-digit hours in WebVTT timestamps Per the WebVTT specification [0]: > A WebVTT timestamp consists of the following components, in the given > order: > > 1. Optionally (required if hours is non-zero): > 1. Two or more ASCII digits, representing the hours as a base ten > integer. > 2. A U+003A COLON character (:) YouTube won’t accept timestamps containing single-digit hours. [0] https://www.w3.org/TR/webvtt1/#webvtt-timestamp * Strip segment text in WebVTT output We already do this for plain text and SubRip output, so we should do it for WebVTT too.	2022-10-03 14:51:07 -07:00
Jibin Mathew	0b1ba3d46e	Add model_dir to arguments (#202) * Add model_dir to arguments * minor formatting change Co-authored-by: Jong Wook Kim <jongwook@openai.com>	2022-09-30 14:45:51 -07:00
Caleb McQuillin	60132ade70	Use , character instead of . for SRT output. (#197) The SRT format uses the decimal comma character as the fractional separator rather than the decimal point character. Adjust format_timestamp and write_srt to specify the separator character. See https://en.wikipedia.org/wiki/SubRip#:~:text=the%20fractional%20separator%20used%20is%20the%20comma%2C%20since%20the%20program%20was%20written%20in%20france.	2022-09-29 20:44:12 -07:00
Jong Wook Kim	7cb4cc21bf	allowing nonzero initial temperature	2022-09-29 18:05:12 -07:00
Jong Wook Kim	30dc5c581b	pointer to the show and tell section	2022-09-29 14:57:49 -07:00

83 Commits
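
Two of the commits listed above (02b74308ff and 60132ade70) concern subtitle timestamp formatting: WebVTT hours, when present, take two or more digits, and SRT uses a comma as the fractional separator. A minimal sketch of a formatter that follows both rules might look like the one below. The name `format_cue_time` and its parameters are hypothetical, chosen for illustration only; this is not the repository's own code.

```python
def format_cue_time(
    seconds: float,
    always_include_hours: bool = False,
    decimal_marker: str = ".",
) -> str:
    """Format a non-negative time in seconds as [HH:]MM:SS<marker>mmm.

    Hypothetical helper: the name and parameters are assumptions made
    for this sketch, not the project's API.
    """
    if seconds < 0:
        raise ValueError("expected a non-negative timestamp")

    # Work in integer milliseconds to avoid float formatting surprises.
    total_ms = round(seconds * 1000.0)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1_000)

    # Hours are zero-padded to at least two digits whenever they are shown.
    hours_part = f"{hours:02d}:" if always_include_hours or hours > 0 else ""
    return f"{hours_part}{minutes:02d}:{secs:02d}{decimal_marker}{millis:03d}"


# SRT style: hours always shown, comma as the fractional separator.
srt_time = format_cue_time(3661.5, always_include_hours=True, decimal_marker=",")

# WebVTT style: hours may be omitted when zero, dot as the separator.
vtt_time = format_cue_time(5.25)
```

Here `srt_time` is `01:01:01,500` and `vtt_time` is `00:05.250`. Converting to integer milliseconds first keeps the three-digit fraction exact regardless of which separator is used.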