42 Commits

Author SHA1 Message Date
Haden Wasserbaech
5d2a12c61e
Merge 4de997d674458de51bb84f2d45cb5f5059bcc344 into 423492dda7806206abe56bdfe427c1096473a020 2024-09-28 19:53:09 +08:00
ryanheise
ba3f3cd54b
Skip silence around hallucinations (#1838)
* Add clip_timestamps option

* Add hallucination_silence_threshold option

* Fix typing for python < 3.9

---------

Co-authored-by: Jong Wook Kim <jongwook@openai.com>
2023-12-18 12:11:16 -08:00
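
A minimal sketch of how the two options added in the commit above might be passed to transcribe(); parameter names come from the commit, but the example values and the exact semantics described in the comments are assumptions:

    import whisper

    model = whisper.load_model("small")
    # clip_timestamps restricts decoding to the listed start,end ranges (seconds),
    # and hallucination_silence_threshold (seconds) skips silent gaps around
    # segments that look like hallucinations; both values here are illustrative.
    result = model.transcribe(
        "audio.mp3",
        word_timestamps=True,
        clip_timestamps="0,30",
        hallucination_silence_threshold=2.0,
    )
    print(result["text"])
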
haden
4de997d674 Improve --model argument handling and help message 2023-11-28 11:03:05 -08:00
Jong Wook Kim
c5d4256076
large-v3 (#1761)
* mel_filters() loads 128 mel bins

* can load 100-language models

* large-v3 checkpoint and evals

* add mandarin alias

* remove unused path

* flake8 fix

* formatting fix
2023-11-06 10:10:30 -08:00
Marco Zucconelli
b7d277acd5
handling transcribe exceptions. (#1682)
* handling transcribe() exceptions.

* printing stacktrace

---------

Co-authored-by: invalid <invalid@email.com>
Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>
Co-authored-by: Jong Wook Kim <jongwook@openai.com>
2023-11-06 02:06:19 -08:00
amosal
6ed314fe41
Add new option to generate subtitles by a specific number of words (#1729)
* ADD parser for new argument --max_words_count

* ADD max_words_count in words_options
ADD warning for max_line_width compatibility

* ADD logic for max_words_count

* rename to max_words_per_line

* make them kwargs

* allow specifying file path by --model

* black formatting

---------

Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>
2023-11-06 01:49:33 -08:00
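
A hedged sketch of using the renamed max_words_per_line option through a subtitle writer; the kwargs-style call follows the "make them kwargs" item above, but the exact writer signature and option set are assumptions:

    import whisper
    from whisper.utils import get_writer

    model = whisper.load_model("small")
    result = model.transcribe("audio.mp3", word_timestamps=True)

    # max_words_per_line caps the number of words per subtitle line; per the
    # commit it is not meant to be combined with max_line_width.
    writer = get_writer("srt", ".")
    writer(result, "audio.mp3", max_words_per_line=7, highlight_words=False)
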
ryanheise
f572f2161b
Improve timestamp heuristics. (#1461)
* Improve timestamp heuristics.

* Track pauses with last_speech_timestamp
2023-06-29 16:51:24 -07:00
Valentin Berkes
248b6cb124
fix condition_on_previous_text (#1224)
prompt_reset_since is set before all_tokens is extended, hence it does not have the expected effect.
2023-05-05 00:31:35 -07:00
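
To illustrate the ordering bug described above, a simplified, runnable stand-in (variable names follow the commit message; this is not the actual transcribe() loop):

    # all_tokens collects every token decoded so far; prompt_reset_since marks
    # where the prompt for the next segment should start.
    all_tokens = []
    prompt_reset_since = 0

    def next_prompt():
        return all_tokens[prompt_reset_since:]

    for tokens, temperature in [([1, 2, 3], 0.0), ([4, 5], 0.8)]:  # stand-in decode results
        all_tokens.extend(tokens)
        if temperature > 0.5:
            # The fix: update prompt_reset_since AFTER extending all_tokens, so the
            # reset really discards everything up to and including this segment.
            prompt_reset_since = len(all_tokens)
        print(next_prompt())
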
Théo BOYER
e334ff141d
Avoid computing higher temperatures on no_speech segments (#1279)
* Avoid computing higher temperatures on no_speech

In decode_with_fallback, we compute higher temperatures when compression_ratio is too high or avg_logprob is too low.
But since the computation of no_speech_prob doesn't depend on sampling, we can skip the higher temperatures if we already detect at the first temperature that the no_speech condition is fulfilled.

* Update transcribe.py

---------

Co-authored-by: Jong Wook Kim <jongwook@openai.com>
2023-05-04 17:02:36 -07:00
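
A paraphrased sketch of the early exit described above; the field names follow the commit message, while the thresholds and control flow are simplified assumptions, not the shipped implementation:

    def decode_with_fallback_sketch(decode_fn, temperatures=(0.0, 0.2, 0.4, 0.6, 0.8, 1.0),
                                    no_speech_threshold=0.6):
        """Retry decoding at higher temperatures only when it is actually needed."""
        result = None
        for i, t in enumerate(temperatures):
            result = decode_fn(t)  # returns an object with the fields used below
            needs_fallback = (
                result.compression_ratio > 2.4   # output looks too repetitive
                or result.avg_logprob < -1.0     # output looks too improbable
            )
            # no_speech_prob does not depend on the sampling temperature, so a
            # no-speech verdict at the first temperature lets us skip the rest.
            if i == 0 and result.no_speech_prob > no_speech_threshold:
                return result
            if not needs_fallback:
                return result
        return result
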
ryanheise
43940fc978
Implement max line width and max line count, and make word highlighting optional (#1184)
* Add highlight_words, max_line_width, max_line_count

* Refactor subtitle generator

---------

Co-authored-by: Jong Wook Kim <jongwook@openai.com>
2023-04-10 17:28:35 -07:00
Jong Wook Kim
38f2f4d99d
fix all_tokens handling that caused more repetitions and discrepancy in JSON (#1060) 2023-03-08 15:34:07 -08:00
Jong Wook Kim
919a713499
attempt to fix the repetition/hallucination issue identified in #1046 (#1052)
* attempt to fix the repetition/hallucination issue identified in #1046

* zero-pad the audio instead of spectrogram

* formatting fix

* delete debug print
2023-03-07 20:08:45 -08:00
Jong Wook Kim
b80bcf610d
apply formatting with black (#1038)
* applying black (with the default 88-column limit)

* add flake8

* add isort

* fix isort
2023-03-06 15:50:37 -08:00
Jong Wook Kim
500d0fe966
word-level timestamps in transcribe() (#869)
* word-level timestamps in `transcribe()`

* moving to `timing.py`

* numba implementation for dtw, replacing dtw-python

* triton implementation for dtw

* add test for dtw implementations

* triton implementation of median_filter

* a simple word-level timestamps test

* add scipy as dev dependency

* installs an older version of Triton if CUDA < 11.4

* fix broken merge

* loosen nvcc version match regex

* find_alignment() function

* miscellaneous improvements

* skip median filtering when the input is too small

* Expose punctuation options in cli and transcribe() (#973)

* fix merge error

* fix merge error 2

* annotating that word_timestamps is experimental

---------

Co-authored-by: ryanheise <ryan@ryanheise.com>
2023-03-06 14:00:49 -08:00
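
A small usage sketch of the experimental word-level timestamps added above; the result fields shown are what the feature exposes, though exact names may differ across versions:

    import whisper

    model = whisper.load_model("small")
    result = model.transcribe("audio.mp3", word_timestamps=True)

    for segment in result["segments"]:
        for word in segment.get("words", []):
            # each word carries its own start/end, aligned via DTW over
            # cross-attention weights (numba on CPU, a triton kernel on CUDA)
            print(f'{word["start"]:6.2f} -> {word["end"]:6.2f}  {word["word"]}')
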
Jong Wook Kim
eab8d920ed
Decoding improvements (#1033)
* suppress task tokens (transcribe/translate)

* not ignoring the last segment ending with one timestamp
2023-03-06 11:32:32 -08:00
Jong Wook Kim
a6b36ede1f
drop python 3.7 support (#889) 2023-01-24 14:05:57 -08:00
Jong Wook Kim
7f1ef223ab
handle printing even if sys.stdout.buffer is not available (#887) 2023-01-24 10:12:04 -08:00
Niels Mayer
f5bfe004ec
Add TSV formatted output in transcript, using integer start/end times in milliseconds. (#228)
* Add CSV format output in transcript, containing lines formatted like: <startTime-in-integer-milliseconds>, <endTime-in-integer-milliseconds>, <transcript-including-commas>

* For easier reading by spreadsheets importing CSV, the third column of the CSV file is delimited by quotes, and any quote characters that might be in the transcript (which would interfere with parsing the third column as a string) are converted to "''".

* fix syntax error

* docstring edit

Co-authored-by: Jong Wook Kim <jongwook@openai.com>
Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>
2023-01-22 00:27:17 -08:00
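
A hand-rolled illustration of the TSV layout described above (integer-millisecond start/end columns, then the text); a minimal sketch, not necessarily identical to the writer that shipped:

    def write_tsv_sketch(result, path):
        # One segment per line: integer-millisecond start and end, then the text.
        with open(path, "w", encoding="utf-8") as f:
            f.write("start\tend\ttext\n")
            for segment in result["segments"]:
                start_ms = round(1000 * segment["start"])
                end_ms = round(1000 * segment["end"])
                text = segment["text"].strip().replace("\t", " ")
                f.write(f"{start_ms}\t{end_ms}\t{text}\n")
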
Aaryan YVS
da600abd2b
Added --output_format option (#333)
* Added --output option

The --output option selects which output files will be generated.

Also corrected the logic that wrongly showed the progress bar when verbose is set to False.

* Changed output_files variable

* Changed back the tqdm verbose

* refactor output format handling

Co-authored-by: Jong Wook Kim <jongwook@openai.com>
Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>
2023-01-21 23:58:38 -08:00
Jong Wook Kim
12e1089462
use stdout for printing transcription progress (#867) 2023-01-20 00:54:05 -08:00
Jong Wook Kim
9d646db9d8
print '?' if a letter can't be encoded using the system default encoding (#859) 2023-01-17 23:28:36 -08:00
adamreis
70861c7ce3
Fix tiny transcribe() docstring typo (#857)
s/successfully/successively, which I believe was the intent.
2023-01-16 22:42:01 -08:00
Jong Wook Kim
02aa851a49 fix to return only the text token ids 2022-11-15 16:25:11 -08:00
Jong Wook Kim
d18e9ea5dd transcribe() on English-only model won't complain when language="en" is not given 2022-10-09 02:40:12 -07:00
eudoxos
35713c66e0
Add --threads option to transcribe (#278)
* Add --threads option to transcribe

By default, Torch on CPU uses number_of_cores/2 threads. This option allows overriding that default.

* Update transcribe.py

Co-authored-by: Jong Wook Kim <ilikekjw@gmail.com>
2022-10-09 02:11:15 -07:00
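
The override described above boils down to one PyTorch call; a hedged sketch of what the CLI option maps to (the variable wiring is an assumption):

    import torch

    threads = 4  # e.g. the value given via --threads
    if threads > 0:
        # Override PyTorch's CPU default of roughly half the available cores.
        torch.set_num_threads(threads)
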
Jibin Mathew
0b1ba3d46e
Add model_dir to arguments (#202)
* Add model_dir to arguments

* minor formatting change

Co-authored-by: Jong Wook Kim <jongwook@openai.com>
2022-09-30 14:45:51 -07:00
Jong Wook Kim
7cb4cc21bf allowing nonzero initial temperature 2022-09-29 18:05:12 -07:00
Vicki Anand
2b0c2971af
Don't update duration if last timestamp is same as begin (#191) 2022-09-29 12:27:48 -07:00
Jong Wook Kim
62fe7f1009 patience definition to match the paper 2022-09-27 19:00:41 -07:00
Nick Konovalchuk
b4308c4782
fix: transcribe verbosity (#140) 2022-09-26 11:46:21 -07:00
VulumeCode
2037b65f3f
Context prompt (#128)
Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>
2022-09-26 05:22:33 -07:00
EliEron
fc0f40981d
Write each sentence as a separate line for the txt output (#101)
* Write each sentence as a separate line for the txt output

* Update utils.py

Co-authored-by: EliEron <example@example.com>
Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>
2022-09-26 04:52:28 -07:00
fatih
ead77fab97
add srt subtitle export utility (#102)
* add srt subtitle export utility

* simplifying

Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>
2022-09-26 03:50:26 -07:00
fatih
9e7e418ff1
add progress bar for transcribe loop (#100)
* add progress bar to transcribe loop

* improved warning message for English-only models

* add --condition_on_previous_text

* progressbar renames

Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>
2022-09-26 03:24:13 -07:00
Jong Wook Kim
5d8d3e75a4 add --condition_on_previous_text 2022-09-25 05:16:08 -07:00
Jong Wook Kim
2d3032de01 improved warning message for English-only models 2022-09-25 02:10:36 -07:00
Jong Wook Kim
15ab548263 nocaptions -> nospeech to match the paper figure 2022-09-23 15:45:32 +09:00
mj-kh
61989529b7
Fix possible mistake when loading model to device (#57)
Before this change, the model was loaded onto the GPU regardless of the value of the "device" argument in the CLI.

(e.g. whisper "test.wav" --device cpu still loads onto the GPU)
2022-09-23 15:21:47 +09:00
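
For illustration, a sketch of honoring the device argument when loading the model; whisper.load_model accepts a device parameter, while the argparse wiring here is an assumption:

    import argparse
    import torch
    import whisper

    parser = argparse.ArgumentParser()
    parser.add_argument("--device", default="cuda" if torch.cuda.is_available() else "cpu")
    args = parser.parse_args()

    # Honor the requested device instead of unconditionally loading onto the GPU.
    model = whisper.load_model("small", device=args.device)
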
hanacchi
c85eaaae29
Use UTF-8 encoding to save the txt and vtt files (#37)
Explicitly set the text encoding to UTF-8 in order to avoid UnicodeEncodeErrors

Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>
2022-09-23 12:10:55 +09:00
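
The fix above amounts to passing an explicit encoding when opening the output files; a minimal, self-contained sketch:

    result = {"text": "non-ASCII text survives: café, 你好"}

    # Without encoding="utf-8", the platform default codec (e.g. cp1252 on Windows)
    # can raise UnicodeEncodeError for characters outside that codepage.
    with open("audio.txt", "w", encoding="utf-8") as txt:
        txt.write(result["text"])
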
EliEron
759e8d47a8
Fix output_dir argument when audio file is a path (#45) 2022-09-23 11:38:37 +09:00
Jong Wook Kim
834f00a0ea making small model the default 2022-09-22 02:45:12 +09:00
Jong Wook Kim
6e3be77e1a initial commit 2022-09-22 01:09:43 +09:00