42 Commits

Author SHA1 Message Date
Haden Wasserbaech
5d2a12c61e
Merge 4de997d674458de51bb84f2d45cb5f5059bcc344 into 423492dda7806206abe56bdfe427c1096473a020 2024-09-28 19:53:09 +08:00
ryanheise
ba3f3cd54b
Skip silence around hallucinations (#1838)
* Add clip_timestamps option

* Add hallucination_silence_threshold option

* Fix typing for python < 3.9

---------

Co-authored-by: Jong Wook Kim <jongwook@openai.com>
2023-12-18 12:11:16 -08:00
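
A minimal sketch of how the two options added in the commit above might be passed to transcribe(); parameter names come from the commit, but the example values and the exact semantics described in the comments are assumptions:

    import whisper

    model = whisper.load_model("small")
    # clip_timestamps restricts decoding to the listed start,end ranges (seconds),
    # and hallucination_silence_threshold (seconds) skips silent gaps around
    # segments that look like hallucinations; both values here are illustrative.
    result = model.transcribe(
        "audio.mp3",
        word_timestamps=True,
        clip_timestamps="0,30",
        hallucination_silence_threshold=2.0,
    )
    print(result["text"])
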
haden
4de997d674 Improve --model argument handling and help message 2023-11-28 11:03:05 -08:00
Jong Wook Kim
c5d4256076
large-v3 (#1761)
* mel_filters() loads 128 mel bins

* can load 100-language models

* large-v3 checkpoint and evals

* add mandarin alias

* remove unused path

* flake8 fix

* formatting fix
2023-11-06 10:10:30 -08:00
Marco Zucconelli
b7d277acd5
handling transcribe exceptions. (#1682)
* handling transcribe() exceptions.

* printing stacktrace

---------

Co-authored-by: invalid <invalid@email.com>
Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>
Co-authored-by: Jong Wook Kim <jongwook@openai.com>
2023-11-06 02:06:19 -08:00
amosal
6ed314fe41
Add new option to generate subtitles by a specific number of words (#1729)
* ADD parser for new argument --max_words_count

* ADD max_words_count in words_options
ADD warning for max_line_width compatibility

* ADD logic for max_words_count

* rename to max_words_per_line

* make them kwargs

* allow specifying file path by --model

* black formatting

---------

Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>
2023-11-06 01:49:33 -08:00
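
A hedged sketch of using the renamed max_words_per_line option through a subtitle writer; the kwargs-style call follows the "make them kwargs" item above, but the exact writer signature and option set are assumptions:

    import whisper
    from whisper.utils import get_writer

    model = whisper.load_model("small")
    result = model.transcribe("audio.mp3", word_timestamps=True)

    # max_words_per_line caps the number of words per subtitle line; per the
    # commit it is not meant to be combined with max_line_width.
    writer = get_writer("srt", ".")
    writer(result, "audio.mp3", max_words_per_line=7, highlight_words=False)
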
ryanheise
f572f2161b
Improve timestamp heuristics. (#1461)
* Improve timestamp heuristics.

* Track pauses with last_speech_timestamp
2023-06-29 16:51:24 -07:00
Valentin Berkes
248b6cb124
fix condition_on_previous_text (#1224)
prompt_reset_since is set before all_tokens is extended, hence it does not have the expected effect.
2023-05-05 00:31:35 -07:00
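
To illustrate the ordering bug described above, a simplified, runnable stand-in (variable names follow the commit message; this is not the actual transcribe() loop):

    # all_tokens collects every token decoded so far; prompt_reset_since marks
    # where the prompt for the next segment should start.
    all_tokens = []
    prompt_reset_since = 0

    def next_prompt():
        return all_tokens[prompt_reset_since:]

    for tokens, temperature in [([1, 2, 3], 0.0), ([4, 5], 0.8)]:  # stand-in decode results
        all_tokens.extend(tokens)
        if temperature > 0.5:
            # The fix: update prompt_reset_since AFTER extending all_tokens, so the
            # reset really discards everything up to and including this segment.
            prompt_reset_since = len(all_tokens)
        print(next_prompt())
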
Théo BOYER
e334ff141d
Avoid computing higher temperatures on no_speech segments (#1279)
* Avoid computing higher temperatures on no_speech

In decode_with_fallback, we compute higher temperatures when compression_ratio is too high or avg_logprob is too low.
But since the computation of no_speech_prob doesn't depend on sampling, we can skip the higher temperatures if we already detect at the first temperature that the no_speech condition is fulfilled.

* Update transcribe.py

---------

Co-authored-by: Jong Wook Kim <jongwook@openai.com>
2023-05-04 17:02:36 -07:00
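
A paraphrased sketch of the early exit described above; the field names follow the commit message, while the thresholds and control flow are simplified assumptions, not the shipped implementation:

    def decode_with_fallback_sketch(decode_fn, temperatures=(0.0, 0.2, 0.4, 0.6, 0.8, 1.0),
                                    no_speech_threshold=0.6):
        """Retry decoding at higher temperatures only when it is actually needed."""
        result = None
        for i, t in enumerate(temperatures):
            result = decode_fn(t)  # returns an object with the fields used below
            needs_fallback = (
                result.compression_ratio > 2.4   # output looks too repetitive
                or result.avg_logprob < -1.0     # output looks too improbable
            )
            # no_speech_prob does not depend on the sampling temperature, so a
            # no-speech verdict at the first temperature lets us skip the rest.
            if i == 0 and result.no_speech_prob > no_speech_threshold:
                return result
            if not needs_fallback:
                return result
        return result
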
ryanheise
43940fc978
Implement max line width and max line count, and make word highlighting optional (#1184)
* Add highlight_words, max_line_width, max_line_count

* Refactor subtitle generator

---------

Co-authored-by: Jong Wook Kim <jongwook@openai.com>
2023-04-10 17:28:35 -07:00
Jong Wook Kim
38f2f4d99d
fix all_tokens handling that caused more repetitions and discrepancy in JSON (#1060) 2023-03-08 15:34:07 -08:00
Jong Wook Kim
919a713499
attempt to fix the repetition/hallucination issue identified in #1046 (#1052)
* attempt to fix the repetition/hallucination issue identified in #1046

* zero-pad the audio instead of spectrogram

* formatting fix

* delete debug print
2023-03-07 20:08:45 -08:00
Jong Wook Kim
b80bcf610d
apply formatting with black (#1038)
* applying black (with the default 88-column limit)

* add flake8

* add isort

* fix isort
2023-03-06 15:50:37 -08:00
Jong Wook Kim
500d0fe966
word-level timestamps in transcribe() (#869)
* word-level timestamps in `transcribe()`

* moving to `timing.py`

* numba implementation for dtw, replacing dtw-python

* triton implementation for dtw

* add test for dtw implementations

* triton implementation of median_filter

* a simple word-level timestamps test

* add scipy as dev dependency

* installs an older version of Triton if CUDA < 11.4

* fix broken merge

* loosen nvcc version match regex

* find_alignment() function

* miscellaneous improvements

* skip median filtering when the input is too small

* Expose punctuation options in cli and transcribe() (#973)

* fix merge error

* fix merge error 2

* annotating that word_timestamps is experimental

---------

Co-authored-by: ryanheise <ryan@ryanheise.com>
2023-03-06 14:00:49 -08:00
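
A small usage sketch of the experimental word-level timestamps added above; the result fields shown are what the feature exposes, though exact names may differ across versions:

    import whisper

    model = whisper.load_model("small")
    result = model.transcribe("audio.mp3", word_timestamps=True)

    for segment in result["segments"]:
        for word in segment.get("words", []):
            # each word carries its own start/end, aligned via DTW over
            # cross-attention weights (numba on CPU, a triton kernel on CUDA)
            print(f'{word["start"]:6.2f} -> {word["end"]:6.2f}  {word["word"]}')
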
Jong Wook Kim
eab8d920ed
Decoding improvements (#1033)
* suppress task tokens (transcribe/translate)

* not ignoring the last segment ending with one timestamp
2023-03-06 11:32:32 -08:00
Jong Wook Kim
a6b36ede1f
drop python 3.7 support (#889) 2023-01-24 14:05:57 -08:00
Jong Wook Kim
7f1ef223ab
handle printing even if sys.stdout.buffer is not available (#887) 2023-01-24 10:12:04 -08:00
Niels Mayer
f5bfe004ec
Add TSV formatted output in transcript, using integer start/end times in milliseconds. (#228)
* Add CSV format output in transcript, containing lines formatted like: <startTime-in-integer-milliseconds>, <endTime-in-integer-milliseconds>, <transcript-including-commas>

* For easier reading by spreadsheets importing CSV, the third column of the CSV file is delimited by quotes, and any quote characters that might be in the transcript (which would interfere with parsing the third column as a string) are converted to "''".

* fix syntax error

* docstring edit

Co-authored-by: Jong Wook Kim <jongwook@openai.com>
Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>
2023-01-22 00:27:17 -08:00
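
A hand-rolled illustration of the TSV layout described above (integer-millisecond start/end columns, then the text); a minimal sketch, not necessarily identical to the writer that shipped:

    def write_tsv_sketch(result, path):
        # One segment per line: integer-millisecond start and end, then the text.
        with open(path, "w", encoding="utf-8") as f:
            f.write("start\tend\ttext\n")
            for segment in result["segments"]:
                start_ms = round(1000 * segment["start"])
                end_ms = round(1000 * segment["end"])
                text = segment["text"].strip().replace("\t", " ")
                f.write(f"{start_ms}\t{end_ms}\t{text}\n")
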
Aaryan YVS
da600abd2b
Added --output_format option (#333)
* Added --output option

The --output option selects which output files will be generated.

Also corrected the logic that wrongly showed the progress bar when verbose is set to False.

* Changed output_files variable

* Changed back the tqdm verbose

* refactor output format handling

Co-authored-by: Jong Wook Kim <jongwook@openai.com>
Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>
2023-01-21 23:58:38 -08:00
Jong Wook Kim
12e1089462
use stdout for printing transcription progress (#867) 2023-01-20 00:54:05 -08:00
Jong Wook Kim
9d646db9d8
print '?' if a letter can't be encoded using the system default encoding (#859) 2023-01-17 23:28:36 -08:00
adamreis
70861c7ce3
Fix tiny transcribe() docstring typo (#857)
s/successfully/successively, which I believe was the intent.
2023-01-16 22:42:01 -08:00
Jong Wook Kim
02aa851a49 fix to return only the text token ids 2022-11-15 16:25:11 -08:00
Jong Wook Kim
d18e9ea5dd transcribe() on English-only model won't complain when language="en" is not given 2022-10-09 02:40:12 -07:00
eudoxos
35713c66e0
Add --threads option to transcribe (#278)
* Add --threads option to transcribe

By default, Torch on CPU uses number_of_cores/2 threads. This option allows overriding that default.

* Update transcribe.py

Co-authored-by: Jong Wook Kim <ilikekjw@gmail.com>
2022-10-09 02:11:15 -07:00
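
The override described above boils down to one PyTorch call; a hedged sketch of what the CLI option maps to (the variable wiring is an assumption):

    import torch

    threads = 4  # e.g. the value given via --threads
    if threads > 0:
        # Override PyTorch's CPU default of roughly half the available cores.
        torch.set_num_threads(threads)
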
Jibin Mathew
0b1ba3d46e
Add model_dir to arguments (#202)
* Add model_dir to arguments

* minor formatting change

Co-authored-by: Jong Wook Kim <jongwook@openai.com>
2022-09-30 14:45:51 -07:00
Jong Wook Kim
7cb4cc21bf allowing nonzero initial temperature 2022-09-29 18:05:12 -07:00
Vicki Anand
2b0c2971af
Don't update duration if last timestamp is same as begin (#191) 2022-09-29 12:27:48 -07:00
Jong Wook Kim
62fe7f1009 patience definition to match the paper 2022-09-27 19:00:41 -07:00
Nick Konovalchuk
b4308c4782
fix: transcribe verbosity (#140) 2022-09-26 11:46:21 -07:00
VulumeCode
2037b65f3f
Context prompt (#128)
Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>
2022-09-26 05:22:33 -07:00
EliEron
fc0f40981d
Write each sentence as a separate line for the txt output (#101)
* Write each sentence as a separate line for the txt output

* Update utils.py

Co-authored-by: EliEron <example@example.com>
Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>
2022-09-26 04:52:28 -07:00
fatih
ead77fab97
add srt subtitle export utility (#102)
* add srt subtitle export utility

* simplifying

Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>
2022-09-26 03:50:26 -07:00
fatih
9e7e418ff1
add progress bar for transcribe loop (#100)
* add progress bar to transcribe loop

* improved warning message for English-only models

* add --condition_on_previous_text

* progressbar renames

Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>
2022-09-26 03:24:13 -07:00
Jong Wook Kim
5d8d3e75a4 add --condition_on_previous_text 2022-09-25 05:16:08 -07:00
Jong Wook Kim
2d3032de01 improved warning message for English-only models 2022-09-25 02:10:36 -07:00
Jong Wook Kim
15ab548263 nocaptions -> nospeech to match the paper figure 2022-09-23 15:45:32 +09:00
mj-kh
61989529b7
Fix possible mistake when loading model to device (#57)
Before this change, the model was loaded onto the GPU regardless of the value of the "device" argument in the CLI.

(e.g. whisper "test.wav" --device cpu still loads onto the GPU)
2022-09-23 15:21:47 +09:00
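
For illustration, a sketch of honoring the device argument when loading the model; whisper.load_model accepts a device parameter, while the argparse wiring here is an assumption:

    import argparse
    import torch
    import whisper

    parser = argparse.ArgumentParser()
    parser.add_argument("--device", default="cuda" if torch.cuda.is_available() else "cpu")
    args = parser.parse_args()

    # Honor the requested device instead of unconditionally loading onto the GPU.
    model = whisper.load_model("small", device=args.device)
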
hanacchi
c85eaaae29
Use UTF-8 encoding to save the txt and vtt files (#37)
Explicitly set the text encoding to UTF-8 in order to avoid UnicodeEncodeErrors

Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>
2022-09-23 12:10:55 +09:00
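
The fix above amounts to passing an explicit encoding when opening the output files; a minimal, self-contained sketch:

    result = {"text": "non-ASCII text survives: café, 你好"}

    # Without encoding="utf-8", the platform default codec (e.g. cp1252 on Windows)
    # can raise UnicodeEncodeError for characters outside that codepage.
    with open("audio.txt", "w", encoding="utf-8") as txt:
        txt.write(result["text"])
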
EliEron
759e8d47a8
Fix output_dir argument when audio file is a path (#45) 2022-09-23 11:38:37 +09:00
Jong Wook Kim
834f00a0ea making small model the default 2022-09-22 02:45:12 +09:00
Jong Wook Kim
6e3be77e1a initial commit 2022-09-22 01:09:43 +09:00