* ADD parser for new argument --max_words_count
* ADD max_words_count in words_options
* ADD warning for max_line_width compatibility
* ADD logic for max_words_count
* rename to max_words_per_line
* make them kwargs
* allow specifying file path by --model
* black formatting
---------
Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>
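The max_words_per_line behavior above can be sketched as follows; the helper name and the flat word list are illustrative assumptions, not the actual implementation:

```python
# Hypothetical sketch of max_words_per_line: split a segment's word list
# into chunks of at most N words before writing subtitle lines.  The real
# implementation lives in the subtitle writers and also warns when the
# option is combined with max_line_width.
def split_words_per_line(words, max_words_per_line):
    return [
        words[i:i + max_words_per_line]
        for i in range(0, len(words), max_words_per_line)
    ]
```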
* Avoid computing higher temperatures on no_speech
In decode_with_fallback, we retry with higher temperatures when the compression_ratio is too high or the avg_logprob is too low.
But since the computation of no_speech_prob doesn't depend on sampling, we can skip the higher temperatures when the first decoding already fulfills the no_speech condition.
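A minimal sketch of that early exit; the function shape and parameter names here are assumptions for illustration, not the actual transcribe.py code:

```python
# Sketch: no_speech_prob does not depend on the sampling temperature, so
# if the first (lowest-temperature) decoding already triggers the
# no_speech condition, the higher-temperature retries can be skipped.
def decode_with_fallback(decode, temperatures, no_speech_threshold,
                         logprob_threshold, compression_ratio_threshold):
    result = None
    for i, t in enumerate(temperatures):
        result = decode(t)
        if (i == 0 and no_speech_threshold is not None
                and result.no_speech_prob > no_speech_threshold):
            break  # segment is likely silence: skip the retries
        needs_fallback = (
            result.compression_ratio > compression_ratio_threshold
            or result.avg_logprob < logprob_threshold
        )
        if not needs_fallback:
            break
    return result
```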
* Update transcribe.py
---------
Co-authored-by: Jong Wook Kim <jongwook@openai.com>
* attempt to fix the repetition/hallucination issue identified in #1046
* zero-pad the audio instead of spectrogram
* formatting fix
* delete debug print
* word-level timestamps in `transcribe()`
* moving to `timing.py`
* numba implementation for dtw, replacing dtw-python
* triton implementation for dtw
* add test for dtw implementations
* triton implementation of median_filter
* a simple word-level timestamps test
* add scipy as dev dependency
* installs an older version of Triton if CUDA < 11.4
* fix broken merge
* loosen nvcc version match regex
* find_alignment() function
* miscellaneous improvements
* skip median filtering when the input is too small
* Expose punctuation options in cli and transcribe() (#973)
* fix merge error
* fix merge error 2
* annotating that word_timestamps is experimental
---------
Co-authored-by: ryanheise <ryan@ryanheise.com>
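The DTW used for the word-level timestamp alignment above can be sketched in pure Python; the repo ships numba and triton implementations, so this slow reference version is for illustration only:

```python
import math

# Reference DTW: fill the cumulative cost matrix D with the standard
# recurrence, then backtrack the cheapest monotonic alignment path.
def dtw(cost):
    N, M = len(cost), len(cost[0])
    D = [[math.inf] * (M + 1) for _ in range(N + 1)]
    D[0][0] = 0.0
    for i in range(1, N + 1):
        for j in range(1, M + 1):
            D[i][j] = cost[i - 1][j - 1] + min(
                D[i - 1][j], D[i][j - 1], D[i - 1][j - 1]
            )
    # backtrack from the bottom-right corner to (0, 0)
    path = [(N - 1, M - 1)]
    i, j = N, M
    while (i, j) != (1, 1):
        _, i, j = min((D[i - 1][j - 1], i - 1, j - 1),
                      (D[i - 1][j], i - 1, j),
                      (D[i][j - 1], i, j - 1))
        path.append((i - 1, j - 1))
    return path[::-1]
```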
* Add CSV output format for the transcript, with each line formatted as: <startTime-in-integer-milliseconds>, <endTime-in-integer-milliseconds>, <transcript-including-commas>
* For easier reading by spreadsheets importing CSV, the third column of the CSV file is delimited by quotes, and any quote characters that might be in the transcript (which would interfere with parsing the third column as a string) are converted to "''".
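The quoting scheme above can be sketched like this; the writer function and the segment dict shape are illustrative assumptions, not the exact repo code:

```python
# Sketch of the CSV writer described above: integer-millisecond start and
# end times, then the transcript quote-delimited with embedded quote
# characters converted to '' so spreadsheets can parse the third column.
def write_csv(segments, file):
    for seg in segments:
        start_ms = round(seg["start"] * 1000)  # seconds -> integer ms
        end_ms = round(seg["end"] * 1000)
        text = seg["text"].replace('"', "''")
        print(f'{start_ms}, {end_ms}, "{text}"', file=file)
```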
* fix syntax error
* docstring edit
Co-authored-by: Jong Wook Kim <jongwook@openai.com>
Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>
* Added --output option
The --output option selects which output files are generated.
Also corrected the logic that wrongly showed a progress bar when verbose was set to False.
* Changed output_files variable
* Changed back the tqdm verbose
* refactor output format handling
Co-authored-by: Jong Wook Kim <jongwook@openai.com>
Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>
* Add --threads option to transcribe
By default, Torch on CPU uses number_of_cores/2 threads. This option allows
overriding that default.
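A sketch of how such an option could be plumbed through; the --threads name matches the commit, but the parser wiring and helper here are illustrative:

```python
import argparse

def make_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--threads", type=int, default=0,
        help="number of threads Torch uses on CPU; 0 keeps Torch's default",
    )
    return parser

def apply_threads(threads: int) -> None:
    # torch.set_num_threads overrides the number_of_cores/2 default;
    # guarded so the sketch also runs where torch is not installed.
    if threads > 0:
        try:
            import torch
            torch.set_num_threads(threads)
        except ImportError:
            pass
```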
* Update transcribe.py
Co-authored-by: Jong Wook Kim <ilikekjw@gmail.com>
* Write each sentence as a separate line for the txt output
* Update utils.py
Co-authored-by: EliEron <example@example.com>
Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>
* add progress bar to transcribe loop
* improved warning message for English-only models
* add --condition_on_previous_text
* progressbar renames
Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>
Before this change, the model was loaded onto the GPU regardless of the value of the "device" argument in the CLI.
(e.g. whisper "test.wav" --device cpu loaded it onto the GPU anyway)
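The fix can be sketched as follows; the helper name is hypothetical, and the real code simply passes the CLI device value into model loading instead of hard-coding the GPU:

```python
from typing import Optional

# Hypothetical sketch: honor an explicit --device value instead of
# unconditionally loading the model onto the GPU.
def resolve_device(cli_device: Optional[str]) -> str:
    if cli_device is not None:
        return cli_device  # respect e.g. --device cpu
    try:
        import torch
        return "cuda" if torch.cuda.is_available() else "cpu"
    except ImportError:
        return "cpu"
```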