* attempt to fix the repetition/hallucination issue identified in #1046
* zero-pad the audio instead of spectrogram
* formatting fix
* delete debug print
* word-level timestamps in `transcribe()`
* moving to `timing.py`
* numba implementation for dtw, replacing dtw-python
* triton implementation for dtw
* add test for dtw implementations
* triton implementation of median_filter
* a simple word-level timestamps test
* add scipy as dev dependency
* installs an older version of Triton if CUDA < 11.4
* fix broken merge
* loosen nvcc version match regex
* find_alignment() function
* miscellaneous improvements
* skip median filtering when the input is too small
* Expose punctuation options in cli and transcribe() (#973)
* fix merge error
* fix merge error 2
* annotating that word_timestamps is experimental
---------
Co-authored-by: ryanheise <ryan@ryanheise.com>
* Add CSV format output in transcript, containing lines of characters formatted like: <startTime-in-integer-milliseconds>, <endTime-in-integer-milliseconds>, <transcript-including-commas>
* for easier reading by spreadsheets importing CSV, the third
column of the CSV file is delimited by quotes, and any quote
characters that might be in the transcript (which would interfere with
parsing the third column as a string) are converted to "''".
* fix syntax error
* docstring edit
Co-authored-by: Jong Wook Kim <jongwook@openai.com>
Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>
* Added --output option
--output option will help select the output files that will be generated.
Corrected the logic, which wrongly shows progress bar when verbose is set to False
* Changed output_files variable
* Changed back the tqdm verbose
* refactor output format handling
Co-authored-by: Jong Wook Kim <jongwook@openai.com>
Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>
* Add github action to automatically push to pypi on Release x.y.z commit
* some housekeeping for pypi upload
* add version.py
Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>
For a 30s long audio file which didn't have any silence, ndimage.median_filter took 7s where signa.medfilter took 30s.
Co-authored-by: Umar Farooqi <umar@paystash.com>
Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>
* Update Hebrew language code to he per IANA registry
Per [IANA registry](https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry), `iw` was deprecated as the code for Hebrew in 1989 and the preferred code is `he`
The correct subtag:
```
%%
Type: language
Subtag: he
Description: Hebrew
Added: 2005-10-16
Suppress-Script: Hebr
%%
```
And the deprecation
```
%%
Type: language
Subtag: iw
Description: Hebrew
Added: 2005-10-16
Deprecated: 1989-01-01
Preferred-Value: he
Suppress-Script: Hebr
%%
```
* Update hebrew ISO code to he
Per discussion, it's ok to make this change without backwards compatibility
- The "large-v2" model is trained for more epochs with regularization and shows improved performance compared to the previous large.
- It has the same architecture as the original large model.
- When `load_model("large")` is called, the "large-v2" model will be loaded.
- We will soon update the paper regarding this new model.