* word-level timestamps in `transcribe()`
* moving to `timing.py`
* numba implementation for dtw, replacing dtw-python
* triton implementation for dtw
* add test for dtw implementations
* triton implementation of median_filter
* a simple word-level timestamps test
* add scipy as dev dependency
* installs an older version of Triton if CUDA < 11.4
* fix broken merge
* loosen nvcc version match regex
* find_alignment() function
* miscellaneous improvements
* skip median filtering when the input is too small
* Expose punctuation options in cli and transcribe() (#973)
* fix merge error
* fix merge error 2
* annotating that word_timestamps is experimental
---------
Co-authored-by: ryanheise <ryan@ryanheise.com>
* Add CSV format output in transcript, containing lines of characters formatted like: <startTime-in-integer-milliseconds>, <endTime-in-integer-milliseconds>, <transcript-including-commas>
* for easier reading by spreadsheets importing CSV, the third
column of the CSV file is delimited by quotes, and any quote
characters that might be in the transcript (which would interfere with
parsing the third column as a string) are converted to "''".
* fix syntax error
* docstring edit
Co-authored-by: Jong Wook Kim <jongwook@openai.com>
Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>
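For illustration, the CSV row format described above could be produced along these lines. This is a minimal sketch, not the code from the commit: the function name `format_csv_row` and the assumption that segment times arrive as float seconds are both hypothetical.

```python
def format_csv_row(start: float, end: float, text: str) -> str:
    """Format one segment as `<start-ms>, <end-ms>, "<text>"`.

    `start` and `end` are assumed to be in seconds (hypothetical input shape).
    """
    # Times are written as integer milliseconds.
    start_ms = round(1000 * start)
    end_ms = round(1000 * end)
    # Double quotes inside the transcript would end the quoted third column
    # early, so each one is replaced by two single quotes.
    safe_text = text.replace('"', "''")
    return f'{start_ms}, {end_ms}, "{safe_text}"'


print(format_csv_row(1.5, 3.25, 'He said "hi", then left'))
```

Because the third column is quoted, commas inside the transcript do not split it into extra columns when a spreadsheet imports the file.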
* Added --output option
The --output option selects which output files are generated.
Also corrected the logic that wrongly showed the progress bar when verbose is set to False.
* Changed output_files variable
* Changed back the tqdm verbose
* refactor output format handling
Co-authored-by: Jong Wook Kim <jongwook@openai.com>
Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>
* Add github action to automatically push to pypi on Release x.y.z commit
* some housekeeping for pypi upload
* add version.py
Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>
For a 30s audio file with no silence, `ndimage.median_filter` took 7s, whereas `signal.medfilt` took 30s.
Co-authored-by: Umar Farooqi <umar@paystash.com>
Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>
* Update Hebrew language code to he per IANA registry
Per the [IANA registry](https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry), `iw` was deprecated as the code for Hebrew in 1989, and the preferred code is `he`.
The correct subtag:
```
%%
Type: language
Subtag: he
Description: Hebrew
Added: 2005-10-16
Suppress-Script: Hebr
%%
```
And the deprecation:
```
%%
Type: language
Subtag: iw
Description: Hebrew
Added: 2005-10-16
Deprecated: 1989-01-01
Preferred-Value: he
Suppress-Script: Hebr
%%
```
* Update Hebrew ISO code to he
Per discussion, it's ok to make this change without backwards compatibility
- The "large-v2" model is trained for more epochs with regularization and shows improved performance compared to the previous large.
- It has the same architecture as the original large model.
- When `load_model("large")` is called, the "large-v2" model will be loaded.
- We will soon update the paper regarding this new model.
Fix bug: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper__index_select)
* Add --threads option to transcribe
By default, Torch on CPU uses number_of_cores/2 threads. This option allows
overriding that default.
* Update transcribe.py
Co-authored-by: Jong Wook Kim <ilikekjw@gmail.com>
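As a rough sketch of how a `--threads` option like the one above might be wired up: the helper names `build_parser` and `apply_thread_limit`, the default of `0` meaning "keep the library default", and the injected setter are all assumptions made for illustration, not the code from this commit.

```python
import argparse
from typing import Callable


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser exposing only the --threads option."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--threads",
        type=int,
        default=0,  # assumption: 0 means "do not override the default"
        help="number of CPU threads to use for inference",
    )
    return parser


def apply_thread_limit(threads: int, set_num_threads: Callable[[int], None]) -> bool:
    """Call the setter only when a positive thread count was requested.

    Returns True if the default was overridden.
    """
    if threads > 0:
        set_num_threads(threads)
        return True
    return False


args = build_parser().parse_args(["--threads", "4"])
applied = apply_thread_limit(args.threads, lambda n: print(f"using {n} threads"))
```

With PyTorch installed, `torch.set_num_threads` can be passed as the setter. It is injected here only so that the sketch runs without PyTorch.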
* Use two-digit hours in WebVTT timestamps
Per the WebVTT specification [0]:
> A WebVTT timestamp consists of the following components, in the given
> order:
>
> 1. Optionally (required if hours is non-zero):
>    1. Two or more ASCII digits, representing the hours as a base ten
>       integer.
>    2. A U+003A COLON character (:)
YouTube won’t accept timestamps containing single-digit hours.
[0] https://www.w3.org/TR/webvtt1/#webvtt-timestamp
* Strip segment text in WebVTT output
We already do this for plain text and SubRip output, so we should do it
for WebVTT too.
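The two WebVTT changes above can be sketched together as follows. This is an illustrative example under stated assumptions rather than the project's actual writer: `format_timestamp` and `format_cue` are hypothetical names, and the sketch always emits the hours component even though the specification makes it optional when hours is zero.

```python
def format_timestamp(seconds: float) -> str:
    """Render seconds as HH:MM:SS.mmm with at least two hour digits."""
    assert seconds >= 0, "non-negative timestamp expected"
    milliseconds = round(seconds * 1000.0)

    hours, milliseconds = divmod(milliseconds, 3_600_000)
    minutes, milliseconds = divmod(milliseconds, 60_000)
    secs, milliseconds = divmod(milliseconds, 1_000)

    # `02d` pads the hours to two digits and lets them grow beyond two.
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{milliseconds:03d}"


def format_cue(start: float, end: float, text: str) -> str:
    """Build one WebVTT cue, stripping whitespace around the segment text."""
    return f"{format_timestamp(start)} --> {format_timestamp(end)}\n{text.strip()}\n"


print("WEBVTT\n")
print(format_cue(3661.5, 3663.0, "  Hello there.  "))
```

Padding the hours rather than omitting them yields a single timestamp shape for every cue, so no cue ends up with a single-digit hour.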