95 Commits

Author SHA1 Message Date
Vicki Anand
9f70a352f9
Fix attention caching to make it actually work (#370) 2022-10-19 16:44:03 -07:00
Sumana Harihareswara
7f3e408e09
Add package metadata to setup.py (#315)
Add project summary, license, etc. for display with
"pip show" and similar Python package distribution tools.
2022-10-17 13:51:16 -07:00
Michael Monashev
f680570016
Fix bug (#305)
Fix bug: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper__index_select)
2022-10-17 11:38:20 -07:00
Jong Wook Kim
d18e9ea5dd transcribe() on English-only model won't complain when language="en" is not given 2022-10-09 02:40:12 -07:00
David Marx
82725cea9c
infer download_root from XDG_CACHE_HOME if avail (#257) 2022-10-09 02:14:03 -07:00
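A minimal sketch of the idea behind this commit, assuming the fallback is the XDG default of ~/.cache and that models live in a "whisper" subdirectory. The function name, the app_name parameter, and the environ parameter are hypothetical and are not taken from the repository.

```python
import os


def default_download_root(app_name="whisper", environ=None):
    """Return a cache directory, preferring XDG_CACHE_HOME when it is set.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    environ = os.environ if environ is None else environ
    # Fall back to ~/.cache when XDG_CACHE_HOME is unset or empty.
    cache_home = environ.get("XDG_CACHE_HOME") or os.path.join(
        os.path.expanduser("~"), ".cache"
    )
    return os.path.join(cache_home, app_name)
```

Passing environ explicitly keeps the helper easy to exercise without touching the real environment.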
eudoxos
35713c66e0
Add --threads option to transcribe (#278)
* Add --threads option to transcribe

By default, Torch on CPU uses number_of_cores/2 threads. This option allows
overriding that default.

* Update transcribe.py

Co-authored-by: Jong Wook Kim <ilikekjw@gmail.com>
2022-10-09 02:11:15 -07:00
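A rough sketch of how a --threads option like the one above might be wired up. It assumes that a value of 0 means "keep the PyTorch default". build_parser and apply_threads are hypothetical names; torch.set_num_threads is the PyTorch call that sets the number of intra-op CPU threads.

```python
import argparse


def build_parser():
    """Build a parser with a --threads option (hypothetical helper)."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--threads",
        type=int,
        default=0,
        help="number of CPU threads for PyTorch; 0 keeps PyTorch's default",
    )
    return parser


def apply_threads(threads):
    """Override the PyTorch CPU thread count; return True if it was changed."""
    if threads <= 0:
        return False  # assumption: non-positive means "leave the default alone"
    import torch  # imported lazily so that parsing works without PyTorch

    torch.set_num_threads(threads)
    return True
```

Typical use would be apply_threads(build_parser().parse_args().threads) before any model is loaded.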
Corentin Jemine
9e653bd0ea
Fixed CoW RuntimeError in DecodingTask.run() (#240) 2022-10-04 08:49:31 -07:00
Tom Stuart
02b74308ff
Fix timestamps and strip extraneous whitespace in WebVTT output (#219)
* Use two-digit hours in WebVTT timestamps

Per the WebVTT specification [0]:

> A WebVTT timestamp consists of the following components, in the given
> order:
>
> 1. Optionally (required if hours is non-zero):
>   1. Two or more ASCII digits, representing the hours as a base ten
>      integer.
>   2. A U+003A COLON character (:)

YouTube won’t accept timestamps containing single-digit hours.

[0] https://www.w3.org/TR/webvtt1/#webvtt-timestamp

* Strip segment text in WebVTT output

We already do this for plain text and SubRip output, so we should do it
for WebVTT too.
2022-10-03 14:51:07 -07:00
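The timestamp rule quoted in the commit above can be sketched as follows. This is an illustration under stated assumptions, not the code from the commit: format_vtt_timestamp and vtt_cue are hypothetical names, and the hours component is omitted when it is zero, which the quoted rule permits.

```python
def format_vtt_timestamp(seconds):
    """Format non-negative seconds as a WebVTT timestamp (hypothetical helper)."""
    assert seconds >= 0, "non-negative timestamp expected"
    milliseconds = round(seconds * 1000.0)

    hours, milliseconds = divmod(milliseconds, 3_600_000)
    minutes, milliseconds = divmod(milliseconds, 60_000)
    secs, milliseconds = divmod(milliseconds, 1_000)

    # Hours are optional, but when present they need at least two digits.
    hours_part = f"{hours:02d}:" if hours > 0 else ""
    return f"{hours_part}{minutes:02d}:{secs:02d}.{milliseconds:03d}"


def vtt_cue(start, end, text):
    """Render one cue, stripping surrounding whitespace from the text."""
    timing = f"{format_vtt_timestamp(start)} --> {format_vtt_timestamp(end)}"
    return f"{timing}\n{text.strip()}\n"
```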
Jibin Mathew
0b1ba3d46e
Add model_dir to arguments (#202)
* Add model_dir to arguments

* minor formatting change

Co-authored-by: Jong Wook Kim <jongwook@openai.com>
2022-09-30 14:45:51 -07:00
Caleb McQuillin
60132ade70
Use , character instead of . for SRT output. (#197)
The SRT format uses the decimal comma character as the fractional separator rather than the decimal point character. Adjust format_timestamp and write_srt to specify the separator character.

See https://en.wikipedia.org/wiki/SubRip#:~:text=the%20fractional%20separator%20used%20is%20the%20comma%2C%20since%20the%20program%20was%20written%20in%20france.
2022-09-29 20:44:12 -07:00
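A sketch of a separator-aware timestamp formatter in the spirit of this commit. format_timestamp is named in the commit message, but the signature, the decimal_marker parameter, and the srt_cue helper shown here are assumptions made for illustration.

```python
def format_timestamp(seconds, decimal_marker="."):
    """Format seconds as HH:MM:SS<marker>mmm (assumed signature)."""
    assert seconds >= 0, "non-negative timestamp expected"
    milliseconds = round(seconds * 1000.0)

    hours, milliseconds = divmod(milliseconds, 3_600_000)
    minutes, milliseconds = divmod(milliseconds, 60_000)
    secs, milliseconds = divmod(milliseconds, 1_000)

    return f"{hours:02d}:{minutes:02d}:{secs:02d}{decimal_marker}{milliseconds:03d}"


def srt_cue(index, start, end, text):
    """Render one SubRip cue; SRT uses a comma before the milliseconds."""
    start_ts = format_timestamp(start, decimal_marker=",")
    end_ts = format_timestamp(end, decimal_marker=",")
    return f"{index}\n{start_ts} --> {end_ts}\n{text.strip()}\n"
```

Keeping the separator as a parameter lets one formatter serve both dot-separated and comma-separated subtitle formats.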
Jong Wook Kim
7cb4cc21bf allowing nonzero initial temperature 2022-09-29 18:05:12 -07:00
Jong Wook Kim
30dc5c581b pointer to the show and tell section 2022-09-29 14:57:49 -07:00
Szabolcs Pasztor
5905e503b8
Update README.md (#161)
* Update README.md

* merging paragraphs

Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>
2022-09-29 14:18:54 -07:00
Fabiano
0457aac342
Adds missing command for install (mac) (#90)
* Adds missing command for install (mac)

Required for users who didn't previously have Rust installed.

* minor wording change

Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>
2022-09-29 14:08:58 -07:00
sawadata
deafef05f3
Update audio.py (#178)
add '-nostdin' argument
2022-09-29 12:34:04 -07:00
Vicki Anand
2b0c2971af
Don't update duration if last timestamp is same as begin (#191) 2022-09-29 12:27:48 -07:00
Jong Wook Kim
62fe7f1009 patience definition to match the paper 2022-09-27 19:00:41 -07:00
Nick Konovalchuk
b4308c4782
fix: transcribe verbosity (#140) 2022-09-26 11:46:21 -07:00
Michael Goin
9c8183a179
Use PyTorch as logits transpose for ONNX support (#141) 2022-09-26 10:54:26 -07:00
VulumeCode
2037b65f3f
Context prompt (#128)
Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>
2022-09-26 05:22:33 -07:00
EliEron
fc0f40981d
Write each sentence as a separate line for the txt output (#101)
* Write each sentence as a separate line for the txt output

* Update utils.py

Co-authored-by: EliEron <example@example.com>
Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>
2022-09-26 04:52:28 -07:00
VulumeCode
520796a34c
fix token suppression (#123) 2022-09-26 04:35:21 -07:00
fatih
ead77fab97
add srt subtitle export utility (#102)
* add srt subtitle export utility

* simplifying

Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>
2022-09-26 03:50:26 -07:00
Ashutosh Tripathi
5485428c81
arch linux ffmpeg install (#93) 2022-09-26 03:24:47 -07:00
fatih
9e7e418ff1
add progress bar for transcribe loop (#100)
* add progress bar to transcribe loop

* improved warning message for English-only models

* add --condition_on_previous_text

* progressbar renames

Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>
2022-09-26 03:24:13 -07:00
Jong Wook Kim
5d8d3e75a4 add --condition_on_previous_text 2022-09-25 05:16:08 -07:00
Jong Wook Kim
2d3032de01 improved warning message for English-only models 2022-09-25 02:10:36 -07:00
Jong Wook Kim
8cf36f3508 allow hyphens and single quotes between words 2022-09-23 20:11:27 +09:00
Jong Wook Kim
15ab548263 nocaptions -> nospeech to match the paper figure 2022-09-23 15:45:32 +09:00
mj-kh
61989529b7
Fix possible mistake when loading model to device (#57)
Before this change, the model was loaded onto the GPU regardless of the value of the "device" argument in the CLI.

(e.g. whisper "test.wav" --device cpu loaded the model onto the GPU anyway)
2022-09-23 15:21:47 +09:00
Niklas K
f296bcd3fa
Avoid keeping redundant copies of model weights in memory during load (#42)
* don't keep copies of model weights in host memory

* adding type annotation

Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>
2022-09-23 12:57:39 +09:00
Sidney Radcliffe
a4fe05aa71
Add conda environment.yml (and fix requirements.txt) (#8)
* fix: more-itertools name in requirements.txt

* feature: minimal environment.yml for conda

* Revert "feature: minimal environment.yml for conda"

This reverts commit 8fd7438b368b0eb5df85f667fea911f293fa5e6d.

Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>
2022-09-23 12:30:45 +09:00
Giovanni Lanzani
957ffc77de
Add rust as a dependency (#30)
* Add rust as a dependency

* Update README.md

Co-authored-by: Jong Wook Kim <ilikekjw@gmail.com>
2022-09-23 12:26:38 +09:00
Ram Rachum
59f543e218
Fix exception cause in audio.py (#33) 2022-09-23 12:12:37 +09:00
hanacchi
c85eaaae29
Use UTF-8 encoding to save the txt and vtt files (#37)
Explicitly set the text encoding to UTF-8 in order to avoid UnicodeEncodeErrors

Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>
2022-09-23 12:10:55 +09:00
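A small illustration of the fix described above: passing encoding="utf-8" to open() so the output does not depend on the platform's default encoding. write_txt is a hypothetical helper, not the function from the repository.

```python
def write_txt(lines, path):
    """Write one stripped line per entry, always encoded as UTF-8."""
    # Without an explicit encoding, open() uses a locale-dependent default,
    # which can raise UnicodeEncodeError for non-ASCII transcripts.
    with open(path, "w", encoding="utf-8") as f:
        for line in lines:
            print(line.strip(), file=f)
```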
EliEron
759e8d47a8
Fix output_dir argument when audio file is a path (#45) 2022-09-23 11:38:37 +09:00
Micheal Taylor
c0607e8d22
Add scoop install for windows (#48)
Adding scoop install to setup for windows for ffmpeg
2022-09-23 11:37:57 +09:00
Jong Wook Kim
e90b8fa7e8
Merge pull request #14 from bquast/patch-1
make LICENSE a link instead of code-formatted text
2022-09-22 11:51:05 +09:00
Jong Wook Kim
f83cb83a42
Merge pull request #24 from ldanilov/patch-1
fixes the link to the model paper
2022-09-22 11:48:57 +09:00
Lev Danilov
45fc3d43c1
fixes the link to the model paper 2022-09-21 21:25:17 -04:00
Bastiaan Quast
08a739ad79
make LICENSE a link instead of code-formatted text 2022-09-21 23:17:02 +02:00
Jong Wook Kim
49a3ffc997 add section Available models and languages 2022-09-22 05:36:25 +09:00
Jong Wook Kim
cfd6bdda21 a note on speed-accuracy tradeoffs 2022-09-22 02:58:56 +09:00
Jong Wook Kim
834f00a0ea making small model the default 2022-09-22 02:45:12 +09:00
Jong Wook Kim
6e3be77e1a initial commit 2022-09-22 01:09:43 +09:00