95 Commits

Author SHA1 Message Date
Jong Wook Kim
ad3250a846 Release 20230308 v20230308 2023-03-08 15:48:57 -08:00
Jong Wook Kim
c4b50c0824
kwargs in decode() for convenience (#1061)
* kwargs in decode() for convenience

* formatting fix
2023-03-08 15:46:38 -08:00
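For context, a minimal sketch of the convenience this adds, assuming the change lets keyword arguments stand in for fields of DecodingOptions (the audio file name is illustrative):

```python
import whisper

model = whisper.load_model("base")
audio = whisper.pad_or_trim(whisper.load_audio("audio.mp3"))
mel = whisper.log_mel_spectrogram(audio).to(model.device)

# Previously, overrides had to be bundled into a DecodingOptions instance:
result = whisper.decode(model, mel, whisper.DecodingOptions(language="en", temperature=0.0))

# With the kwargs convenience, the same overrides can be passed directly:
result = whisper.decode(model, mel, language="en", temperature=0.0)
print(result.text)
```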
Jong Wook Kim
38f2f4d99d
fix all_tokens handling that caused more repetitions and discrepancy in JSON (#1060) 2023-03-08 15:34:07 -08:00
Jong Wook Kim
aac47c9834 fix typo 2023-03-07 20:43:49 -08:00
Jong Wook Kim
26807ec6d3 Release 20230307 v20230307 2023-03-07 20:36:29 -08:00
Jong Wook Kim
919a713499
attempt to fix the repetition/hallucination issue identified in #1046 (#1052)
* attempt to fix the repetition/hallucination issue identified in #1046

* zero-pad the audio instead of spectrogram

* formatting fix

* delete debug print
2023-03-07 20:08:45 -08:00
Jong Wook Kim
38e990d853
Use triton==2.0.0 (#1053) 2023-03-07 16:56:31 -08:00
Jong Wook Kim
924e1f8e06
Try installing triton only if linux & x86_64 (#1051) 2023-03-07 11:31:40 -08:00
Jong Wook Kim
4b0d5e58d0
Update setup.py 2023-03-07 04:47:46 -08:00
Jong Wook Kim
8180fde939 Release 20230306 v20230306 2023-03-06 18:53:04 -08:00
c6e4e5efb3
remove auxiliary audio extension (#1021)
Co-authored-by: Jong Wook Kim <jongwook@openai.com>
2023-03-06 17:48:14 -08:00
Jong Wook Kim
b80bcf610d
apply formatting with black (#1038)
* applying black (with the default 88-column limit)

* add flake8

* add isort

* fix isort
2023-03-06 15:50:37 -08:00
Jong Wook Kim
500d0fe966
word-level timestamps in transcribe() (#869)
* word-level timestamps in `transcribe()`

* moving to `timing.py`

* numba implementation for dtw, replacing dtw-python

* triton implementation for dtw

* add test for dtw implementations

* triton implementation of median_filter

* a simple word-level timestamps test

* add scipy as dev dependency

* installs an older version of Triton if CUDA < 11.4

* fix broken merge

* loosen nvcc version match regex

* find_alignment() function

* miscellaneous improvements

* skip median filtering when the input is too small

* Expose punctuation options in cli and transcribe() (#973)

* fix merge error

* fix merge error 2

* annotating that word_timestamps is experimental

---------

Co-authored-by: ryanheise <ryan@ryanheise.com>
2023-03-06 14:00:49 -08:00
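For reference, a minimal usage sketch of the feature added here (the audio path is illustrative): passing word_timestamps=True to transcribe() attaches per-word start/end times to each segment.

```python
import whisper

model = whisper.load_model("base")
result = model.transcribe("audio.mp3", word_timestamps=True)

# Each segment carries a "words" list with per-word timing.
for segment in result["segments"]:
    for word in segment.get("words", []):
        print(f"{word['start']:7.2f} -> {word['end']:7.2f}  {word['word']}")
```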
Jong Wook Kim
eab8d920ed
Decoding improvements (#1033)
* suppress task tokens (transcribe/translate)

* not ignoring the last segment ending with one timestamp
2023-03-06 11:32:32 -08:00
Roman Vasilenko
3e1780fd37
Update README.md (#894)
Fixed a few typos and made general improvements for clarity.

Co-authored-by: Jong Wook Kim <jongwook@openai.com>
2023-03-03 16:41:59 -08:00
Andrey Chernykh
7858aa9c08
Fix infinite loop caused by incorrect timestamp tokens prediction (#914)
* Fix infinite loop caused by incorrect timestamp tokens prediction

https://github.com/openai/whisper/discussions/810

* Update decoding.py

---------

Co-authored-by: Jong Wook Kim <jongwook@openai.com>
2023-02-01 15:46:51 -08:00
Jong Wook Kim
5c1a8c10e7
clarify that 3.11 is not supported 2023-01-27 00:01:49 -08:00
Jong Wook Kim
4e635c6644
Update README.md about Python 3.8+ requirement 2023-01-24 14:45:56 -08:00
Jong Wook Kim
a6b36ede1f
drop python 3.7 support (#889) 2023-01-24 14:05:57 -08:00
Jong Wook Kim
55f690af79 Release 20230124 v20230124 2023-01-24 11:11:08 -08:00
Jong Wook Kim
7f1ef223ab
handle printing even if sys.stdout.buffer is not available (#887) 2023-01-24 10:12:04 -08:00
Niels Mayer
f5bfe004ec
Add TSV formatted output in transcript, using integer start/end times in milliseconds. (#228)
* Add CSV format output in transcript, containing lines of characters formatted like: <startTime-in-integer-milliseconds>, <endTime-in-integer-milliseconds>, <transcript-including-commas>

* for easier reading by spreadsheets importing CSV, the third column of the CSV file is delimited by quotes, and any quote characters that might be in the transcript (which would interfere with parsing the third column as a string) are converted to "''".

* fix syntax error

* docstring edit

Co-authored-by: Jong Wook Kim <jongwook@openai.com>
Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>
2023-01-22 00:27:17 -08:00
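A rough sketch, not the repository's writer implementation, of producing rows in the shape described above: integer-millisecond start/end columns followed by the text (file names are illustrative).

```python
import whisper

model = whisper.load_model("base")
result = model.transcribe("audio.mp3")

# One row per segment: <start-ms>\t<end-ms>\t<text>
with open("audio.tsv", "w", encoding="utf-8") as f:
    f.write("start\tend\ttext\n")
    for segment in result["segments"]:
        start_ms = round(1000 * segment["start"])
        end_ms = round(1000 * segment["end"])
        f.write(f"{start_ms}\t{end_ms}\t{segment['text'].strip()}\n")
```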
Aaryan YVS
da600abd2b
Added --output_format option (#333)
* Added --output option

The --output option selects which output files are generated.

Also corrected the logic that wrongly showed a progress bar when verbose is set to False.

* Changed output_files variable

* Changed back the tqdm verbose

* refactor output format handling

Co-authored-by: Jong Wook Kim <jongwook@openai.com>
Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>
2023-01-21 23:58:38 -08:00
zer0-x
9f7aba6099
Handle XDG_CACHE_HOME properly for download_root (#864)
Co-authored-by: Jong Wook Kim <jongwook@openai.com>
2023-01-21 01:09:39 -08:00
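A minimal sketch of the behavior described here, assuming the default download root is derived from XDG_CACHE_HOME with a ~/.cache fallback:

```python
import os

# Honor XDG_CACHE_HOME when set; otherwise fall back to ~/.cache.
default_cache = os.path.join(os.path.expanduser("~"), ".cache")
download_root = os.path.join(os.getenv("XDG_CACHE_HOME", default_cache), "whisper")
print(download_root)
```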
Jong Wook Kim
12e1089462
use stdout for printing transcription progress (#867) 2023-01-20 00:54:05 -08:00
Markus Hennerbichler
ea1c266709
Fix bug where mm is mistakenly replaced with hmm in e.g. 20mm (#659)
Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>
2023-01-18 10:41:11 -08:00
Jong Wook Kim
8135a7c31c verbose outputs from pytest 2023-01-18 10:30:18 -08:00
Jong Wook Kim
9d646db9d8
print '?' if a letter can't be encoded using the system default encoding (#859) 2023-01-17 23:28:36 -08:00
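A hedged sketch of the idea (the helper name is hypothetical, not taken from the repository): encoding with errors="replace" degrades unencodable letters to '?' instead of raising.

```python
import sys
from typing import Optional

def make_printable(text: str, encoding: Optional[str] = None) -> str:
    # Hypothetical helper: characters the target encoding can't represent become '?'.
    encoding = encoding or sys.getdefaultencoding()
    return text.encode(encoding, errors="replace").decode(encoding)

# With a limited encoding, the text still prints instead of raising UnicodeEncodeError.
print(make_printable("naïve こんにちは", encoding="ascii"))
```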
Jong Wook Kim
37a4f1be6d Release 20230117 v20230117 2023-01-17 16:08:28 -08:00
Romain Beaumont
b9f9b433ae
Add github action to automatically push to pypi on Release x.y.z commit (#681)
* Add github action to automatically push to pypi on Release x.y.z commit

* some housekeeping for pypi upload

* add version.py

Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>
2023-01-17 15:50:26 -08:00
Umar Farooqi
f0083e7eb2
Use ndimage.median_filter instead of signal.medfilter (#812)
For a 30-second audio file with no silence, ndimage.median_filter took 7s whereas signal.medfilter took 30s.

Co-authored-by: Umar Farooqi <umar@paystash.com>
Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>
2023-01-17 14:43:05 -08:00
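An illustrative timing comparison, with an arbitrary array shape and kernel size not taken from the repository (scipy's actual function names are ndimage.median_filter and signal.medfilt):

```python
import time

import numpy as np
from scipy import ndimage, signal

x = np.random.randn(3000, 80).astype(np.float32)  # roughly spectrogram-sized

t0 = time.time()
ndimage.median_filter(x, size=(7, 1))
t1 = time.time()
signal.medfilt(x, kernel_size=(7, 1))
t2 = time.time()

print(f"ndimage.median_filter: {t1 - t0:.3f}s")
print(f"signal.medfilt:        {t2 - t1:.3f}s")
```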
Jong Wook Kim
a84191faae rename GitHub workflow 2023-01-17 13:54:40 -08:00
Jong Wook Kim
b1d213c0c7 allow test_transcribe to run on CPU when CUDA is not available 2023-01-17 13:43:36 -08:00
Jong Wook Kim
493dfffa37 add github action to run pytest 2023-01-17 13:38:33 -08:00
Mikko Vedru
0f39c89d92
Update README.md (#804) 2023-01-16 23:46:42 -08:00
Markus Hennerbichler
6df3ea1fb5
Support batch-dimension in log_mel_spectrogram (#839) 2023-01-16 23:46:15 -08:00
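A small usage sketch of what batch support means here; shapes assume the default 80 mel bins, a 30-second window, and an illustrative file name.

```python
import numpy as np
import whisper

audio = whisper.pad_or_trim(whisper.load_audio("audio.mp3"))      # (480000,) samples at 16 kHz

single = whisper.log_mel_spectrogram(audio)                       # shape (80, 3000)
batched = whisper.log_mel_spectrogram(np.stack([audio, audio]))   # shape (2, 80, 3000)
print(single.shape, batched.shape)
```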
adamreis
70861c7ce3
Fix tiny transcribe() docstring typo (#857)
s/successfully/successively, which I believe was the intent.
2023-01-16 22:42:01 -08:00
Jong Wook Kim
f82bc59f5e torch.concatenate -> torch.cat for compatibility 2023-01-10 10:53:18 -08:00
Jong Wook Kim
28769fcfe5 word-level timestamps in Multilingual_ASR notebook 2022-12-31 10:03:42 -07:00
Jong Wook Kim
53807677fe MultiHeadAttention to return qk as well 2022-12-30 01:53:57 -07:00
Jong Wook Kim
9323b2526c Revert "saving the qk matrix in the attention module for convenience"
This reverts commit 68e44bd83ce6c3e352f74b266aa39d8b649af9e3.
2022-12-29 23:53:31 -07:00
Jong Wook Kim
68e44bd83c saving the qk matrix in the attention module for convenience 2022-12-29 23:02:52 -07:00
Jong Wook Kim
0b5dcfdef7 large-v2 figure and arxiv url update 2022-12-09 00:12:39 -05:00
altryne
b9265e5796
Update Hebrew language code to he per IANA registry (#401)
* Update Hebrew language code to he per IANA registry

Per [IANA registry](https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry), `iw` was deprecated as the code for Hebrew in 1989 and the preferred code is `he`

The correct subtag: 
```
%%
Type: language
Subtag: he
Description: Hebrew
Added: 2005-10-16
Suppress-Script: Hebr
%%
``` 
And the deprecation:
```
%%
Type: language
Subtag: iw
Description: Hebrew
Added: 2005-10-16
Deprecated: 1989-01-01
Preferred-Value: he
Suppress-Script: Hebr
%%
```

* Update hebrew ISO code to he

Per discussion, it's ok to make this change without backwards compatibility
2022-12-07 13:45:31 -05:00
Paul Harter
fd8f80c8b8
Explicitly closing model file after reading it (#630) 2022-12-06 12:07:19 -05:00
Jong Wook Kim
4179ed2475 add large-v2 model
- The "large-v2" model is trained for more epochs with regularization and shows improved performance compared to the previous large.
- It has the same architecture as the original large model.
- When `load_model("large")` is called, the "large-v2" model will be loaded.
- We will soon update the paper regarding this new model.
2022-12-05 11:07:14 -05:00
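For reference, the aliasing described above means the usual loading code picks up the new weights automatically (audio path illustrative):

```python
import whisper

# "large" now resolves to the large-v2 checkpoint, as described in the commit message.
model = whisper.load_model("large")
result = model.transcribe("audio.mp3")
print(result["text"])
```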
jumon
ec1b34bb90
fix compression ratio function (#561) 2022-12-04 17:27:42 -06:00
Jong Wook Kim
eff383b27b invoking __call__ instead of forward() 2022-11-16 04:18:50 -08:00
Jong Wook Kim
02aa851a49 fix to return only the text token ids 2022-11-15 16:25:11 -08:00
jumon
76148a56c5
suppress generating non-timestamp tokens at the beginning (#532) 2022-11-15 11:44:36 -08:00