whisper

mirror of https://github.com/openai/whisper.git synced 2025-11-28 00:03:40 +00:00

Author	SHA1	Message	Date
Claude	22ddbf4796	feat: Create React web application with Figma design and Flask backend Frontend: - Initialize React 18 + TypeScript project with Vite - Implement complete App.tsx matching Figma design - Add dark/light theme toggle support - Create file queue management UI - Implement search with text highlighting - Add segment copy functionality - Create reusable UI components (Button, Progress, Input, Select) - Configure Tailwind CSS v4.0 for styling - Setup window resizing functionality - Implement RTL support for Farsi text Backend: - Create Flask API server with CORS support - Implement /transcribe endpoint for audio/video processing - Add /models endpoint for available models info - Implement /export endpoint for multiple formats (TXT, SRT, VTT, JSON) - Setup Whisper model integration - Handle file uploads with validation - Format transcription results with timestamps Configuration: - Setup Vite dev server with API proxy - Configure Tailwind CSS with custom colors - Setup TypeScript strict mode - Add PostCSS with autoprefixer - Configure Flask for development Documentation: - Write comprehensive README with setup instructions - Include API endpoint documentation - Add troubleshooting guide - Include performance tips Includes everything ready to run with: npm install && npm run dev (frontend) and python backend/app.py (backend)	2025-11-13 08:03:09 +00:00
Claude	efdcf42ffd	feat: Add comprehensive configuration and documentation - Create config.py with model, device, and format settings - Add model descriptions and performance information - Expand README with detailed installation instructions - Add troubleshooting section for common issues - Include advanced usage examples - Document all export formats and features - Add performance tips and recommendations - Phase 6 complete: Full configuration and documentation ready	2025-11-12 05:13:35 +00:00
Claude	72ab2e3fa9	feat: Add professional styling and theming - Create styles.py module with comprehensive stylesheet - Implement color palette and typography configuration - Apply consistent styling across all UI elements - Improve button, text input, and progress bar appearance - Use monospace font for transcription results display - Add hover and active states for interactive elements - Phase 5 complete: Professional UI styling applied	2025-11-12 05:12:38 +00:00
Claude	dd57adab18	feat: Implement comprehensive export functionality - Create TranscriptionExporter utility supporting TXT, SRT, VTT, JSON, TSV formats - Implement proper timestamp formatting for subtitle formats - Update GUI export dialog with all supported formats - Integrate exporter with main window - Add robust error handling for export operations - Phase 4 complete: Full export capabilities ready	2025-11-12 05:12:06 +00:00
Claude	3fa194fa1f	feat: Implement Whisper integration for Farsi transcription - Create FarsiTranscriber class wrapping OpenAI's Whisper model - Support both audio and video file formats - Implement word-level timestamp extraction - Add device detection (CUDA/CPU) for optimal performance - Format results for display with timestamps - Integrate transcriber with PyQt6 worker thread - Add error handling and progress updates - Phase 3 complete: Core transcription engine ready	2025-11-12 05:11:31 +00:00
Claude	0cc07b98e3	feat: Create PyQt6 GUI with file picker and results display - Implement MainWindow class with professional layout - Add file picker for audio and video formats - Create transcription button with threading support - Add progress bar and status indicators - Implement TranscriptionWorker thread to prevent UI freezing - Add results display with timestamps support - Create export button (placeholder for Phase 4) - Add error handling and user feedback - Phase 2 complete: Full GUI scaffolding ready	2025-11-12 05:10:53 +00:00
Claude	86b2a93dee	feat: Initialize Farsi Transcriber application structure - Create project directories (ui, models, utils) - Add PyQt6 environment setup with requirements.txt - Create main entry point (main.py) - Add comprehensive README with setup instructions - Add .gitignore for Python, PyTorch, and ML artifacts - Phase 1 complete: project structure and environment ready	2025-11-12 05:09:15 +00:00
Jong Wook Kim	c0d2f624c0	Release 20250625	2025-06-25 18:05:47 -07:00
Jong Wook Kim	db7fbc75fe	Release 20250625	2025-06-25 18:03:25 -07:00
Jong Wook Kim	31243bad24	Release 20250625 v20250625	2025-06-25 18:00:48 -07:00
Dridi Yassin	1f8fc975d3	Fix: Update torch.load to use weights_only=True to prevent security w… (#2451) * Fix: Update torch.load to use weights_only=True to prevent security warning * Update __init__.py * Update __init__.py --------- Co-authored-by: Jong Wook Kim <jongwook@openai.com>	2025-06-25 17:54:30 -07:00
Nathan Harmon	679ae1d141	Fix: Ensure DTW cost tensor is on the same device as input tensor (#2561) Co-authored-by: Jong Wook Kim <jongwook@openai.com>	2025-06-25 17:42:09 -07:00
Nicholas Nadeau, Ph.D., P.Eng.	f50c4f264e	docs: updated README to specify translation model limitation (#2547) Updated README given info from https://github.com/openai/whisper/discussions/2483	2025-06-25 17:03:47 -07:00
ExtReMLapin	86899243e9	Fixed triton kernel update to support latest triton versions (#2588) * Update triton kernel using _unsafe_update_src * support old triton versions * refactored changes to update triton kernel only once * Update triton_ops.py --------- Co-authored-by: Jong Wook Kim <jongwook@openai.com> Co-authored-by: Jong Wook Kim <ilikekjw@gmail.com>	2025-06-25 17:02:54 -07:00
Learpcs	5dff4db81a	Fix: GitHub display errors for Jupyter notebooks (#2589) * Update LibriSpeech.ipynb Update LibriSpeech.ipynb * Update Multilingual_ASR.ipynb	2025-06-25 16:55:15 -07:00
dependabot[bot]	dd985ac4b9	Bump the github-actions group with 3 updates (#2592) Bumps the github-actions group with 3 updates: [actions/checkout](https://github.com/actions/checkout), [actions/setup-python](https://github.com/actions/setup-python) and [softprops/action-gh-release](https://github.com/softprops/action-gh-release). Updates `actions/checkout` from 3 to 4 - [Release notes](https://github.com/actions/checkout/releases) - [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md) - [Commits](https://github.com/actions/checkout/compare/v3...v4) Updates `actions/setup-python` from 4 to 5 - [Release notes](https://github.com/actions/setup-python/releases) - [Commits](https://github.com/actions/setup-python/compare/v4...v5) Updates `softprops/action-gh-release` from 1 to 2 - [Release notes](https://github.com/softprops/action-gh-release/releases) - [Changelog](https://github.com/softprops/action-gh-release/blob/master/CHANGELOG.md) - [Commits](https://github.com/softprops/action-gh-release/compare/v1...v2) --- updated-dependencies: - dependency-name: actions/checkout dependency-version: '4' dependency-type: direct:production update-type: version-update:semver-major dependency-group: github-actions - dependency-name: actions/setup-python dependency-version: '5' dependency-type: direct:production update-type: version-update:semver-major dependency-group: github-actions - dependency-name: softprops/action-gh-release dependency-version: '2' dependency-type: direct:production update-type: version-update:semver-major dependency-group: github-actions ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2025-05-13 11:22:31 -07:00
Christian Clauss	e1e6aa60ff	Keep GitHub Actions up to date with GitHub's Dependabot (#2486) Automates the creation of pull requests like * #2430 * [Keeping your actions up to date with Dependabot](https://docs.github.com/en/code-security/dependabot/working-with-dependabot/keeping-your-actions-up-to-date-with-dependabot) * [Configuration options for the dependabot.yml file - package-ecosystem](https://docs.github.com/en/code-security/dependabot/dependabot-version-updates/configuration-options-for-the-dependabot.yml-file#package-ecosystem)	2025-05-13 11:10:43 -07:00
Christian Clauss	e6a5fc0ff0	pre-commit: Upgrade black v25.1.0 and isort v6.0.0 (#2514)	2025-05-13 09:43:34 -07:00
Christian Clauss	13907bed90	GitHub Actions: Add Python 3.13 to the testing (#2487) * GitHub Actions: Add Python 3.13 to the testing * GitHub Actions: Add Python 3.13 to the testing * numba==0.61.0rc2; python_version=='3.13' * triton>=2; python_version<'3.13' * fail-fast: false * Numba v0.61.0 is released https://github.com/numba/numba/releases * Update pyproject.toml	2025-05-12 21:10:40 -07:00
Jong Wook Kim	517a43ecd1	Update python-publish.yml using `-m build --sdist` instead of `setup.py sdist`	2025-01-04 12:56:16 -08:00
Christian Clauss	dd4d010d2c	PEP 621: Migrate from setup.py to pyproject.toml (#2435)	2025-01-04 01:38:35 -08:00
Christian Clauss	26a7cacc83	pre-commit autoupdate && pre-commit run --all-files (#2484) * pre-commit autoupdate && pre-commit run --all-files * Black formatter needs a current version of Python	2025-01-04 01:02:18 -08:00
Christian Clauss	6c1d8f1ea1	Upgrade GitHub Actions (#2430)	2025-01-04 00:47:12 -08:00
Purfview	90db0de189	Bugfix: Illogical "Avoid computing higher temperatures on no_speech" (#1903) * Bugfix: Illogical "Avoid computing higher temperatures on no_speech" Bugfix for https://github.com/openai/whisper/pull/1279 It's "silence" when decoding has failed due to `compression_ratio_threshold` too, when further down the code it's not "silence" anymore. "Silence" should be only when decoding has failed due to `logprob_threshold`. Like described there: `8bc8860694/whisper/transcribe.py (L421)` And in code there: `8bc8860694/whisper/transcribe.py (L243-L251)` * Fix if "logprob_threshold=None" --------- Co-authored-by: Jong Wook Kim <jongwook@openai.com>	2024-11-30 21:47:01 -08:00
Lowell Vaughn	fc5ded7d90	Updating README and doc strings to reflect that n_mels can now be 128 (#2049)	2024-11-26 09:37:01 -08:00
f1sh	173ff7dd1d	fix typo data/README.md (#2433)	2024-11-12 16:35:54 -08:00
BotMaster3000	271445b2f2	Update README.md (#2379) Default now uses Turbo instead of Small	2024-11-03 23:00:30 -08:00
kittsil	5979f03701	Add option to carry initial_prompt with the sliding window (#2343) * Add option to carry initial_prompt with the sliding window Add an option `carry_initial_prompt = False` to `whisper.transcribe()`. When set to `True`, `initial_prompt` is prepended to each internal `decode()` call's `prompt`. If there is not enough context space at the start of the prompt, the prompt is left-sliced to make space. * Prevent redundant initial_prompt_tokens * Revert unnecessary .gitignore change --------- Co-authored-by: Kittsil <kittsil@gmail.com> Co-authored-by: Jong Wook Kim <jongwook@openai.com>	2024-10-26 07:17:31 -07:00
Jong Wook Kim	cdb8147962	more pytorch versions in tests (#2408)	2024-10-25 17:30:02 -07:00
Jong Wook Kim	25639fc17d	Release 20240930 v20240930	2024-09-30 11:20:53 -07:00
Jong Wook Kim	260bbcfcb3	allowing numpy 2 in tests (#2362) * allowing numpy 2 in tests * allowing numpy 2 in tests	2024-09-30 11:18:17 -07:00
Jong Wook Kim	25e5c364e0	large-v3-turbo model (#2361)	2024-09-30 10:59:51 -07:00
Jong Wook Kim	b66b46f32d	test on python/pytorch versions up to 3.12 and 2.4.1 (#2360)	2024-09-30 10:33:56 -07:00
Jong Wook Kim	27f971320a	using sdpa if available (#2359) * using sdpa if available * Update model.py	2024-09-30 10:27:14 -07:00
Jong Wook Kim	423492dda7	Release 20240927 v20240927	2024-09-27 16:43:58 -07:00
Jong Wook Kim	279133e310	pinning numpy<2 in tests (#2332) * pinning numpy<2 in tests * pip install together * pip install together	2024-09-10 10:43:21 -07:00
Jianan Xing	32d55d5d76	Relax triton requirements for compatibility with pytorch 2.4 and newer (#2307) * Relax triton requirements for compatibility with pytorch 2.4 and newer Similar to https://github.com/openai/whisper/pull/1802, but now when pytorch upgrades to 2.4, it requires triton==3.0.0. I am not sure if it makes sense to remove the upper bound version constraints * Update requirements.txt	2024-09-10 09:53:08 -07:00
ryanheise	ba3f3cd54b	Skip silence around hallucinations (#1838) * Add clip_timestamps option * Add hallucination_silence_threshold option * Fix typing for python < 3.9 --------- Co-authored-by: Jong Wook Kim <jongwook@openai.com>	2023-12-18 12:11:16 -08:00
Bob Lin	8bc8860694	Fix triton env marker (#1887)	2023-12-11 10:39:08 -05:00
Jong Wook Kim	e58f288045	Release 20231117 v20231117	2023-11-17 11:59:28 -08:00
Eugene Indenbom	1cea435768	Relax triton requirements for compatibility with pytorch 2.1 and newer (#1802)	2023-11-13 09:43:42 -08:00
Jong Wook Kim	fcfeaf1b61	Release 20231106 v20231106	2023-11-06 10:14:04 -08:00
Jong Wook Kim	c5d4256076	large-v3 (#1761) * mel_filters() loads 128 mel bins * can load 100-language models * large-v3 checkpoint and evals * add mandarin alias * remove unused path * flake8 fix * formatting fix	2023-11-06 10:10:30 -08:00
Jong Wook Kim	f6f01c561c	Release 20231105 v20231105	2023-11-06 03:08:56 -08:00
Jong Wook Kim	746aaaeafa	remove tiktoken pin (#1759)	2023-11-06 03:05:21 -08:00
Philippe Hebert	b9f17e1f2d	docs: Disambiguation of the term "relative speed" in the README (#1751) * docs: defines relative speed in README * combined paragraphs --------- Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>	2023-11-06 02:43:07 -08:00
Mohamad Zamini	7dfcd56304	allow_pickle=False while loading of mel matrix IN audio.py (#1511) * Update audio.py The `mel_filters` function is using a `np.load` function to load a pre-computed mel filterbank matrix. This function is not thread-safe, which means that if it is called from multiple threads at the same time, it may corrupt the data. To fix this, you can use the `torch.load` function instead. This function is thread-safe, so it will not corrupt the data if it is called from multiple threads at the same time. * Update audio.py updated the docstring * allow_pickle=False * newline --------- Co-authored-by: Jong Wook Kim <jongwook@nyu.edu> Co-authored-by: Jong Wook Kim <jongwook@openai.com>	2023-11-06 02:28:51 -08:00
Marco Zucconelli	b7d277acd5	handling transcribe exceptions. (#1682) * handling transcribe() exceptions. * printing stacktrace --------- Co-authored-by: invalid <invalid@email.com> Co-authored-by: Jong Wook Kim <jongwook@nyu.edu> Co-authored-by: Jong Wook Kim <jongwook@openai.com>	2023-11-06 02:06:19 -08:00
amosal	6ed314fe41	Add new option to generate subtitles by a specific number of words (#1729) * ADD parser for new argument --max_words_count * ADD max_words_count in words_options ADD warning for max_line_width compatibility * ADD logic for max_words_count * rename to max_words_per_line * make them kwargs * allow specifying file path by --model * black formatting --------- Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>	2023-11-06 01:49:33 -08:00
Jordi Mas	b38a1f20f4	Fix exception when an audio file with no speech is provided (#1396) Co-authored-by: Jong Wook Kim <jongwook@openai.com>	2023-10-10 10:01:01 -07:00
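
Note on commit 5979f03701 (#2343): the message says that with `carry_initial_prompt` enabled, `initial_prompt` is prepended to each internal `decode()` call's prompt, and the prompt is left-sliced when there is not enough context space. The sketch below is one possible reading of that behavior, not the code in `whisper/transcribe.py`. The function name `build_prompt`, the parameter `max_prompt_len`, and the choice to drop the oldest rolling-context tokens first are all assumptions made for illustration.

```python
from typing import List


def build_prompt(
    initial_prompt_tokens: List[int],
    context_tokens: List[int],
    max_prompt_len: int,
) -> List[int]:
    """Prepend the initial prompt, left-slicing the rolling context to fit.

    Hypothetical helper: names and the token budget are illustrative only.
    """
    # Space left for rolling context once the initial prompt is placed.
    remaining = max_prompt_len - len(initial_prompt_tokens)
    if remaining <= 0:
        # The initial prompt alone fills the budget (assumed handling):
        # keep only its last `max_prompt_len` tokens.
        return initial_prompt_tokens[-max_prompt_len:] if max_prompt_len > 0 else []
    # Left-slice: keep at most the last `remaining` context tokens.
    return initial_prompt_tokens + context_tokens[-remaining:]


initial = [1, 2, 3]
context = [10, 11, 12, 13, 14]
print(build_prompt(initial, context, max_prompt_len=6))  # [1, 2, 3, 12, 13, 14]
```

With a budget of 6 tokens and a 3-token initial prompt, only the 3 most recent context tokens survive, so the initial prompt stays visible in every window.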

173 Commits