mirror of https://github.com/openai/whisper.git synced 2025-11-24 06:26:03 +00:00

feat: Create PyQt6 GUI with file picker and results display

- Implement MainWindow class with professional layout
- Add file picker for audio and video formats
- Create transcription button with threading support
- Add progress bar and status indicators
- Implement TranscriptionWorker thread to prevent UI freezing
- Add results display with timestamps support
- Create export button (placeholder for Phase 4)
- Add error handling and user feedback
- Phase 2 complete: Full GUI scaffolding ready

2025-11-12 05:10:53 +00:00

README.md

Farsi Transcriber

A desktop application for transcribing Farsi audio and video files using OpenAI's Whisper model.

Features

🎙️ Transcribe audio files (MP3, WAV, M4A, FLAC, OGG, etc.)
🎬 Extract audio from video files (MP4, MKV, MOV, WebM, AVI, etc.)
🇮🇷 High-accuracy Farsi transcription
⏱️ Word-level timestamps
📤 Export to multiple formats (TXT, SRT, JSON); planned, the export button is currently a placeholder
💻 Clean PyQt6-based GUI
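The export formats listed above could be produced from timestamped segments along the following lines. This is a minimal sketch, not the project's export code: the function names are hypothetical, and the segment shape (dicts with `start`, `end`, and `text` keys) is assumed to follow what Whisper returns in `result["segments"]`.

```python
import json


def format_srt_time(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_txt(segments) -> str:
    """Plain text: one segment per line, timestamps dropped."""
    return "\n".join(seg["text"].strip() for seg in segments)


def to_srt(segments) -> str:
    """SubRip: numbered cues separated by a blank line."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{format_srt_time(seg['start'])} --> {format_srt_time(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(cues)


def to_json(segments) -> str:
    """JSON: keep Farsi text readable instead of \\u escapes."""
    return json.dumps(segments, ensure_ascii=False, indent=2)
```

Passing `ensure_ascii=False` matters for Farsi output, since the default would write every Persian character as an escape sequence.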

System Requirements

Python 3.8+
ffmpeg (for audio/video processing)
8 GB+ RAM recommended (for the larger, higher-accuracy models)

Install ffmpeg

Ubuntu/Debian:

sudo apt update && sudo apt install ffmpeg

macOS (Homebrew):

brew install ffmpeg

Windows (Chocolatey):

choco install ffmpeg
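To confirm the requirements are met before launching the app, a small check like the one below may help. It is a sketch under stated assumptions: `check_environment` is a hypothetical helper, not part of this project, and it only verifies the Python version and that an `ffmpeg` executable is on `PATH`.

```python
import shutil
import sys

MIN_PYTHON = (3, 8)  # matches the "Python 3.8+" requirement above


def check_environment(min_python=MIN_PYTHON):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if sys.version_info < min_python:
        required = ".".join(str(part) for part in min_python)
        problems.append(f"Python {required}+ is required")
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg was not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```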

Installation

Clone the repository
Create a virtual environment:

python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install dependencies:

pip install -r requirements.txt

Run the application:

python main.py

Usage

GUI Application

python main.py

Then:

Click "Select File" to choose an audio or video file
Click "Transcribe" and wait for processing
View results with timestamps
Export to your preferred format
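The file selection step has to decide whether the chosen file is audio or video. One way to do that is by extension, as sketched below. The extension sets are taken from the Features list and are not exhaustive; `classify_media` is a hypothetical name, not a function from this project.

```python
from pathlib import Path

# Extensions named in the Features list; the "etc." there means more may apply.
AUDIO_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac", ".ogg"}
VIDEO_EXTENSIONS = {".mp4", ".mkv", ".mov", ".webm", ".avi"}


def classify_media(path) -> str:
    """Return "audio", "video", or "unsupported" based on the file suffix."""
    suffix = Path(path).suffix.lower()
    if suffix in AUDIO_EXTENSIONS:
        return "audio"
    if suffix in VIDEO_EXTENSIONS:
        return "video"
    return "unsupported"
```

An extension check is cheap but can be fooled by misnamed files; probing the file with ffmpeg would be more robust.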

Command Line (Coming Soon)

python -m farsi_transcriber --input audio.mp3 --output transcription.srt
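Since the command-line interface is not implemented yet, the following is only a guess at how the `--input` and `--output` flags shown above might be parsed. Everything here is hypothetical, including the idea of inferring the export format from the output file's extension.

```python
import argparse
from pathlib import Path

SUPPORTED_FORMATS = {"txt", "srt", "json"}


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the two flags shown in the example command."""
    parser = argparse.ArgumentParser(
        prog="farsi_transcriber",
        description="Transcribe a Farsi audio or video file.",
    )
    parser.add_argument("--input", required=True, type=Path,
                        help="audio or video file to transcribe")
    parser.add_argument("--output", required=True, type=Path,
                        help="destination file (.txt, .srt, or .json)")
    return parser


def infer_format(output: Path) -> str:
    """Pick the export format from the output file's extension."""
    fmt = output.suffix.lower().lstrip(".")
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported output format: {output.suffix!r}")
    return fmt
```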

Model Information

This application runs OpenAI's Whisper, a general-purpose multilingual speech recognition model, on Farsi audio:

Model: medium or large (configurable)
Accuracy: larger models are generally more accurate but slower and more memory-hungry
Processing: runs locally (no cloud service required)
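For reference, a local Farsi transcription with the `openai-whisper` package can look roughly like this. It is a sketch, not this application's model code: `build_transcribe_options` and `transcribe_file` are hypothetical helpers, while `whisper.load_model` and `model.transcribe` with the `language`, `task`, and `word_timestamps` options are part of the Whisper Python API.

```python
def build_transcribe_options(word_timestamps: bool = True) -> dict:
    """Options passed to Whisper's transcribe(); "fa" is the code for Persian."""
    return {
        "language": "fa",
        "task": "transcribe",
        "word_timestamps": word_timestamps,
    }


def transcribe_file(path, model_name: str = "medium") -> dict:
    """Load a Whisper model and transcribe one file locally."""
    # Imported lazily so the rest of the module works without Whisper installed.
    import whisper

    model = whisper.load_model(model_name)
    return model.transcribe(str(path), **build_transcribe_options())
```

The returned dict includes the full `text` plus a `segments` list, where each segment carries `start`, `end`, and `text`. Setting `language` explicitly skips Whisper's automatic language detection.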

Project Structure

farsi_transcriber/
├── ui/               # PyQt6 UI components
├── models/           # Whisper model management
├── utils/            # Utility functions
├── main.py           # Application entry point
├── requirements.txt  # Python dependencies
└── README.md         # This file

Development

Running Tests

pytest tests/

Code Style

black .
flake8 .
isort .

License

MIT License - See LICENSE file for details

Contributing

This is a personal project, but feel free to fork and modify for your needs!