mirror of https://github.com/openai/whisper.git (synced 2025-11-23 22:15:58 +00:00)
feat: Initialize Farsi Transcriber application structure
- Create project directories (ui, models, utils)
- Add PyQt6 environment setup with requirements.txt
- Create main entry point (main.py)
- Add comprehensive README with setup instructions
- Add .gitignore for Python, PyTorch, and ML artifacts
- Phase 1 complete: project structure and environment ready
parent c0d2f624c0
commit 86b2a93dee
52
farsi_transcriber/.gitignore
vendored
Normal file
@@ -0,0 +1,52 @@
# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg

# Virtual Environment
venv/
ENV/
env/

# IDE
.vscode/
.idea/
*.swp
*.swo
*~
.DS_Store

# PyTorch/ML Models
*.pt
*.pth
models/downloaded/

# Whisper models cache
~/.cache/whisper/

# Application outputs
transcriptions/
exports/
*.log

# Testing
.pytest_cache/
.coverage
htmlcov/
113
farsi_transcriber/README.md
Normal file
@@ -0,0 +1,113 @@
# Farsi Transcriber

A desktop application for transcribing Farsi audio and video files using OpenAI's Whisper model.

## Features

- 🎙️ Transcribe audio files (MP3, WAV, M4A, FLAC, OGG, etc.)
- 🎬 Extract audio from video files (MP4, MKV, MOV, WebM, AVI, etc.)
- 🇮🇷 High-accuracy Farsi transcription
- ⏱️ Word-level timestamps
- 📤 Export to multiple formats (TXT, SRT, JSON)
- 💻 Clean PyQt6-based GUI
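
The audio-extraction feature is not implemented in this commit. As a minimal sketch of how it could work, the snippet below builds an ffmpeg command that drops the video stream and writes a 16 kHz mono WAV file. The function name `build_extract_command`, the example file names, and the choice of output format are assumptions made for illustration, not part of the project.

```python
from pathlib import Path
from typing import List


def build_extract_command(video_path: str, audio_path: str) -> List[str]:
    """Return an ffmpeg argument list that extracts the audio track as 16 kHz mono WAV.

    Hypothetical helper: the project does not define this function yet.
    """
    return [
        "ffmpeg",
        "-y",             # overwrite the output file if it already exists
        "-i", str(Path(video_path)),
        "-vn",            # ignore the video stream
        "-ac", "1",       # downmix to a single channel
        "-ar", "16000",   # resample to 16 kHz
        str(Path(audio_path)),
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch works without ffmpeg.
    print(" ".join(build_extract_command("lecture.mp4", "lecture.wav")))
```

The list form can be passed to `subprocess.run(cmd, check=True)` once ffmpeg is installed; building it separately keeps the command easy to inspect and test.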

## System Requirements

- Python 3.8+
- ffmpeg (for audio/video processing)
- 8GB+ RAM recommended (for high-accuracy model)

### Install ffmpeg

**Ubuntu/Debian:**
```bash
sudo apt update && sudo apt install ffmpeg
```

**macOS (Homebrew):**
```bash
brew install ffmpeg
```

**Windows (Chocolatey):**
```bash
choco install ffmpeg
```
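
To confirm that ffmpeg is reachable before launching the application, a check along these lines may help. It is a small sketch using only the standard library; the helper name `find_executable` is an assumption and does not exist in the project.

```python
import shutil
from typing import Optional


def find_executable(name: str = "ffmpeg") -> Optional[str]:
    """Return the full path of `name` if it is on PATH, otherwise None."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_executable()
    if path is None:
        print("ffmpeg not found; install it with one of the commands above")
    else:
        print(f"ffmpeg found at {path}")
```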

## Installation

1. Clone the repository
2. Create a virtual environment:
   ```bash
   python3 -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```

3. Install dependencies:
   ```bash
   pip install -r requirements.txt
   ```

4. Run the application:
   ```bash
   python main.py
   ```

## Usage

### GUI Application
```bash
python main.py
```

Then:
1. Click "Select File" to choose an audio or video file
2. Click "Transcribe" and wait for processing
3. View results with timestamps
4. Export to your preferred format
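
The export step is not part of this commit yet. As one possible shape for the SRT export, the sketch below converts a list of timed segments into SRT text. It assumes each segment is a dict with `start` and `end` in seconds plus a `text` field, which is the layout Whisper uses for its segments; the function names are hypothetical.

```python
from typing import Dict, List


def format_timestamp(seconds: float) -> str:
    """Convert seconds to the SRT timestamp form HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: List[Dict]) -> str:
    """Render segments with `start`, `end` and `text` keys as SRT cues."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{format_timestamp(seg['start'])} --> {format_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    # A blank line separates consecutive cues.
    return "\n".join(blocks)


if __name__ == "__main__":
    demo = [
        {"start": 0.0, "end": 1.25, "text": " سلام "},
        {"start": 1.25, "end": 3.0, "text": "خوش آمدید"},
    ]
    print(segments_to_srt(demo))
```

Writing the result with `encoding="utf-8"` keeps the Farsi text intact in the exported file.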

### Command Line (Coming Soon)
```bash
python -m farsi_transcriber --input audio.mp3 --output transcription.srt
```
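
Since the command-line interface is still planned, the following is only a sketch of how the two flags shown above could be parsed with `argparse`. The functions `build_parser` and `parse_args` are assumptions; the project would also need a `__main__.py` for `python -m farsi_transcriber` to work.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Create a parser for the planned --input/--output flags (hypothetical)."""
    parser = argparse.ArgumentParser(
        prog="farsi_transcriber",
        description="Transcribe Farsi audio or video files with Whisper.",
    )
    parser.add_argument("--input", required=True, help="audio or video file to transcribe")
    parser.add_argument("--output", required=True, help="destination file, e.g. transcription.srt")
    return parser


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse `argv`, or the process arguments when `argv` is None."""
    return build_parser().parse_args(argv)


# Example with an explicit argument list, mirroring the command above.
args = parse_args(["--input", "audio.mp3", "--output", "transcription.srt"])
print(args.input, args.output)
```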

## Model Information

This application uses OpenAI's Whisper model optimized for Farsi:
- **Model**: medium or large (configurable)
- **Accuracy**: Optimized for Persian language
- **Processing**: Local processing (no cloud required)
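
The model-loading code is not included in this commit. A minimal sketch of local Farsi transcription with the `openai-whisper` package might look like the following; `SUPPORTED_MODELS`, `build_transcribe_options`, and `transcribe_file` are hypothetical names, and the `whisper` import is deferred so the option helper can be used without the package installed.

```python
from typing import Any, Dict

SUPPORTED_MODELS = ("medium", "large")  # sizes named in this README


def build_transcribe_options(word_timestamps: bool = True) -> Dict[str, Any]:
    """Keyword arguments for Whisper's transcribe() when the audio is Farsi."""
    return {
        "language": "fa",                    # language code Whisper uses for Persian
        "word_timestamps": word_timestamps,  # request per-word timing
    }


def transcribe_file(path: str, model_name: str = "medium") -> Dict[str, Any]:
    """Load a Whisper model locally and transcribe one file (hypothetical helper)."""
    if model_name not in SUPPORTED_MODELS:
        raise ValueError(f"unsupported model: {model_name!r}")
    import whisper  # imported lazily; provided by the openai-whisper package

    model = whisper.load_model(model_name)
    return model.transcribe(path, **build_transcribe_options())
```

The returned dict contains the full transcript under `"text"` and the timed pieces under `"segments"`.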

## Project Structure

```
farsi_transcriber/
├── ui/                    # PyQt6 UI components
├── models/                # Whisper model management
├── utils/                 # Utility functions
├── main.py                # Application entry point
├── requirements.txt       # Python dependencies
└── README.md              # This file
```

## Development

### Running Tests
```bash
pytest tests/
```

### Code Style
```bash
black .
flake8 .
isort .
```

## License

MIT License - See LICENSE file for details

## Contributing

This is a personal project, but feel free to fork and modify for your needs!
8
farsi_transcriber/__init__.py
Normal file
@@ -0,0 +1,8 @@
"""
Farsi Transcriber Application

A desktop application for transcribing Farsi audio and video files using OpenAI's Whisper.
"""

__version__ = "0.1.0"
__author__ = "Personal Project"
28
farsi_transcriber/main.py
Normal file
@@ -0,0 +1,28 @@
#!/usr/bin/env python3
"""
Farsi Transcriber - Main Entry Point

A PyQt6-based desktop application for transcribing Farsi audio and video files.
"""

import sys
from PyQt6.QtWidgets import QApplication


def main():
    """Main entry point for the application"""
    app = QApplication(sys.argv)

    # TODO: Import and create main window
    # from ui.main_window import MainWindow
    # window = MainWindow()
    # window.show()

    print("Farsi Transcriber App initialized (setup phase)")
    print("✓ PyQt6 environment ready")

    sys.exit(app.exec())


if __name__ == "__main__":
    main()
1
farsi_transcriber/models/__init__.py
Normal file
@@ -0,0 +1 @@
"""Model management for Farsi Transcriber"""
7
farsi_transcriber/requirements.txt
Normal file
@@ -0,0 +1,7 @@
PyQt6==6.6.1
PyQt6-Qt6==6.6.1
PyQt6-sip==13.6.0
torch>=1.10.1
numpy
openai-whisper
tqdm
1
farsi_transcriber/ui/__init__.py
Normal file
@@ -0,0 +1 @@
"""UI components for Farsi Transcriber"""
1
farsi_transcriber/utils/__init__.py
Normal file
@@ -0,0 +1 @@
"""Utility functions for Farsi Transcriber"""