feat: Add comprehensive configuration and documentation

- Create config.py with model, device, and format settings
- Add model descriptions and performance information
- Expand README with detailed installation instructions
- Add troubleshooting section for common issues
- Include advanced usage examples
- Document all export formats and features
- Add performance tips and recommendations
- Phase 6 complete: Full configuration and documentation ready
Claude 2025-11-12 05:13:35 +00:00
parent 72ab2e3fa9
commit efdcf42ffd
2 changed files with 266 additions and 50 deletions

README.md
# Farsi Transcriber

A professional desktop application for transcribing Farsi audio and video files using OpenAI's Whisper model.

## Features

✨ **Core Features**

- 🎙️ Transcribe audio files (MP3, WAV, M4A, FLAC, OGG, AAC, WMA)
- 🎬 Extract audio from video files (MP4, MKV, MOV, WebM, AVI, FLV, WMV)
- 🇮🇷 High-accuracy Farsi/Persian language transcription
- ⏱️ Word-level timestamps for precise timing
- 📤 Export to multiple formats (TXT, SRT, VTT, JSON, TSV)
- 💻 Clean, intuitive PyQt6-based GUI
- 🚀 GPU acceleration support (CUDA) with automatic fallback to CPU
- 🔄 Progress indicators and real-time status updates
## System Requirements

**Minimum:**
- Python 3.8 or higher
- 4GB RAM
- ffmpeg installed

**Recommended:**
- Python 3.10+
- 8GB+ RAM
- NVIDIA GPU with CUDA support (optional but faster)
- SSD for better performance

## Installation

### Step 1: Install ffmpeg

Choose your operating system:
**Ubuntu/Debian:**
```bash
sudo apt update && sudo apt install ffmpeg
```

**Fedora/CentOS:**
```bash
sudo dnf install ffmpeg
```

**macOS (Homebrew):**
```bash
brew install ffmpeg
```

**Windows (Chocolatey):**
```bash
choco install ffmpeg
```
**Windows (Scoop):**
```bash
scoop install ffmpeg
```

### Step 2: Set up Python environment

```bash
# Navigate to the repository
cd whisper/farsi_transcriber

# Create virtual environment
python3 -m venv venv

# Activate virtual environment
source venv/bin/activate  # On Windows: venv\Scripts\activate
```
### Step 3: Install dependencies

```bash
pip install -r requirements.txt
```

This will install:
- PyQt6 (GUI framework)
- openai-whisper (transcription engine)
- PyTorch (deep learning framework)
- NumPy, tiktoken, tqdm (supporting libraries)
## Usage

### Running the Application

```bash
python main.py
```

### Step-by-Step Guide

1. **Launch the app** - Run `python main.py`
2. **Select a file** - Click the "Select File" button to choose an audio/video file
3. **Transcribe** - Click "Transcribe" and wait for completion
4. **View results** - See the transcription with timestamps
5. **Export** - Click "Export Results" to save in your preferred format
### Supported Export Formats
- **TXT** - Plain text (content only)
- **SRT** - SubRip subtitle format (with timestamps)
- **VTT** - WebVTT subtitle format (with timestamps)
- **JSON** - Structured format with segments and metadata
- **TSV** - Tab-separated values (spreadsheet compatible)
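To illustrate how segment timestamps map onto the subtitle formats above, here is a minimal, self-contained sketch; the `result` dict is a stand-in for what Whisper's `transcribe()` returns, and `segments_to_srt` is a hypothetical helper, not the app's actual export code:

```python
# Sketch: converting Whisper-style segments to SRT.
# `result` mimics the dict returned by model.transcribe();
# in the app it would come from the actual Whisper call.

def to_srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments) -> str:
    """Render a list of {start, end, text} segments as SRT blocks."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n"
            f"{to_srt_timestamp(seg['start'])} --> {to_srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)

result = {"segments": [{"start": 0.0, "end": 2.5, "text": " سلام"}]}
print(segments_to_srt(result["segments"]))
```

VTT differs mainly in using `.` instead of `,` in timestamps and a `WEBVTT` header, which is why the two formats share most of their export logic.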
## Configuration

Edit `config.py` to customize:

```python
# Model size (tiny, base, small, medium, large)
DEFAULT_MODEL = "medium"

# Language code
LANGUAGE_CODE = "fa"  # Farsi

# Supported formats
SUPPORTED_AUDIO_FORMATS = {".mp3", ".wav", ".m4a", ".flac", ".ogg", ...}
SUPPORTED_VIDEO_FORMATS = {".mp4", ".mkv", ".mov", ".webm", ".avi", ...}
```
## Model Information

### Available Models

| Model | Size | Speed | Accuracy | VRAM |
|--------|-------|-------|-----------|-------|
| tiny | 39M | ~10x | Good | ~1GB |
| base | 74M | ~7x | Very Good | ~1GB |
| small | 244M | ~4x | Excellent | ~2GB |
| medium | 769M | ~2x | Excellent | ~5GB |
| large | 1550M | 1x | Best | ~10GB |

**Default**: `medium` (recommended for Farsi)
### Performance Notes
- Larger models provide better accuracy but require more VRAM
- GPU (CUDA) dramatically speeds up transcription (8-10x faster)
- First run downloads the model (~500MB-3GB depending on model size)
- Subsequent runs use cached model files
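The CUDA detection described above can be verified from Python before transcribing anything; a minimal sketch that mirrors the fallback logic in `config.py`:

```python
# Quick device check, mirroring the auto-detection in config.py:
# prefer CUDA when PyTorch reports it, otherwise fall back to the CPU.
try:
    import torch
    device = "cuda" if torch.cuda.is_available() else "cpu"
except ImportError:  # PyTorch not installed yet
    device = "cpu"

print(f"Transcription will run on: {device}")
```

If this prints `cpu` on a machine with an NVIDIA GPU, the usual culprit is a CPU-only PyTorch build or a missing CUDA driver (`nvidia-smi` should list the GPU).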
## Project Structure

```
farsi_transcriber/
├── ui/                       # User interface components
│   ├── __init__.py
│   ├── main_window.py        # Main application window
│   └── styles.py             # Styling and theming
├── models/                   # Model management
│   ├── __init__.py
│   └── whisper_transcriber.py  # Whisper wrapper
├── utils/                    # Utility functions
│   ├── __init__.py
│   └── export.py             # Export functionality
├── config.py                 # Configuration settings
├── main.py                   # Application entry point
├── __init__.py               # Package init
├── requirements.txt          # Python dependencies
└── README.md                 # This file
```
## Troubleshooting

### Issue: "ffmpeg not found"
**Solution**: Install ffmpeg using your package manager (see the Installation section)

### Issue: "CUDA out of memory"
**Solution**: Use a smaller model, or process the audio in smaller chunks

### Issue: "Model download fails"
**Solution**: Check your internet connection and try again. Models are cached in `~/.cache/whisper/`

### Issue: Slow transcription
**Solution**: Ensure CUDA is detected (`nvidia-smi`), or switch to a smaller, faster model
## Advanced Usage
### Custom Model Selection
Update `config.py`:
```python
DEFAULT_MODEL = "large" # For maximum accuracy
# or
DEFAULT_MODEL = "tiny" # For fastest processing
```
### Batch Processing (Future)

Script to process multiple files:

```python
from farsi_transcriber.models.whisper_transcriber import FarsiTranscriber

transcriber = FarsiTranscriber(model_name="medium")

for audio_file in audio_files:
    result = transcriber.transcribe(audio_file)
    # Process results
```
## Performance Tips
1. **Use GPU** - Ensure NVIDIA CUDA is properly installed
2. **Choose appropriate model** - Balance speed vs accuracy
3. **Close other applications** - Free up RAM/VRAM
4. **Use SSD** - Faster model loading and temporary file I/O
5. **Local processing** - All processing happens locally, no cloud uploads
## Development

### Code Style

```bash
# Format code
black farsi_transcriber/

# Check style
flake8 farsi_transcriber/

# Sort imports
isort farsi_transcriber/
```

### Future Features

- [ ] Batch processing
- [ ] Real-time transcription preview
- [ ] Speaker diarization
- [ ] Multi-language UI support
- [ ] Settings dialog
- [ ] Keyboard shortcuts
- [ ] Drag-and-drop support
- [ ] Recent files history
## License

MIT License - Personal use and modifications allowed

## Acknowledgments

Built with:
- [OpenAI Whisper](https://github.com/openai/whisper) - Speech recognition
- [PyQt6](https://www.riverbankcomputing.com/software/pyqt/) - GUI framework
- [PyTorch](https://pytorch.org/) - Deep learning
## Support
For issues or suggestions:
1. Check the troubleshooting section
2. Verify ffmpeg is installed
3. Ensure Python 3.8+ is used
4. Check available disk space
5. Verify CUDA setup (for GPU users)

config.py
"""
Configuration settings for Farsi Transcriber application
Manages model selection, device settings, and other configuration options.
"""
import os
from pathlib import Path
# Application metadata
APP_NAME = "Farsi Transcriber"
APP_VERSION = "0.1.0"
APP_DESCRIPTION = "A desktop application for transcribing Farsi audio and video files"
# Model settings
DEFAULT_MODEL = "medium" # Options: tiny, base, small, medium, large
AVAILABLE_MODELS = ["tiny", "base", "small", "medium", "large"]
MODEL_DESCRIPTIONS = {
"tiny": "Smallest model (39M params) - Fastest, ~1GB VRAM required",
"base": "Small model (74M params) - Fast, ~1GB VRAM required",
"small": "Medium model (244M params) - Balanced, ~2GB VRAM required",
"medium": "Large model (769M params) - Good accuracy, ~5GB VRAM required",
"large": "Largest model (1550M params) - Best accuracy, ~10GB VRAM required",
}
# Language settings
LANGUAGE_CODE = "fa" # Farsi/Persian
LANGUAGE_NAME = "Farsi"
# Audio/Video settings
SUPPORTED_AUDIO_FORMATS = {".mp3", ".wav", ".m4a", ".flac", ".ogg", ".aac", ".wma"}
SUPPORTED_VIDEO_FORMATS = {".mp4", ".mkv", ".mov", ".webm", ".avi", ".flv", ".wmv"}
# UI settings
WINDOW_WIDTH = 900
WINDOW_HEIGHT = 700
WINDOW_MIN_WIDTH = 800
WINDOW_MIN_HEIGHT = 600
# Output settings
OUTPUT_DIR = Path.home() / "FarsiTranscriber" / "outputs"
OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
EXPORT_FORMATS = {
"txt": "Plain Text",
"srt": "SRT Subtitles",
"vtt": "WebVTT Subtitles",
"json": "JSON Format",
"tsv": "Tab-Separated Values",
}
# Device settings (auto-detect CUDA if available)
try:
import torch
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
except ImportError:
DEVICE = "cpu"
# Logging settings
LOG_LEVEL = "INFO"
LOG_FILE = OUTPUT_DIR / "transcriber.log"
def get_model_info(model_name: str) -> str:
"""Get description for a model"""
return MODEL_DESCRIPTIONS.get(model_name, "Unknown model")
def get_supported_formats() -> set:
"""Get all supported audio and video formats"""
return SUPPORTED_AUDIO_FORMATS | SUPPORTED_VIDEO_FORMATS