# Farsi Transcriber - Web Application

A professional web-based application for transcribing Farsi audio and video files using OpenAI's Whisper model.

## Features

✨ **Core Features**

- 🎙️ Transcribe audio files (MP3, WAV, M4A, FLAC, OGG, AAC, WMA)
- 🎬 Extract audio from video files (MP4, MKV, MOV, WebM, AVI, FLV, WMV)
- 🇮🇷 High-accuracy Farsi/Persian transcription
- ⏱️ Word-level timestamps for precise timing
- 📤 Export to multiple formats (TXT, SRT, VTT, JSON)
- 💻 Clean, intuitive React-based UI with a Figma design
- 🎨 Dark/light theme toggle
- 🔍 Search and text highlighting in transcriptions
- 📋 File queue management
- 💾 Copy individual transcription segments
- 🚀 GPU acceleration support (CUDA)
- 🎯 Resizable window for a flexible workspace

## Tech Stack

**Frontend:**

- React 18+ with TypeScript
- Vite (fast build tool)
- Tailwind CSS v4.0
- Lucide React (icons)
- re-resizable (window resizing)
- Sonner (toast notifications)

**Backend:**

- Flask (Python web framework)
- OpenAI Whisper (speech recognition)
- PyTorch (deep learning)
- Flask-CORS (cross-origin requests)

## System Requirements

**Frontend:**

- Node.js 16+
- npm/yarn/pnpm

**Backend:**

- Python 3.8+
- 4GB RAM minimum (8GB+ recommended)
- ffmpeg installed
- Optional: NVIDIA GPU with CUDA support

## Installation

### Step 1: Install ffmpeg

Choose your operating system:

**Ubuntu/Debian:**

```bash
sudo apt update && sudo apt install ffmpeg
```

**macOS (Homebrew):**

```bash
brew install ffmpeg
```

**Windows (Chocolatey):**

```bash
choco install ffmpeg
```

### Step 2: Backend Setup

```bash
# Navigate to the backend directory
cd backend

# Create a virtual environment
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
```

### Step 3: Frontend Setup

```bash
# Navigate to the root directory
cd ..

# Install Node dependencies
npm install

# Or use yarn/pnpm
yarn install
# or
pnpm install
```

## Running the Application

### Step 1: Start the Backend API

```bash
cd backend
source venv/bin/activate  # Activate the virtual environment
python app.py
```

The API will be available at `http://localhost:5000`.

### Step 2: Start the Frontend Dev Server

In a new terminal:

```bash
npm run dev
```

The application will be available at `http://localhost:3000`.

## Building for Production

### Frontend Build

```bash
npm run build
```

This creates an optimized production build in the `dist/` directory.

### Backend Deployment

For production, use a production WSGI server:

```bash
# Install Gunicorn
pip install gunicorn

# Run with Gunicorn
gunicorn -w 4 -b 0.0.0.0:5000 app:app
```

## API Endpoints

### `/health` (GET)

Health check endpoint.

**Response:**

```json
{
  "status": "healthy",
  "model_loaded": true,
  "device": "cuda|cpu"
}
```

### `/transcribe` (POST)

Transcribe an audio/video file.

**Request:**

- `file`: Audio/video file (multipart/form-data)
- `language`: Language code (optional, default: "fa" for Farsi)

**Response:**

```json
{
  "status": "success",
  "filename": "audio.mp3",
  "language": "fa",
  "text": "Full transcription text...",
  "segments": [
    {
      "start": "00:00:00.000",
      "end": "00:00:05.500",
      "text": "سلام دنیا"
    }
  ]
}
```
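You can also call this endpoint from a script. Below is a minimal sketch using the `requests` library, assuming the backend is running locally on the default port; the file name `audio.mp3` is a placeholder for illustration:

```python
import requests

# Assumes the backend from "Running the Application" is up on the default port.
API_URL = "http://localhost:5000"

# "audio.mp3" is a placeholder; use any supported audio/video file.
with open("audio.mp3", "rb") as f:
    response = requests.post(
        f"{API_URL}/transcribe",
        files={"file": f},
        data={"language": "fa"},  # optional; defaults to Farsi
    )

result = response.json()
print(result["text"])           # full transcription
for seg in result["segments"]:  # per-segment timestamps
    print(seg["start"], seg["end"], seg["text"])
```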
### `/models` (GET)

Get the available Whisper models.

**Response:**

```json
{
  "available_models": ["tiny", "base", "small", "medium", "large"],
  "current_model": "medium",
  "description": "..."
}
```

### `/export` (POST)

Export a transcription.

**Request:**

```json
{
  "transcription": "Full text...",
  "segments": [...],
  "format": "txt|srt|vtt|json"
}
```

**Response:**

```json
{
  "status": "success",
  "format": "srt",
  "content": "...",
  "mime_type": "text/plain"
}
```

## Usage Guide

### 1. Add Files to the Queue

- Click the "Add Files" button in the left sidebar
- Select audio or video files
- Multiple files can be added to the queue

### 2. Transcribe

- Select a file from the queue
- Click the "Transcribe" button
- Watch the progress indicator
- Results appear with timestamps

### 3. Search & Copy

- Use the search bar to find specific text
- Matching text is highlighted
- Click the copy icon to copy individual segments

### 4. Export Results

- Select an export format (TXT, SRT, VTT, JSON)
- Click the "Export" button
- The file is downloaded or ready to save

### 5. Theme Toggle

- Click the sun/moon icon in the header
- Switch between light and dark themes

## Project Structure

```
farsi_transcriber_web/
├── src/
│   ├── App.tsx              # Main application component
│   ├── main.tsx             # React entry point
│   ├── index.css            # Global styles
│   └── components/
│       ├── Button.tsx
│       ├── Progress.tsx
│       ├── Input.tsx
│       └── Select.tsx
├── backend/
│   ├── app.py               # Flask API server
│   ├── requirements.txt     # Python dependencies
│   └── .gitignore
├── public/
├── package.json
├── vite.config.ts
├── tsconfig.json
├── tailwind.config.js
├── postcss.config.js
└── README.md
```

## Configuration

### Environment Variables

Create a `.env.local` file in the root directory:

```
VITE_API_URL=http://localhost:5000
VITE_MAX_FILE_SIZE=500MB
```

### Backend Configuration

Edit `backend/app.py` to customize:

```python
# Change the model size
model = whisper.load_model('large')  # tiny, base, small, medium, large

# Change the upload folder
UPLOAD_FOLDER = '/custom/path'

# Change the max file size
MAX_FILE_SIZE = 1024 * 1024 * 1024  # 1GB
```

## Troubleshooting

### Issue: "API connection failed"

**Solution:** Ensure the backend is running on `http://localhost:5000`.

### Issue: "Whisper model not found"

**Solution:** The first run downloads the model (~3GB). Ensure you have an internet connection and enough disk space.

### Issue: "CUDA out of memory"

**Solution:** Use a smaller model or reduce the batch size in `backend/app.py`.

### Issue: "ffmpeg not found"

**Solution:** Install ffmpeg using your package manager (see the Installation section).

### Issue: Port 3000 or 5000 already in use

**Solution:** Change the ports in `vite.config.ts` and `backend/app.py`.

## Performance Tips

1. **Use a GPU** - Ensure NVIDIA CUDA is properly installed
2. **Choose an appropriate model** - Balance speed vs. accuracy
3. **Close other applications** - Free up RAM/VRAM
4. **Use an SSD** - Faster model loading and file I/O
5. **Batch processing** - Process multiple files sequentially (see the sketch below)
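The app processes its queue one file at a time, and the same sequential pattern works from a script. A minimal sketch against the `/transcribe` endpoint, assuming the backend is running locally on the default port; the file list and output naming are illustrative:

```python
import json
import pathlib
import requests

API_URL = "http://localhost:5000"  # assumes the default backend port

# Placeholder paths; point these at your own media files.
files = ["interview.mp3", "lecture.mp4", "podcast.wav"]

for path in files:
    with open(path, "rb") as f:
        resp = requests.post(f"{API_URL}/transcribe",
                             files={"file": f},
                             data={"language": "fa"})
    resp.raise_for_status()
    result = resp.json()

    # Save each result next to its source file, e.g. interview.json
    out = pathlib.Path(path).with_suffix(".json")
    out.write_text(json.dumps(result, ensure_ascii=False, indent=2),
                   encoding="utf-8")
    print(f"{path}: {len(result['segments'])} segments")
```

Processing files one at a time keeps only a single transcription in memory, which matters most when running larger models on limited RAM/VRAM.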
## Future Enhancements

- [ ] Drag-and-drop file upload
- [ ] Audio playback synchronized with the transcription
- [ ] Inline segment editing
- [ ] Keyboard shortcuts
- [ ] Save/load sessions
- [ ] Speaker diarization
- [ ] Confidence scores
- [ ] Custom vocabulary support

## Development

### Code Style

```bash
# Lint code (if ESLint is configured)
npm run lint

# Run the dev server
npm run dev

# Build for production
npm run build
```

### Adding Components

New components go in `src/components/` and should:

- Use TypeScript
- Include prop interfaces
- Export as default
- Include JSDoc comments

## Common Issues & Solutions

| Issue | Solution |
|-------|----------|
| Models slow to load | A GPU is required for fast transcription |
| File not supported | Check that the file extension is in the supported list |
| Transcription has errors | Try a larger model (medium/large) |
| Application crashes | Check the browser console and Flask logs |
| Export not working | Ensure the segments data is complete |

## License

MIT License - Personal use and modifications allowed

## Credits

Built with:

- [OpenAI Whisper](https://github.com/openai/whisper) - Speech recognition
- [React](https://react.dev/) - UI framework
- [Vite](https://vitejs.dev/) - Build tool
- [Tailwind CSS](https://tailwindcss.com/) - Styling
- [Flask](https://flask.palletsprojects.com/) - Backend framework

## Support

For issues:

1. Check the Troubleshooting section
2. Verify that ffmpeg is installed
3. Check the Flask backend logs
4. Review the browser console for errors
5. Ensure Python 3.8+ and Node.js 16+ are installed
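A quick first check that rules out most backend problems is the `/health` endpoint documented above. A minimal sketch, assuming the backend is running on the default port:

```python
import requests

# Assumes the backend is running on the default port.
resp = requests.get("http://localhost:5000/health")
# Expect {"status": "healthy", "model_loaded": true, "device": ...}
print(resp.json())
```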