Farsi Transcriber - Web Application
A professional web-based application for transcribing Farsi audio and video files using OpenAI's Whisper model.
Features
✨ Core Features
- 🎙️ Transcribe audio files (MP3, WAV, M4A, FLAC, OGG, AAC, WMA)
- 🎬 Extract audio from video files (MP4, MKV, MOV, WebM, AVI, FLV, WMV)
- 🇮🇷 High-accuracy Farsi/Persian language transcription
- ⏱️ Word-level timestamps for precise timing
- 📤 Export to multiple formats (TXT, SRT, VTT, JSON)
- 💻 Clean, intuitive React UI built from a Figma design
- 🎨 Dark/Light theme toggle
- 🔍 Search and text highlighting in transcriptions
- 📋 File queue management
- 💾 Copy individual transcription segments
- 🚀 GPU acceleration support (CUDA)
- 🎯 Resizable window for flexible workspace
Tech Stack
Frontend:
- React 18+ with TypeScript
- Vite (fast build tool)
- Tailwind CSS v4.0
- Lucide React (icons)
- re-resizable (window resizing)
- Sonner (toast notifications)
Backend:
- Flask (Python web framework)
- OpenAI Whisper (speech recognition)
- PyTorch (deep learning)
- Flask-CORS (cross-origin requests)
System Requirements
Frontend:
- Node.js 16+
- npm/yarn/pnpm
Backend:
- Python 3.8+
- 4GB RAM minimum
- 8GB+ recommended
- ffmpeg installed
- Optional: NVIDIA GPU with CUDA support
Installation
Step 1: Install ffmpeg
Choose your operating system:
Ubuntu/Debian:
sudo apt update && sudo apt install ffmpeg
macOS (Homebrew):
brew install ffmpeg
Windows (Chocolatey):
choco install ffmpeg
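To confirm the install, verify that ffmpeg is discoverable on your PATH; Whisper invokes it under the hood. A minimal Python check (assuming the backend's Python environment):

```python
import shutil
import subprocess

# Whisper shells out to ffmpeg, so it must be on PATH.
if shutil.which("ffmpeg") is None:
    raise SystemExit("ffmpeg not found on PATH - install it first")

# Print the installed version as a quick sanity check.
subprocess.run(["ffmpeg", "-version"], check=True)
```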
Step 2: Backend Setup
# Navigate to backend directory
cd backend
# Create virtual environment
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
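With the dependencies installed, you can check whether PyTorch sees a CUDA-capable GPU (transcription falls back to CPU otherwise); a quick sketch:

```python
import torch

# True only if an NVIDIA GPU with working CUDA drivers is visible to PyTorch.
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
```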
Step 3: Frontend Setup
# Navigate to root directory
cd ..
# Install Node dependencies
npm install
# Or use yarn/pnpm
yarn install
# or
pnpm install
Running the Application
Step 1: Start Backend API
cd backend
source venv/bin/activate # Activate virtual environment
python app.py
The API will be available at http://localhost:5000
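To verify the backend is up before starting the frontend, you can hit the /health endpoint (documented under API Endpoints below). A minimal sketch using the requests library:

```python
import requests

# Smoke-test the backend's health endpoint.
resp = requests.get("http://localhost:5000/health", timeout=5)
resp.raise_for_status()
print(resp.json())  # e.g. {'status': 'healthy', 'model_loaded': True, 'device': 'cpu'}
```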
Step 2: Start Frontend Dev Server
In a new terminal:
npm run dev
The application will be available at http://localhost:3000
Building for Production
Frontend Build
npm run build
This creates an optimized production build in the dist/ directory.
Backend Deployment
For production, use a dedicated WSGI server such as Gunicorn:
# Install Gunicorn
pip install gunicorn
# Run with Gunicorn
gunicorn -w 4 -b 0.0.0.0:5000 app:app
API Endpoints
/health (GET)
Health check endpoint
Response:
{
"status": "healthy",
"model_loaded": true,
"device": "cuda|cpu"
}
/transcribe (POST)
Transcribe audio/video file
Request:
- file: Audio/video file (multipart/form-data)
- language: Language code (optional, default: "fa" for Farsi)
Response:
{
"status": "success",
"filename": "audio.mp3",
"language": "fa",
"text": "Full transcription text...",
"segments": [
{
"start": "00:00:00.000",
"end": "00:00:05.500",
"text": "سلام دنیا"
}
]
}
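A minimal client sketch for this endpoint using Python's requests library (file path and name are illustrative):

```python
import requests

API_URL = "http://localhost:5000"  # adjust if your backend runs elsewhere

# Upload an audio file as multipart/form-data, requesting Farsi transcription.
with open("audio.mp3", "rb") as f:
    resp = requests.post(
        f"{API_URL}/transcribe",
        files={"file": f},
        data={"language": "fa"},
    )
resp.raise_for_status()
result = resp.json()

print(result["text"])
for seg in result["segments"]:
    print(f'[{seg["start"]} -> {seg["end"]}] {seg["text"]}')
```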
/models (GET)
Get available Whisper models
Response:
{
"available_models": ["tiny", "base", "small", "medium", "large"],
"current_model": "medium",
"description": "..."
}
/export (POST)
Export transcription
Request:
{
"transcription": "Full text...",
"segments": [...],
"format": "txt|srt|vtt|json"
}
Response:
{
"status": "success",
"format": "srt",
"content": "...",
"mime_type": "text/plain"
}
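For example, rendering a transcription as SRT subtitles might look like this (a sketch; the payload fields mirror the request shape above):

```python
import requests

# Example payload; in practice, pass along the output of /transcribe.
payload = {
    "transcription": "سلام دنیا",
    "segments": [
        {"start": "00:00:00.000", "end": "00:00:05.500", "text": "سلام دنیا"}
    ],
    "format": "srt",
}

resp = requests.post("http://localhost:5000/export", json=payload)
resp.raise_for_status()

# Save the rendered subtitles to disk.
with open("audio.srt", "w", encoding="utf-8") as f:
    f.write(resp.json()["content"])
```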
Usage Guide
1. Add Files to Queue
- Click "Add Files" button in the left sidebar
- Select audio or video files
- Multiple files can be added to the queue
2. Transcribe
- Select a file from the queue
- Click "Transcribe" button
- Watch the progress indicator
- Results appear with timestamps
3. Search & Copy
- Use the search bar to find specific text
- Matching text is highlighted
- Click copy icon to copy individual segments
4. Export Results
- Select export format (TXT, SRT, VTT, JSON)
- Click "Export" button
- The file is downloaded, ready to save
5. Theme Toggle
- Click sun/moon icon in header
- Switch between light and dark themes
Project Structure
farsi_transcriber_web/
├── src/
│ ├── App.tsx # Main application component
│ ├── main.tsx # React entry point
│ ├── index.css # Global styles
│ └── components/
│ ├── Button.tsx
│ ├── Progress.tsx
│ ├── Input.tsx
│ └── Select.tsx
├── backend/
│ ├── app.py # Flask API server
│ ├── requirements.txt # Python dependencies
│ └── .gitignore
├── public/
├── package.json
├── vite.config.ts
├── tsconfig.json
├── tailwind.config.js
├── postcss.config.js
└── README.md
Configuration
Environment Variables
Create a .env.local file in the root directory:
VITE_API_URL=http://localhost:5000
VITE_MAX_FILE_SIZE=500MB
Backend Configuration
Edit backend/app.py to customize:
# Change model size
model = whisper.load_model('large') # tiny, base, small, medium, large
# Change upload folder
UPLOAD_FOLDER = '/custom/path'
# Change max file size
MAX_FILE_SIZE = 1024 * 1024 * 1024 # 1GB
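As a sketch of how such settings might be wired together (hypothetical structure, not necessarily how backend/app.py is laid out), the model can be sized via an environment variable and loaded lazily so the server starts quickly:

```python
import os
import whisper

# Hypothetical lazy-loading pattern: defer the slow model load until the
# first transcription request instead of paying for it at startup.
_model = None

def get_model():
    global _model
    if _model is None:
        # Model size comes from an environment variable, defaulting to "medium".
        _model = whisper.load_model(os.environ.get("WHISPER_MODEL", "medium"))
    return _model
```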
Troubleshooting
Issue: "API connection failed"
Solution: Ensure the backend is running on http://localhost:5000
Issue: "Whisper model not found"
Solution: The first run downloads the model (up to ~3GB for "large"). Ensure you have an internet connection and enough disk space.
Issue: "CUDA out of memory"
Solution: Use a smaller model or reduce the batch size in backend/app.py
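For instance, both remedies can be expressed through whisper.load_model, which accepts a device argument:

```python
import whisper

# Smaller models need far less VRAM than "large".
model = whisper.load_model("small")

# Or bypass the GPU entirely and transcribe on the CPU.
model = whisper.load_model("medium", device="cpu")
```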
Issue: "ffmpeg not found"
Solution: Install ffmpeg using your package manager (see Installation section)
Issue: Port 3000 or 5000 already in use
Solution: Change ports in vite.config.ts and backend/app.py
Performance Tips
- Use a GPU - Ensure NVIDIA CUDA is properly installed
- Choose an appropriate model - Balance speed vs. accuracy
- Close other applications - Free up RAM/VRAM
- Use an SSD - Faster model loading and file I/O
- Batch processing - Process multiple files sequentially (see the sketch below)
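A minimal sequential batch sketch against the API (folder and file names are illustrative):

```python
import pathlib
import requests

API_URL = "http://localhost:5000"

# Transcribe every MP3 in a folder one at a time to keep memory use flat.
for path in sorted(pathlib.Path("recordings").glob("*.mp3")):
    with path.open("rb") as f:
        resp = requests.post(f"{API_URL}/transcribe",
                             files={"file": f}, data={"language": "fa"})
    resp.raise_for_status()
    path.with_suffix(".txt").write_text(resp.json()["text"], encoding="utf-8")
    print(f"Done: {path.name}")
```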
Future Enhancements
- Drag-and-drop file upload
- Audio playback synchronized with transcription
- Edit segments inline
- Keyboard shortcuts
- Save/load sessions
- Speaker diarization
- Confidence scores
- Custom vocabulary support
Development
Code Style
# Lint code (if ESLint is configured)
npm run lint
# Start the development server
npm run dev
# Build for production
npm run build
Adding Components
New components go in src/components/ and should:
- Use TypeScript
- Include prop interfaces
- Export as default
- Include JSDoc comments
Common Issues & Solutions
| Issue | Solution |
|---|---|
| Transcription is slow | Use a GPU; larger models are much slower on CPU |
| File not supported | Check file extension is in supported list |
| Transcription has errors | Try larger model (medium/large) |
| Application crashes | Check browser console and Flask logs |
| Export not working | Ensure segments data is complete |
License
MIT License - free to use, modify, and distribute
Credits
Built with:
- OpenAI Whisper - Speech recognition
- React - UI framework
- Vite - Build tool
- Tailwind CSS - Styling
- Flask - Backend framework
Support
For issues:
- Check the troubleshooting section
- Verify ffmpeg is installed
- Check Flask backend logs
- Review browser console for errors
- Ensure Python 3.8+ and Node.js 16+ are installed