
Farsi Transcriber - Web Application

A professional web-based application for transcribing Farsi audio and video files using OpenAI's Whisper model.

Features

Core Features

  • 🎙️ Transcribe audio files (MP3, WAV, M4A, FLAC, OGG, AAC, WMA)
  • 🎬 Extract audio from video files (MP4, MKV, MOV, WebM, AVI, FLV, WMV)
  • 🇮🇷 High-accuracy Farsi/Persian language transcription
  • ⏱️ Word-level timestamps for precise timing
  • 📤 Export to multiple formats (TXT, SRT, VTT, JSON)
  • 💻 Clean, intuitive React-based UI with Figma design
  • 🎨 Dark/Light theme toggle
  • 🔍 Search and text highlighting in transcriptions
  • 📋 File queue management
  • 💾 Copy individual transcription segments
  • 🚀 GPU acceleration support (CUDA)
  • 🎯 Resizable window for flexible workspace

Tech Stack

Frontend:

  • React 18+ with TypeScript
  • Vite (fast build tool)
  • Tailwind CSS v4.0
  • Lucide React (icons)
  • re-resizable (window resizing)
  • Sonner (toast notifications)

Backend:

  • Flask (Python web framework)
  • OpenAI Whisper (speech recognition)
  • PyTorch (deep learning)
  • Flask-CORS (cross-origin requests)

System Requirements

Frontend:

  • Node.js 16+
  • npm/yarn/pnpm

Backend:

  • Python 3.8+
  • 4GB RAM minimum
  • 8GB+ recommended
  • ffmpeg installed
  • Optional: NVIDIA GPU with CUDA support

Installation

Step 1: Install ffmpeg

Choose your operating system:

Ubuntu/Debian:

sudo apt update && sudo apt install ffmpeg

macOS (Homebrew):

brew install ffmpeg

Windows (Chocolatey):

choco install ffmpeg

Step 2: Backend Setup

# Navigate to backend directory
cd backend

# Create virtual environment
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

Step 3: Frontend Setup

# Navigate to root directory
cd ..

# Install Node dependencies
npm install

# Or use yarn/pnpm
yarn install
# or
pnpm install

Running the Application

Step 1: Start Backend API

cd backend
source venv/bin/activate  # Activate virtual environment
python app.py

The API will be available at http://localhost:5000

Step 2: Start Frontend Dev Server

In a new terminal:

npm run dev

The application will be available at http://localhost:3000

Building for Production

Frontend Build

npm run build

This creates an optimized production build in the dist/ directory.

Backend Deployment

For production, serve the backend with a WSGI server such as Gunicorn:

# Install Gunicorn
pip install gunicorn

# Run with Gunicorn
gunicorn -w 4 -b 0.0.0.0:5000 app:app

API Endpoints

/health (GET)

Health check endpoint. Reports whether the model is loaded and which device (cuda or cpu) it runs on.

Response:

{
  "status": "healthy",
  "model_loaded": true,
  "device": "cuda|cpu"
}
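
A minimal health probe using Python's requests library, assuming the backend is running at the default http://localhost:5000:

import requests

# check that the API is up and see which device the model is on
resp = requests.get("http://localhost:5000/health", timeout=5)
info = resp.json()
print(info["status"], info["model_loaded"], info["device"])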

/transcribe (POST)

Transcribes an uploaded audio or video file.

Request:

  • file: Audio/video file (multipart/form-data)
  • language: Language code (optional, default: "fa" for Farsi)

Response:

{
  "status": "success",
  "filename": "audio.mp3",
  "language": "fa",
  "text": "Full transcription text...",
  "segments": [
    {
      "start": "00:00:00.000",
      "end": "00:00:05.500",
      "text": "سلام دنیا"
    }
  ]
}
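
For example, a sketch of a client call with Python's requests (the file name audio.mp3 is a placeholder):

import requests

# upload a local file; language is optional and defaults to "fa"
with open("audio.mp3", "rb") as f:
    resp = requests.post(
        "http://localhost:5000/transcribe",
        files={"file": f},
        data={"language": "fa"},
    )

result = resp.json()
print(result["text"])
for seg in result["segments"]:
    print(seg["start"], seg["end"], seg["text"])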

/models (GET)

Lists the available Whisper model sizes and the currently loaded model.

Response:

{
  "available_models": ["tiny", "base", "small", "medium", "large"],
  "current_model": "medium",
  "description": "..."
}

/export (POST)

Exports a transcription to one of the supported formats.

Request:

{
  "transcription": "Full text...",
  "segments": [...],
  "format": "txt|srt|vtt|json"
}

Response:

{
  "status": "success",
  "format": "srt",
  "content": "...",
  "mime_type": "text/plain"
}
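
A sketch of driving /export from Python, reusing result from the /transcribe example above and saving the returned SRT content to disk:

import requests

payload = {
    "transcription": result["text"],
    "segments": result["segments"],
    "format": "srt",
}
resp = requests.post("http://localhost:5000/export", json=payload)

# write the rendered subtitle content to a local file
with open("transcript.srt", "w", encoding="utf-8") as f:
    f.write(resp.json()["content"])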

Usage Guide

1. Add Files to Queue

  • Click "Add Files" button in the left sidebar
  • Select audio or video files
  • Multiple files can be added to the queue

2. Transcribe

  • Select a file from the queue
  • Click "Transcribe" button
  • Watch the progress indicator
  • Results appear with timestamps

3. Search & Copy

  • Use the search bar to find specific text
  • Matching text is highlighted
  • Click copy icon to copy individual segments

4. Export Results

  • Select export format (TXT, SRT, VTT, JSON)
  • Click "Export" button
  • File is downloaded or ready to save

5. Theme Toggle

  • Click sun/moon icon in header
  • Switch between light and dark themes

Project Structure

farsi_transcriber_web/
├── src/
│   ├── App.tsx              # Main application component
│   ├── main.tsx             # React entry point
│   ├── index.css            # Global styles
│   └── components/
│       ├── Button.tsx
│       ├── Progress.tsx
│       ├── Input.tsx
│       └── Select.tsx
├── backend/
│   ├── app.py               # Flask API server
│   ├── requirements.txt      # Python dependencies
│   └── .gitignore
├── public/
├── package.json
├── vite.config.ts
├── tsconfig.json
├── tailwind.config.js
├── postcss.config.js
└── README.md

Configuration

Environment Variables

Create a .env.local file in the root directory:

VITE_API_URL=http://localhost:5000
VITE_MAX_FILE_SIZE=500MB

Backend Configuration

Edit backend/app.py to customize:

# Change model size
model = whisper.load_model('large')  # tiny, base, small, medium, large

# Change upload folder
UPLOAD_FOLDER = '/custom/path'

# Change max file size
MAX_FILE_SIZE = 1024 * 1024 * 1024  # 1GB
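
Since /health reports cuda or cpu, a common pattern (a sketch, not necessarily how backend/app.py is written) is to pick the device at load time:

import torch
import whisper

# use the GPU when CUDA is available, otherwise fall back to CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
model = whisper.load_model("medium", device=device)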

Troubleshooting

Issue: "API connection failed"

Solution: Ensure the backend is running at http://localhost:5000

Issue: "Whisper model not found"

Solution: The first run downloads the model (~3GB). Ensure you have an internet connection and enough free disk space.

Issue: "CUDA out of memory"

Solution: Use a smaller model, or reduce the batch size in backend/app.py

Issue: "ffmpeg not found"

Solution: Install ffmpeg using your package manager (see Installation section)

Issue: Port 3000 or 5000 already in use

Solution: Change the ports in vite.config.ts (frontend) and backend/app.py (backend); see the sketch below
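
For the backend, assuming app.py uses the standard Flask entry point, the port can be changed like this (5001 is an arbitrary example); for the frontend, set server.port in vite.config.ts:

# at the bottom of backend/app.py
if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5001)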

Performance Tips

  1. Use GPU - Ensure NVIDIA CUDA is properly installed
  2. Choose appropriate model - Balance speed vs accuracy
  3. Close other applications - Free up RAM/VRAM
  4. Use SSD - Faster model loading and file I/O
  5. Batch Processing - Process multiple files sequentially, reusing one loaded model (see the sketch below)
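
A minimal sketch of the batch idea, calling Whisper directly from Python (file names are placeholders):

import whisper

# load the model once, then reuse it across files
model = whisper.load_model("medium")
for path in ["episode1.mp3", "episode2.mp3"]:
    result = model.transcribe(path, language="fa")
    print(path, "->", result["text"][:80])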

Future Enhancements

  • Drag-and-drop file upload
  • Audio playback synchronized with transcription
  • Edit segments inline
  • Keyboard shortcuts
  • Save/load sessions
  • Speaker diarization
  • Confidence scores
  • Custom vocabulary support

Development

Code Style

# Lint code (if ESLint is configured)
npm run lint

# Start the development server
npm run dev

# Build for production
npm run build

Adding Components

New components go in src/components/ and should:

  • Use TypeScript
  • Include prop interfaces
  • Export as default
  • Include JSDoc comments

Common Issues & Solutions

Issue                        Solution
Models slow to load          GPU required for fast transcription
File not supported           Check that the file extension is in the supported list
Transcription has errors     Try a larger model (medium/large)
Application crashes          Check the browser console and Flask logs
Export not working           Ensure the segments data is complete

License

MIT License. Free to use, modify, and distribute.

Credits

Built with OpenAI Whisper, PyTorch, Flask, React, Vite, and Tailwind CSS (see the Tech Stack section above).

Support

For issues:

  1. Check the troubleshooting section
  2. Verify ffmpeg is installed
  3. Check Flask backend logs
  4. Review browser console for errors
  5. Ensure Python 3.8+ and Node.js 16+ are installed