Farsi Transcriber - Quick Start Guide

You now have TWO complete applications for Farsi transcription:

🖥️ Option 1: Desktop App (PyQt6)

Location: /home/user/whisper/farsi_transcriber/

Setup

cd farsi_transcriber
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python main.py

Features:

  • Standalone desktop application
  • Works completely offline
  • Direct access to file system
  • Lightweight and fast
  • ⚠️ Simpler UI (green theme)

Good for:

  • Local-only transcription
  • Users who prefer desktop apps
  • Offline processing

🌐 Option 2: Web App (React + Flask)

Location: /home/user/whisper/farsi_transcriber_web/

Setup

Backend (Flask):

cd farsi_transcriber_web/backend
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python app.py
# API runs on http://localhost:5000

Frontend (React):

cd farsi_transcriber_web
npm install
npm run dev
# App runs on http://localhost:3000
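The web app needs two servers running at once, so it can help to confirm that both are listening before you open the browser. This is a minimal standard-library Python sketch. It assumes only the ports stated above (5000 for Flask, 3000 for the dev server). `is_listening` is a hypothetical helper and is not part of either app.

```python
import socket


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, unreachable, or timed out
        return False


if __name__ == "__main__":
    # Ports taken from the setup steps above.
    for label, port in (("Flask backend", 5000), ("React dev server", 3000)):
        state = "up" if is_listening("localhost", port) else "not reachable"
        print(f"{label} on port {port}: {state}")
```

If the backend reports "not reachable", check the Terminal 1 output for a Flask startup error before debugging the frontend.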

Features:

  • Modern web-based UI (matches your Figma design exactly)
  • File queue management
  • Dark/Light theme toggle
  • Search with text highlighting
  • Copy segments to clipboard
  • Resizable window
  • RTL support for Farsi
  • Multiple export formats
  • Professional styling

Good for:

  • Modern web experience
  • Team collaboration (can be deployed online)
  • More features and polish
  • Professional appearance

📊 Comparison

| Feature | Desktop (PyQt6) | Web (React) |
|---|---|---|
| Interface | Simple, green | Modern, professional |
| Dark Mode | ❌ | ✅ |
| File Queue | ❌ | ✅ |
| Search | ❌ | ✅ |
| Copy Segments | ❌ | ✅ |
| Resizable Window | ❌ | ✅ |
| Export Formats | SRT, TXT, VTT, JSON, TSV | TXT, SRT, VTT, JSON |
| Offline | ✅ | ⚠️ Requires backend |
| Easy Setup | ✅ | ⚠️ 2 terminals |
| Deployment | Desktop only | Can host online |
| Code Size | ~25KB | ~200KB |

🚀 Which Should You Use?

Use Desktop App if:

  • You want simple, quick setup
  • You don't need to share the app with others online
  • You prefer offline processing
  • You don't need advanced features

Use Web App if:

  • You like modern interfaces
  • You want dark/light themes
  • You need file queue management
  • You may want to host it online for others
  • You want professional appearance

📁 Project Structure

whisper/
├── farsi_transcriber/              (Desktop PyQt6 App)
│   ├── ui/
│   ├── models/
│   ├── utils/
│   ├── config.py
│   ├── main.py
│   └── requirements.txt
│
└── farsi_transcriber_web/          (Web React App)
    ├── src/
    │   ├── App.tsx
    │   ├── components/
    │   └── main.tsx
    ├── backend/
    │   ├── app.py
    │   └── requirements.txt
    ├── package.json
    └── vite.config.ts

🔧 System Requirements

Desktop App

  • Python 3.8+
  • ffmpeg
  • 4GB RAM

Web App

  • Python 3.8+ (backend)
  • Node.js 16+ (frontend)
  • ffmpeg
  • 4GB RAM

📝 Setup Checklist

Initial Setup (One-time)

  • Install ffmpeg

    # Ubuntu/Debian
    sudo apt install ffmpeg
    
    # macOS
    brew install ffmpeg
    
    # Windows
    choco install ffmpeg
    
  • Verify Python 3.8+

    python3 --version
    
  • Verify Node.js 16+ (for web app only)

    node --version
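You can also script the checks in this list. This is a minimal Python sketch. It assumes the thresholds above (Python 3.8+, Node.js 16+) and expects `ffmpeg` and `node` to be on your PATH. `check_prerequisites` and `node_major_version` are hypothetical helpers, not scripts shipped with either app.

```python
import shutil
import subprocess
import sys


def node_major_version():
    """Return Node.js's major version as an int, or None if unavailable."""
    node = shutil.which("node")
    if node is None:
        return None
    out = subprocess.run([node, "--version"], capture_output=True, text=True).stdout
    # `node --version` prints something like "v18.17.0".
    try:
        return int(out.strip().lstrip("v").split(".")[0])
    except ValueError:
        return None


def check_prerequisites(need_node: bool = False) -> dict:
    """Map each prerequisite from the checklist to True (ok) or False."""
    results = {
        "python>=3.8": sys.version_info >= (3, 8),
        "ffmpeg": shutil.which("ffmpeg") is not None,
    }
    if need_node:  # only the web app needs Node.js
        major = node_major_version()
        results["node>=16"] = major is not None and major >= 16
    return results


if __name__ == "__main__":
    for name, ok in check_prerequisites(need_node=True).items():
        print(f"{'OK     ' if ok else 'MISSING'} {name}")
```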
    

Desktop App Setup

  • Create virtual environment
  • Install requirements
  • Run app

Web App Setup

Backend:

  • Create virtual environment
  • Install requirements
  • Run Flask server

Frontend:

  • Install Node dependencies
  • Run dev server

🎯 Quick Start (Fastest)

Desktop (about 30 seconds, plus dependency download time)

cd whisper/farsi_transcriber
python3 -m venv venv && source venv/bin/activate
pip install -r requirements.txt && python main.py

Web (about 2 minutes, plus dependency download time)

Terminal 1:

cd whisper/farsi_transcriber_web/backend
python3 -m venv venv && source venv/bin/activate
pip install -r requirements.txt && python app.py

Terminal 2:

cd whisper/farsi_transcriber_web
npm install && npm run dev

🐛 Troubleshooting

"ffmpeg not found"

Install ffmpeg (see Initial Setup in the Setup Checklist above) and make sure it is on your PATH.

"ModuleNotFoundError" (Python)

# Ensure virtual environment is activated
source venv/bin/activate  # Linux/Mac
# or
venv\Scripts\activate  # Windows
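If the error continues after activation, you can check from Python whether a venv is actually active and which modules are missing. This is a minimal sketch. The module names in the `__main__` block (`whisper`, `PyQt6`, `flask`) are assumptions about what each app's requirements.txt installs, so adjust them to match.

```python
import importlib.util
import sys


def in_virtualenv() -> bool:
    """True when running inside a venv created with `python3 -m venv`."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base interpreter.
    return sys.prefix != sys.base_prefix


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("venv active:", in_virtualenv(), f"({sys.prefix})")
    # Hypothetical module list; match it to the app's requirements.txt.
    print("missing:", missing_modules(["whisper", "PyQt6", "flask"]))
```

If modules are missing even though the venv is active, run `pip install -r requirements.txt` again inside it.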

"npm: command not found"

Install Node.js from https://nodejs.org

App runs slow

  • Use an NVIDIA GPU: install CUDA and a CUDA-enabled PyTorch build
  • Reduce model size: switch the Whisper model to 'small' or 'tiny'
  • Close other applications
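Here is a minimal sketch of the first two tips. It assumes both apps load models through openai-whisper running on PyTorch. The VRAM figures are the approximate values listed in the openai-whisper README. `pick_device` and `largest_model_for_vram` are hypothetical helpers, not part of either app.

```python
# Approximate VRAM needed per model (openai-whisper README), smallest first.
WHISPER_VRAM_GB = [("tiny", 1), ("base", 1), ("small", 2), ("medium", 5), ("large", 10)]


def pick_device() -> str:
    """Return "cuda" if a CUDA-enabled PyTorch build sees a GPU, else "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


def largest_model_for_vram(vram_gb: float) -> str:
    """Pick the largest model whose approximate VRAM need fits in vram_gb."""
    choice = "tiny"  # fall back to the smallest model
    for name, need_gb in WHISPER_VRAM_GB:
        if need_gb <= vram_gb:
            choice = name
    return choice
```

You could then pass the result to `whisper.load_model(name, device=pick_device())`. Where each app actually sets its model name is not covered here (see each app's README). VRAM limits don't apply on CPU, but the smaller models there are still much faster.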

📚 Full Documentation

  • Desktop App: farsi_transcriber/README.md
  • Web App: farsi_transcriber_web/README.md
  • API Docs: farsi_transcriber_web/README.md (Endpoints section)

🎓 What Was Built

Desktop Application (PyQt6)

  • File picker for audio/video
  • Whisper integration with word-level timestamps
  • 5 export formats (TXT, SRT, VTT, JSON, TSV)
  • Professional styling
  • Progress indicators
  • Threading to prevent UI freezing
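The following minimal sketch shows what an SRT export involves by turning Whisper-style segments into SRT text. It assumes each segment looks like an entry in the `segments` list returned by openai-whisper's `transcribe()`: a dict with `start` and `end` in seconds, plus `text`. This is not the desktop app's own exporter, and `segments_to_srt` is a hypothetical name.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Build numbered SRT cues separated by blank lines."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        start = srt_timestamp(seg["start"])
        end = srt_timestamp(seg["end"])
        blocks.append(f"{i}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)
```

Write the result with an explicit encoding, for example `Path("out.srt").write_text(srt, encoding="utf-8")`. This keeps Farsi text intact on systems whose default encoding is not UTF-8.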

Web Application (React + Flask)

  • Complete Figma design implementation
  • File queue management
  • Dark/light theme
  • Search with highlighting
  • Segment management
  • Resizable window
  • RTL support
  • Flask backend with Whisper integration
  • 4 export formats
  • Real file upload handling


🚀 Next Steps

  1. Choose your app (Desktop or Web)
  2. Install ffmpeg if not already installed
  3. Follow the setup instructions above
  4. Test with a Farsi audio file
  5. Export in your preferred format

💡 Tips

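  • First transcription is slow because the Whisper model is downloaded on first use (the medium model, for example, has 769M parameters and is roughly 1.5 GB on disk)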
  • Use larger models (medium/large) for better accuracy
  • Use smaller models (tiny/base) for speed
  • GPU significantly speeds up transcription
  • Both apps work offline (after initial model download)
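To check whether a model is already downloaded (and will therefore load offline), you can use the minimal sketch below. It assumes the default openai-whisper cache location (`$XDG_CACHE_HOME/whisper`, falling back to `~/.cache/whisper`) and assumes neither app passes a custom `download_root`. It covers names like `tiny`, `base`, `small` and `medium`, which are stored as `<name>.pt`. The `large` alias uses a version-specific file name. `is_model_cached` is a hypothetical helper.

```python
import os
from pathlib import Path


def whisper_cache_dir() -> Path:
    """Default directory where openai-whisper's load_model() stores models."""
    default = os.path.join(os.path.expanduser("~"), ".cache")
    return Path(os.getenv("XDG_CACHE_HOME", default)) / "whisper"


def is_model_cached(name: str) -> bool:
    """True if <name>.pt already exists in the default cache directory."""
    return (whisper_cache_dir() / f"{name}.pt").is_file()


if __name__ == "__main__":
    for name in ("tiny", "base", "small", "medium"):
        print(f"{name:<7} cached: {is_model_cached(name)}")
```

To prepare for offline use, run `whisper.load_model("medium")` once while online. This fills the cache.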

📧 Need Help?

  • Check the full README in each app's directory
  • Verify all requirements are installed
  • Check browser console (web app) or Python output (desktop)
  • Ensure ffmpeg is in your PATH

Enjoy your Farsi transcription apps! 🎉