
Farsi Transcriber - Web Application

A professional web-based application for transcribing Farsi audio and video files using OpenAI's Whisper model.

Features

Core Features

  • 🎙️ Transcribe audio files (MP3, WAV, M4A, FLAC, OGG, AAC, WMA)
  • 🎬 Extract audio from video files (MP4, MKV, MOV, WebM, AVI, FLV, WMV)
  • 🇮🇷 High-accuracy Farsi/Persian language transcription
  • ⏱️ Word-level timestamps for precise timing
  • 📤 Export to multiple formats (TXT, SRT, VTT, JSON)
  • 💻 Clean, intuitive React-based UI with Figma design
  • 🎨 Dark/Light theme toggle
  • 🔍 Search and text highlighting in transcriptions
  • 📋 File queue management
  • 💾 Copy individual transcription segments
  • 🚀 GPU acceleration support (CUDA)
  • 🎯 Resizable window for flexible workspace

Tech Stack

Frontend:

  • React 18+ with TypeScript
  • Vite (fast build tool)
  • Tailwind CSS v4.0
  • Lucide React (icons)
  • re-resizable (window resizing)
  • Sonner (toast notifications)

Backend:

  • Flask (Python web framework)
  • OpenAI Whisper (speech recognition)
  • PyTorch (deep learning)
  • Flask-CORS (cross-origin requests)

System Requirements

Frontend:

  • Node.js 16+
  • npm/yarn/pnpm

Backend:

  • Python 3.8+
  • 4GB RAM minimum
  • 8GB+ recommended
  • ffmpeg installed
  • Optional: NVIDIA GPU with CUDA support

Installation

Step 1: Install ffmpeg

Choose your operating system:

Ubuntu/Debian:

sudo apt update && sudo apt install ffmpeg

macOS (Homebrew):

brew install ffmpeg

Windows (Chocolatey):

choco install ffmpeg

Step 2: Backend Setup

# Navigate to backend directory
cd backend

# Create virtual environment
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

Step 3: Frontend Setup

# Navigate to root directory
cd ..

# Install Node dependencies
npm install

# Or use yarn/pnpm
yarn install
# or
pnpm install

Running the Application

Step 1: Start Backend API

cd backend
source venv/bin/activate  # Activate virtual environment
python app.py

The API will be available at http://localhost:5000

Step 2: Start Frontend Dev Server

In a new terminal:

npm run dev

The application will be available at http://localhost:3000

Building for Production

Frontend Build

npm run build

This creates an optimized production build in the dist/ directory.

Backend Deployment

For production, serve the backend with a WSGI server such as Gunicorn:

# Install Gunicorn
pip install gunicorn

# Run with Gunicorn
gunicorn -w 4 -b 0.0.0.0:5000 app:app

API Endpoints

/health (GET)

Health check endpoint. Reports whether the model is loaded and which device (cuda or cpu) it runs on.

Response:

{
  "status": "healthy",
  "model_loaded": true,
  "device": "cuda|cpu"
}
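
A minimal health probe using Python's requests library, assuming the backend is running at the default http://localhost:5000:

import requests

# check that the API is up and see which device the model is on
resp = requests.get("http://localhost:5000/health", timeout=5)
info = resp.json()
print(info["status"], info["model_loaded"], info["device"])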

/transcribe (POST)

Transcribes an uploaded audio or video file.

Request:

  • file: Audio/video file (multipart/form-data)
  • language: Language code (optional, default: "fa" for Farsi)

Response:

{
  "status": "success",
  "filename": "audio.mp3",
  "language": "fa",
  "text": "Full transcription text...",
  "segments": [
    {
      "start": "00:00:00.000",
      "end": "00:00:05.500",
      "text": "سلام دنیا"
    }
  ]
}
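
For example, a sketch of a client call with Python's requests (the file name audio.mp3 is a placeholder):

import requests

# upload a local file; language is optional and defaults to "fa"
with open("audio.mp3", "rb") as f:
    resp = requests.post(
        "http://localhost:5000/transcribe",
        files={"file": f},
        data={"language": "fa"},
    )

result = resp.json()
print(result["text"])
for seg in result["segments"]:
    print(seg["start"], seg["end"], seg["text"])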

/models (GET)

Lists the available Whisper model sizes and the currently loaded model.

Response:

{
  "available_models": ["tiny", "base", "small", "medium", "large"],
  "current_model": "medium",
  "description": "..."
}

/export (POST)

Exports a transcription to one of the supported formats.

Request:

{
  "transcription": "Full text...",
  "segments": [...],
  "format": "txt|srt|vtt|json"
}

Response:

{
  "status": "success",
  "format": "srt",
  "content": "...",
  "mime_type": "text/plain"
}
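
A sketch of driving /export from Python, reusing result from the /transcribe example above and saving the returned SRT content to disk:

import requests

payload = {
    "transcription": result["text"],
    "segments": result["segments"],
    "format": "srt",
}
resp = requests.post("http://localhost:5000/export", json=payload)

# write the rendered subtitle content to a local file
with open("transcript.srt", "w", encoding="utf-8") as f:
    f.write(resp.json()["content"])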

Usage Guide

1. Add Files to Queue

  • Click "Add Files" button in the left sidebar
  • Select audio or video files
  • Multiple files can be added to the queue

2. Transcribe

  • Select a file from the queue
  • Click "Transcribe" button
  • Watch the progress indicator
  • Results appear with timestamps

3. Search & Copy

  • Use the search bar to find specific text
  • Matching text is highlighted
  • Click copy icon to copy individual segments

4. Export Results

  • Select export format (TXT, SRT, VTT, JSON)
  • Click "Export" button
  • File is downloaded or ready to save

5. Theme Toggle

  • Click sun/moon icon in header
  • Switch between light and dark themes

Project Structure

farsi_transcriber_web/
├── src/
│   ├── App.tsx              # Main application component
│   ├── main.tsx             # React entry point
│   ├── index.css            # Global styles
│   └── components/
│       ├── Button.tsx
│       ├── Progress.tsx
│       ├── Input.tsx
│       └── Select.tsx
├── backend/
│   ├── app.py               # Flask API server
│   ├── requirements.txt      # Python dependencies
│   └── .gitignore
├── public/
├── package.json
├── vite.config.ts
├── tsconfig.json
├── tailwind.config.js
├── postcss.config.js
└── README.md

Configuration

Environment Variables

Create a .env.local file in the root directory:

VITE_API_URL=http://localhost:5000
VITE_MAX_FILE_SIZE=500MB

Backend Configuration

Edit backend/app.py to customize:

# Change model size
model = whisper.load_model('large')  # tiny, base, small, medium, large

# Change upload folder
UPLOAD_FOLDER = '/custom/path'

# Change max file size
MAX_FILE_SIZE = 1024 * 1024 * 1024  # 1GB
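
Since /health reports cuda or cpu, a common pattern (a sketch, not necessarily how backend/app.py is written) is to pick the device at load time:

import torch
import whisper

# use the GPU when CUDA is available, otherwise fall back to CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
model = whisper.load_model("medium", device=device)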

Troubleshooting

Issue: "API connection failed"

Solution: Ensure the backend is running at http://localhost:5000

Issue: "Whisper model not found"

Solution: The first run downloads the model (~3GB). Ensure you have an internet connection and enough free disk space.

Issue: "CUDA out of memory"

Solution: Use a smaller model, or reduce the batch size in backend/app.py

Issue: "ffmpeg not found"

Solution: Install ffmpeg using your package manager (see Installation section)

Issue: Port 3000 or 5000 already in use

Solution: Change the ports in vite.config.ts (frontend) and backend/app.py (backend); see the sketch below
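
For the backend, assuming app.py uses the standard Flask entry point, the port can be changed like this (5001 is an arbitrary example); for the frontend, set server.port in vite.config.ts:

# at the bottom of backend/app.py
if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5001)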

Performance Tips

  1. Use GPU - Ensure NVIDIA CUDA is properly installed
  2. Choose appropriate model - Balance speed vs accuracy
  3. Close other applications - Free up RAM/VRAM
  4. Use SSD - Faster model loading and file I/O
  5. Batch Processing - Process multiple files sequentially, reusing one loaded model (see the sketch below)
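
A minimal sketch of the batch idea, calling Whisper directly from Python (file names are placeholders):

import whisper

# load the model once, then reuse it across files
model = whisper.load_model("medium")
for path in ["episode1.mp3", "episode2.mp3"]:
    result = model.transcribe(path, language="fa")
    print(path, "->", result["text"][:80])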

Future Enhancements

  • Drag-and-drop file upload
  • Audio playback synchronized with transcription
  • Edit segments inline
  • Keyboard shortcuts
  • Save/load sessions
  • Speaker diarization
  • Confidence scores
  • Custom vocabulary support

Development

Code Style

# Lint code (if ESLint is configured)
npm run lint

# Start the development server
npm run dev

# Build for production
npm run build

Adding Components

New components go in src/components/ and should:

  • Use TypeScript
  • Include prop interfaces
  • Export as default
  • Include JSDoc comments

Common Issues & Solutions

Issue                        Solution
Models slow to load          GPU required for fast transcription
File not supported           Check that the file extension is in the supported list
Transcription has errors     Try a larger model (medium/large)
Application crashes          Check the browser console and Flask logs
Export not working           Ensure the segments data is complete

License

MIT License. Free to use, modify, and distribute.

Credits

Built with OpenAI Whisper, PyTorch, Flask, React, Vite, and Tailwind CSS (see the Tech Stack section above).

Support

For issues:

  1. Check the troubleshooting section
  2. Verify ffmpeg is installed
  3. Check Flask backend logs
  4. Review browser console for errors
  5. Ensure Python 3.8+ and Node.js 16+ are installed