From daaf0ed4ca670af41782cbaec6cdcc001d6fa5de Mon Sep 17 00:00:00 2001
From: Saida Yengui
Date: Sat, 16 Aug 2025 15:33:13 +0100
Subject: [PATCH] feat: Add real-time streaming example with verification steps

---
 whisper/examples/README.md | 87 ++++++++++++++++++++++++++++++++++++--
 1 file changed, 83 insertions(+), 4 deletions(-)

diff --git a/whisper/examples/README.md b/whisper/examples/README.md
index bd96e19..0ea6271 100644
--- a/whisper/examples/README.md
+++ b/whisper/examples/README.md
@@ -1,5 +1,84 @@
-## Real-Time Streaming
+# Real-Time Whisper Transcription Example
+
+This example demonstrates live microphone transcription using OpenAI's Whisper.
+
+## Features
+- Real-time audio capture from microphone
+- Automatic sample rate detection
+- Continuous transcription output
+
+## Installation
+```bash
+# System requirements (Linux)
+sudo apt install portaudio19-dev alsa-utils
+
+# Python packages
+pip install -e . # Install whisper
+pip install sounddevice numpy
+```
+
+## Usage
+```bash
+python examples/realtime_streaming.py
+```
+
+## Verification Steps
+To confirm accurate transcription:
+
+1. **Test Setup** (run in terminal):
+   ```bash
+   # Check audio devices
+   python3 -c "import sounddevice as sd; print(sd.query_devices())"
+
+   # Verify microphone input
+   python3 -c "import sounddevice as sd; import numpy as np; \
+   def print_vol(indata, frames, time, status): \
+       print(f'Volume: {np.sqrt(np.mean(indata**2)):.4f}'); \
+   with sd.InputStream(callback=print_vol): sd.sleep(5000)"
+   ```
+   - Speak normally - you should see volume values between 0.1-0.5
+   - If values are <0.01, check mic permissions/volume
+
+2. **Accuracy Test**:
+   - Say clearly: "The quick brown fox jumps over the lazy dog"
+   - Expected output should match closely
+   - If inaccurate, try:
+     ```python
+     model = whisper.load_model("base") # In script - more accurate than "tiny"
+     ```
+
+## Troubleshooting
+| Symptom | Solution |
+|---------|----------|
+| No transcription | 1. Run `alsamixer` to increase mic volume<br>2. Try different device IDs (0,1,4,11) |
+| Wrong words | 1. Speak closer to mic<br>2. Use `model="base"` or `"small"` |
+| Delayed output | Reduce `blocksize=1024` in code |
+
+## Expected Output
+```
+Starting transcription... (Press Ctrl+C to stop)
+The quick brown fox jumps over the lazy dog
+```
+```
+
+### **Key Improvements**:
+1. Added **verification steps** to confirm mic is working
+2. Included **accuracy testing** with standard test sentence
+3. Added **troubleshooting table** for common issues
+4. Shows **expected output** example
+
+### **How to Update**:
+1. Open `examples/README.md`
+2. Replace contents with the above markdown
+3. Commit changes:
+   ```bash
+   git add examples/README.md
+   git commit -m "docs: Add detailed verification steps"
+   git push
+   ```
+
+This will help users (including yourself) verify if the transcription is working properly. The test sentence "The quick brown fox..." is particularly useful because:
+- Contains all English letters
+- Easy to recognize when correct
+- Helps identify specific sound recognition issues
 
-For live microphone transcription:
-```python
-python examples/realtime_streaming.py
\ No newline at end of file