Voice-to-Text User Guide
Quick Start
Terminal Mode (Default)
- Run the script: `python voice_to_text.py`
- Choose option 1: Record (Enter to stop)
- Speak your prompt: Use natural language with smart commands
- Get your prompt: Processed text is automatically copied to clipboard
- Paste in Claude Code: Ctrl+V to paste the optimized prompt
GUI Mode
- Run with UI flag: `python voice_to_text.py ui`
- Click Record button or press F1 anywhere to start recording
- Speak your prompt: Use natural language with smart commands
- Click Stop or press F1 again to finish recording
- Get your prompt: Processed text is automatically copied to clipboard
- Paste in Claude Code: Ctrl+V to paste the optimized prompt
Installation
Prerequisites
- Python 3.8 or higher
- Microphone access
- Internet connection (for initial Whisper model download)
Setup
# Install dependencies
pip install -r requirements.txt
# Run the script
python voice_to_text.py
How to Use
Terminal Mode
- Option 1: Record (Press Enter to stop)
  - Choose option `1`
  - Speak your prompt after "Recording..." appears
  - Press Enter to stop recording
- Option 2: Quit
  - Choose option `2` to exit the program
GUI Mode
- Global Hotkey: Press F1 (or custom key) anywhere on your system to start/stop recording
- Record Button: Click the microphone button to start/stop recording
- Visual Feedback: Button changes color and text during recording
- Real-time Status: Status bar shows current recording state
- Results Display: Both raw and processed transcriptions shown in text areas
- Always on Top: Optional setting to keep window visible above other apps
- System Tray: Minimize to tray, access from system tray icon
- Settings: Configurable hotkeys, Whisper models, and preferences
Smart Voice Commands
The system automatically converts natural speech into Claude Code-optimized prompts:
Agent Commands
| Say This | Gets Converted To |
|---|---|
| "use agent python-pro" | @agent python-pro |
| "launch agent debug specialist" | @agent debug-specialist |
| "call agent javascript pro" | @agent javascript-pro |
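The agent conversions in the table above could be implemented roughly as follows. This is a minimal sketch, not the script's actual code: `KNOWN_AGENTS`, `AGENT_RE`, and `convert_agent_commands` are hypothetical names, and matching against a fixed list of agent names is an assumption made so that a spoken name such as "python pro" is not confused with the words that follow it.

```python
import re

# Hypothetical list of agent names; the real script may discover these differently.
KNOWN_AGENTS = ["python-pro", "debug-specialist", "javascript-pro"]


def _agent_regex(names):
    # Accept either a space or a hyphen between the words of each agent name,
    # since speech recognition usually produces "python pro", not "python-pro".
    alternatives = "|".join(
        r"[ -]".join(re.escape(part) for part in name.split("-"))
        for name in names
    )
    return re.compile(
        rf"\b(?:use|launch|call) agent ({alternatives})\b", re.IGNORECASE
    )


AGENT_RE = _agent_regex(KNOWN_AGENTS)


def convert_agent_commands(text):
    """Replace spoken agent phrases with '@agent <hyphenated-name>'."""
    def repl(match):
        name = match.group(1).lower().replace(" ", "-")
        return f"@agent {name}"

    return AGENT_RE.sub(repl, text)
```

For example, `convert_agent_commands("launch agent debug specialist")` returns `@agent debug-specialist`.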
Tool References
| Say This | Gets Converted To |
|---|---|
| "run tool bash" | @tool bash |
| "use the grep tool" | @tool grep |
| "call the read tool" | @tool read |
File & Directory References
| Say This | Gets Converted To |
|---|---|
| "directory downloads" | @dir downloads/ |
| "file package.json" | @file package.json |
| "the readme file" | @file README.md |
| "folder source" | @dir source/ |
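A rough sketch of how the file and directory rows above might be handled with regular expressions. The function name `convert_path_references` is hypothetical, and the sketch only handles single-word names; multi-word names (for example "node modules" becoming `node_modules/`) would need extra logic that is not shown here.

```python
import re

# A file or directory name: word characters and hyphens, with optional
# dot-separated parts, so a sentence-ending period is not swallowed.
_NAME = r"[\w-]+(?:\.[\w-]+)*"


def convert_path_references(text):
    """Convert spoken file/directory phrases into @file and @dir references."""
    # Special case from the table: "the readme file" maps to README.md.
    text = re.sub(r"\bthe readme file\b", "@file README.md", text, flags=re.IGNORECASE)
    # "directory X" or "folder X" becomes "@dir X/".
    text = re.sub(
        rf"\b(?:directory|folder) ({_NAME})",
        lambda m: f"@dir {m.group(1)}/",
        text,
        flags=re.IGNORECASE,
    )
    # "file X" becomes "@file X"; the lookbehind skips references that are
    # already prefixed with "@".
    text = re.sub(rf"(?<!@)\bfile ({_NAME})", r"@file \1", text, flags=re.IGNORECASE)
    return text
```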
Code Elements
| Say This | Gets Converted To |
|---|---|
| "function get user" | `getUser()` function |
| "class user manager" | `UserManager` class |
| "variable user name" | `userName` variable |
| "method save data" | `saveData()` method |
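The naming convention in the table above (camelCase for functions, methods, and variables; PascalCase for classes) can be illustrated with a short sketch. The helper names below are hypothetical and are not taken from the script.

```python
def to_camel_case(words):
    """'get user' -> 'getUser'"""
    parts = words.lower().split()
    if not parts:
        return ""
    return parts[0] + "".join(p.capitalize() for p in parts[1:])


def to_pascal_case(words):
    """'user manager' -> 'UserManager'"""
    return "".join(p.capitalize() for p in words.lower().split())


def format_code_element(kind, spoken_name):
    """Format a spoken code element the way the table above shows it."""
    if kind == "class":
        return f"`{to_pascal_case(spoken_name)}` class"
    name = to_camel_case(spoken_name)
    if kind in ("function", "method"):
        # Callables get trailing parentheses.
        return f"`{name}()` {kind}"
    return f"`{name}` {kind}"
```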
Task Management
| Say This | Gets Converted To |
|---|---|
| "add to todo" | add to todo: |
| "new task" | new todo: |
| "mark complete" | mark todo complete |
| "mark done" | mark todo complete |
Common Commands
| Say This | Gets Converted To |
|---|---|
| "run tests" | run tests |
| "commit changes" | commit changes |
| "create pull request" | create PR |
| "install dependencies" | install dependencies |
Example Workflow
Terminal Mode Example
- Run `python voice_to_text.py`
- Choose option `1` (Record)
- Speak: "Use agent python pro to review file auth.py and run tests"
- Press Enter to stop
- See processed result: "@agent python-pro to review @file auth.py and run tests."
- Text automatically copied to clipboard
- Choose option `1` to record again or `2` to quit
GUI Mode Example
- Run `python voice_to_text.py ui`
- Press F1 (or click Record button)
- Speak: "Add to todo fix the authentication bug in function login user"
- Press F1 again (or click Stop)
- See both raw and processed results in the GUI
- Processed text: "Add to todo: fix the authentication bug in `loginUser()` function."
- Text automatically copied to clipboard
- Press F1 again for next recording
Voice Command Examples
Testing a Feature:
- Say: "I just finished implementing the user authentication feature. Can you use agent python pro to review the code in file auth.py and then run tests to make sure everything works?"
- Gets processed to: "I just finished implementing the user authentication feature. Can you @agent python-pro to review the code in @file auth.py and then run tests to make sure everything works?"
File Operations:
- Say: "Please read file package.json and check the dependencies in folder node modules then use tool bash to run npm install"
- Gets processed to: "Please read @file package.json and check the dependencies in @dir node_modules/ then @tool bash to run npm install."
Task Management:
- Say: "Add to todo fix the authentication bug in function login user and mark the previous task as complete"
- Gets processed to: "Add to todo: fix the authentication bug in `loginUser()` function and mark todo complete."
Tips for Better Results
Speaking Clearly
- Speak at normal pace (not too fast or slow)
- Use clear pronunciation
- Pause briefly between different concepts
- Speak in a quiet environment
Effective Commands
- Use specific file names: "file config.json" not "the config file"
- Mention directories explicitly: "directory source" not "the source"
- Use consistent naming: "function getUserData" not "the get user data function"
Natural Language
- Speak naturally - the system handles capitalization and punctuation
- Use complete sentences when possible
- Don't worry about perfect grammar - focus on clarity
Output
What You See
- Raw Transcription: Exactly what Whisper heard
- Processed Prompt: Optimized version for Claude Code
- Clipboard Confirmation: "✓ Processed prompt copied to clipboard!"
- File Location: Path to saved transcript in the `/transcripts` folder
File Storage
All transcripts are saved in the transcripts/ folder with timestamps:
- Format: `transcription_YYYYMMDD_HHMMSS.txt`
- Content: Both raw and processed versions
- Sorting: Files are chronologically ordered
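As an illustration of the naming scheme, a timestamped path in this format could be built as below. This is a sketch under assumptions: `transcript_path` is a hypothetical helper, not a function from the script. Because the timestamp is zero-padded and ordered from year down to second, sorting the file names alphabetically also sorts them chronologically.

```python
from datetime import datetime
from pathlib import Path


def transcript_path(now=None, folder="transcripts"):
    """Return a path like transcripts/transcription_YYYYMMDD_HHMMSS.txt."""
    now = now or datetime.now()
    stamp = now.strftime("%Y%m%d_%H%M%S")
    return Path(folder) / f"transcription_{stamp}.txt"
```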
Mode Comparison
| Feature | Terminal Mode | GUI Mode |
|---|---|---|
| Launch | `python voice_to_text.py` | `python voice_to_text.py ui` |
| Recording | Enter to stop | Button or custom hotkey |
| Global Hotkey | ❌ No | ✅ Customizable (F1-F12) |
| Visual Feedback | Text only | Button colors, status bar |
| Results Display | Console output | Scrollable text areas |
| Multiple Sessions | Menu driven | Always available |
| Background Use | ❌ Terminal focused | ✅ Hotkey works anywhere |
| Always on Top | ❌ No | ✅ Optional setting |
| System Tray | ❌ No | ✅ Minimize to tray |
| Settings | ❌ No | ✅ Full settings dialog |
| Best For | Quick one-off recordings | Continuous workflow |
Advanced Usage
GUI Settings Dialog
Access settings through:
- Settings Button: Click the ⚙️ Settings button in the GUI
- System Tray: Right-click tray icon → Settings (when minimized)
Available Settings:
- Global Hotkey: Choose F1-F12 for recording control
- Whisper Model: Select from tiny, base, small, medium, large, turbo
- Always on Top: Keep window above other applications
- Minimize to Tray: Hide to system tray instead of closing
- Auto Copy Clipboard: Automatically copy processed text
Model Trade-offs:
- tiny: Fastest, least accurate (~39M parameters)
- base: Balanced, recommended (~74M parameters)
- large: Most accurate, slower (~1550M parameters)
- turbo: Fast and accurate (~809M parameters)
System Tray Features
When minimized to tray, right-click the tray icon for:
- Show: Restore the main window
- Record: Start/stop recording directly from tray
- Settings: Open settings dialog
- Quit: Exit the application completely
Settings File
Settings are automatically saved to voice_to_text_settings.json with:
{
  "hotkey": "f1",
  "always_on_top": false,
  "minimize_to_tray": true,
  "whisper_model": "base",
  "auto_copy_clipboard": true
}
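A settings file like this is typically read by merging it over a set of defaults, so that a missing or corrupt file still yields a usable configuration. The sketch below assumes that approach; `DEFAULT_SETTINGS` and `load_settings` are hypothetical names, and the script's own `SettingsManager` may behave differently.

```python
import json
from pathlib import Path

# Defaults mirror the example file above.
DEFAULT_SETTINGS = {
    "hotkey": "f1",
    "always_on_top": False,
    "minimize_to_tray": True,
    "whisper_model": "base",
    "auto_copy_clipboard": True,
}


def load_settings(path="voice_to_text_settings.json"):
    """Load settings from JSON, falling back to defaults for anything missing."""
    settings = dict(DEFAULT_SETTINGS)
    file = Path(path)
    if file.exists():
        try:
            loaded = json.loads(file.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            loaded = {}  # Corrupt file: keep the defaults.
        if isinstance(loaded, dict):
            # Ignore keys this sketch does not know about.
            settings.update(
                {k: v for k, v in loaded.items() if k in DEFAULT_SETTINGS}
            )
    return settings
```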
Manual Customization
For advanced users, you can:
- Add Custom Patterns: Edit the `PromptProcessor` class patterns list
- Modify Default Settings: Edit `default_settings` in `SettingsManager`
- Custom Hotkeys: Use any key combination supported by pynput
Troubleshooting
Audio Issues
Problem: "No microphone detected"
- Solution: Check microphone permissions and connections
- Windows: Settings > Privacy > Microphone
- Mac: System Preferences > Security & Privacy > Microphone
Problem: "Recording sounds muffled"
- Solution: Check microphone positioning and background noise
- Move closer to microphone
- Reduce background noise
GUI Mode Issues
Problem: "F1 hotkey not working"
- Solution:
- Check if another application is using F1
- Try running as administrator (Windows)
- Restart the application
Problem: "GUI window not responding"
- Solution:
- Wait for Whisper model to load (first time is slow)
- Check task manager for hung processes
- Restart the application
Transcription Issues
Problem: "Poor transcription accuracy"
- Solution:
- Speak more clearly and slowly
- Reduce background noise
- Check microphone quality
- Consider switching to a larger Whisper model (for example, small or medium)
Problem: "Model loading takes too long"
- Solution: First run downloads the model (~150MB for base model)
- Subsequent runs are much faster
- Consider using the smaller `tiny` model for speed
Clipboard Issues
Problem: "Could not copy to clipboard"
- Solution:
- Copy the processed text manually
- Check clipboard permissions
- Restart the application
Processing Issues
Problem: "Smart commands not working"
- Solution:
- Check pronunciation of keywords
- Use exact phrases from the reference table
- Speak clearly and pause between concepts
Advanced Usage
Changing Whisper Model
Edit line 29 in voice_to_text.py:
model = whisper.load_model("base")  # Change to: tiny, small, medium, large, turbo
Model Trade-offs:
- tiny: Fastest, least accurate
- base: Balanced (recommended)
- large: Most accurate, slower
Adding Custom Patterns
To add your own smart commands, edit the PromptProcessor class patterns list in voice_to_text.py.
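The exact shape of the patterns list is not documented here, so the following is only a sketch of one plausible layout: a list of (compiled regex, replacement) pairs applied in order. `PATTERNS` and `apply_patterns` are hypothetical names, and the "open issue" command is an invented example of a custom addition.

```python
import re

# Assumed layout: (compiled regex, replacement) pairs, applied in order.
PATTERNS = [
    (re.compile(r"\bcreate pull request\b", re.IGNORECASE), "create PR"),
    (re.compile(r"\bmark (?:complete|done)\b", re.IGNORECASE), "mark todo complete"),
]

# A custom command appended to the list (invented for illustration).
PATTERNS.append((re.compile(r"\bopen issue\b", re.IGNORECASE), "create issue:"))


def apply_patterns(text, patterns=PATTERNS):
    """Run every pattern over the text, in list order."""
    for regex, replacement in patterns:
        text = regex.sub(replacement, text)
    return text
```

Order matters with this layout: a pattern earlier in the list sees the text before later ones, so more specific phrases should generally come first.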
Batch Processing
For processing multiple audio files, consider modifying the script to accept file arguments rather than recording live audio.
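One way such a modification might look is sketched below. It uses `whisper.load_model` and `model.transcribe` from the openai-whisper package; the function names `parse_args`, `transcribe_files`, and `main` and the command-line layout are assumptions for illustration, not part of the existing script.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    parser = argparse.ArgumentParser(
        description="Transcribe existing audio files with Whisper"
    )
    parser.add_argument("files", nargs="+", type=Path, help="audio files to transcribe")
    parser.add_argument("--model", default="base", help="Whisper model name")
    return parser.parse_args(argv)


def transcribe_files(files, model_name="base"):
    """Transcribe each file and return a {path: text} mapping."""
    import whisper  # Imported here so argument parsing works without Whisper loaded.

    model = whisper.load_model(model_name)  # Load once, reuse for every file.
    results = {}
    for path in files:
        results[path] = model.transcribe(str(path))["text"].strip()
    return results


def main(argv=None):
    args = parse_args(argv)
    for path, text in transcribe_files(args.files, args.model).items():
        print(f"{path}: {text}")
```

Calling `main()` at the bottom of a script would allow an invocation such as `python batch_transcribe.py a.wav b.mp3 --model tiny`, where `batch_transcribe.py` is a hypothetical file name.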
Support
Common Questions
Q: Can I use this offline?
A: Yes, after the initial model download, everything runs locally.
Q: What audio formats are supported?
A: The script records in WAV format. For existing files, Whisper supports many formats.
Q: Can I change the recording quality?
A: Yes, modify the sample_rate parameter in the VoiceRecorder constructor.
Getting Help
- Check the technical documentation for implementation details
- Review the troubleshooting section above
- Ensure all dependencies are properly installed