diff --git a/README.md b/README.md
index f02d394..a8e4654 100644
--- a/README.md
+++ b/README.md
@@ -1,34 +1,53 @@
-## Enhanced Features
+# Whisper Enhanced Transcription and Translation
 
-This fork of OpenAI's Whisper includes several enhancements to improve file organization, user experience, and ease of transcription. Below is a summary of the new features:
+This fork of OpenAI's Whisper provides an enhanced interface for audio transcription and translation. Using Streamlit, the app offers an interactive way to transcribe audio files to English and translate them into various languages.
+
+## Key Features
 
 ### 1. **Automated Folder Creation for Each Transcription Run**
-   - Each time an audio file is transcribed, a unique folder is created under a parent directory named `Results`.
-   - The unique folder is named based on the original audio file name and a timestamp, e.g., `Results/[audio_file_name]_[timestamp]`.
-   - This structure keeps transcription results organized and prevents overwriting, making it easy to manage and review multiple transcriptions.
+   - Each transcription is stored in a unique folder under a parent directory named `Results`.
+   - The folder is named based on the original audio file name and a timestamp, e.g., `Results/[audio_file_name]_[timestamp]`, ensuring organized storage without overwriting previous transcriptions.
 
 ### 2. **Temporary File Storage for Uploaded Audio Files**
-   - Uploaded audio files are stored temporarily in a folder named `TempUploads`.
-   - This separation between original audio files and transcription results enhances organization and simplifies the process of clearing temporary files when they’re no longer needed.
+   - Uploaded audio files are stored temporarily in a `TempUploads` folder.
+   - This separation helps manage temporary files separately from the main transcription results.
 
-### 3. **Interactive Web Interface: `app.py`**
-   - The `app.py` script, built with Streamlit, serves as the main interface for Whisper. This web-based UI provides an intuitive way to interact with Whisper without needing the command line.
+### 3. **Interactive Web Interface with Streamlit (`app.py`)**
+   - The main interface is `app.py`, a web-based UI that enables users to upload audio files, select transcription models, and specify translation preferences.
    - **Features**:
-     - **Upload Audio Files**: Supports various audio formats (e.g., MP3, WAV, M4A, MP4) and stores them temporarily in `TempUploads`.
-     - **Choose Model Size**: Allows users to select from Whisper model sizes (`tiny`, `base`, `small`, `medium`, `large`).
-     - **Organized Transcription Output**: Each transcription is saved in a unique folder under `Results`, with the transcription stored as `transcription.txt`.
+     - **Upload Audio Files**: Supports audio formats like MP3, WAV, M4A, and MP4.
+     - **Choose Model Size**: Users can select from Whisper’s model sizes (`tiny`, `base`, `small`, `medium`, `large`) for transcription.
+     - **Specify Translation Language**: After transcription to English, the text can be translated to languages like Turkish, Spanish, French, German, Chinese, and Japanese.
    - **Usage**:
-     - First, install Streamlit if you haven’t already:
+     - First, install the necessary libraries (including Streamlit and googletrans):
        ```bash
-       pip install streamlit
+       pip install streamlit googletrans==4.0.0-rc1
        ```
-     - Then, run the app:
+     - Run the app with:
        ```bash
        streamlit run app.py
        ```
-     - Open your browser and go to the provided URL (usually `http://localhost:8501`) to access the app.
+     - Open the app in your browser (usually at `http://localhost:8501`).
 
----
+### 4. **Translation Options with Google Translate Integration**
+   - Once transcribed to English, the app can translate the text into various languages.
+   - **Translation Workflow**:
+     - Select a language from the `Translate Transcription To` dropdown.
+     - The app uses Google Translate to translate the English transcription to the selected language.
+     - Each translated text is stored in a `Translations` subfolder within the transcription folder, named `[target_language]_translation.txt`.
+
+### Example Workflow
+
+1. **Upload an Audio File**: Choose a file in MP3, WAV, M4A, or MP4 format.
+2. **Select Transcription Options**:
+   - Choose the model size for transcription.
+   - Select a language for translation, if desired (or leave as "None" for English-only transcription).
+3. **View and Save Results**:
+   - The app displays the English transcription and any selected translations.
+   - Transcriptions are saved in `Results/[audio_file_name]_[timestamp]/transcription.txt`.
+   - Translations are saved in `Results/[audio_file_name]_[timestamp]/Translations/[target_language]_translation.txt`.
+
+This enhanced setup offers a flexible, organized approach to audio transcription and translation, making Whisper accessible and powerful for multilingual projects.
 
 These updates make `app.py` the primary and streamlined interface for managing transcriptions with Whisper. Temporary files and organized results folders ensure clear file management, while the web UI allows users to interact with Whisper effortlessly.
diff --git a/Results/DarknessHuntUs_20241113_110213/Translations/Spanish_translation.txt b/Results/DarknessHuntUs_20241113_110213/Translations/Spanish_translation.txt
new file mode 100644
index 0000000..e76365c
--- /dev/null
+++ b/Results/DarknessHuntUs_20241113_110213/Translations/Spanish_translation.txt
@@ -0,0 +1,2 @@
+(Translated to Spanish)
+ I fear darkness hunts us. Use what you've learned and stay the course.
\ No newline at end of file
diff --git a/Results/DarknessHuntUs_20241113_110213/transcription.txt b/Results/DarknessHuntUs_20241113_110213/transcription.txt
new file mode 100644
index 0000000..03083b0
--- /dev/null
+++ b/Results/DarknessHuntUs_20241113_110213/transcription.txt
@@ -0,0 +1 @@
+ I fear darkness hunts us. Use what you've learned and stay the course.
\ No newline at end of file
diff --git a/Results/DarknessHuntUs_20241113_110537/Translations/Spanish_translation.txt b/Results/DarknessHuntUs_20241113_110537/Translations/Spanish_translation.txt
new file mode 100644
index 0000000..792d67b
--- /dev/null
+++ b/Results/DarknessHuntUs_20241113_110537/Translations/Spanish_translation.txt
@@ -0,0 +1 @@
+Me temo que la oscuridad nos caza.Usa lo que has aprendido y mantente en el curso.
\ No newline at end of file
diff --git a/Results/DarknessHuntUs_20241113_110537/transcription.txt b/Results/DarknessHuntUs_20241113_110537/transcription.txt
new file mode 100644
index 0000000..03083b0
--- /dev/null
+++ b/Results/DarknessHuntUs_20241113_110537/transcription.txt
@@ -0,0 +1 @@
+ I fear darkness hunts us. Use what you've learned and stay the course.
\ No newline at end of file
diff --git a/Results/DarknessHuntUs_20241113_110627/Translations/German_translation.txt b/Results/DarknessHuntUs_20241113_110627/Translations/German_translation.txt
new file mode 100644
index 0000000..43a6c86
--- /dev/null
+++ b/Results/DarknessHuntUs_20241113_110627/Translations/German_translation.txt
@@ -0,0 +1 @@
+Ich fürchte, Dunkelheit jagt uns.Verwenden Sie das, was Sie gelernt haben, und bleiben Sie den Kurs.
\ No newline at end of file
diff --git a/Results/DarknessHuntUs_20241113_110627/transcription.txt b/Results/DarknessHuntUs_20241113_110627/transcription.txt
new file mode 100644
index 0000000..03083b0
--- /dev/null
+++ b/Results/DarknessHuntUs_20241113_110627/transcription.txt
@@ -0,0 +1 @@
+ I fear darkness hunts us. Use what you've learned and stay the course.
\ No newline at end of file
diff --git a/Results/DarknessHuntUs_20241113_111235/Translations/Turkish_translation.txt b/Results/DarknessHuntUs_20241113_111235/Translations/Turkish_translation.txt
new file mode 100644
index 0000000..201be35
--- /dev/null
+++ b/Results/DarknessHuntUs_20241113_111235/Translations/Turkish_translation.txt
@@ -0,0 +1 @@
+Karanlığın bizi avladığından korkuyorum.Öğrendiklerinizi kullanın ve kursu kalın.
\ No newline at end of file
diff --git a/Results/DarknessHuntUs_20241113_111235/transcription.txt b/Results/DarknessHuntUs_20241113_111235/transcription.txt
new file mode 100644
index 0000000..03083b0
--- /dev/null
+++ b/Results/DarknessHuntUs_20241113_111235/transcription.txt
@@ -0,0 +1 @@
+ I fear darkness hunts us. Use what you've learned and stay the course.
\ No newline at end of file
diff --git a/Results/DontForgetToSubscribe_20241113_110744/transcription.txt b/Results/DontForgetToSubscribe_20241113_110744/transcription.txt
new file mode 100644
index 0000000..85a9031
--- /dev/null
+++ b/Results/DontForgetToSubscribe_20241113_110744/transcription.txt
@@ -0,0 +1 @@
+ Don't forget to subscribe.
\ No newline at end of file
diff --git a/Results/DontForgetToSubscribe_20241113_110753/Translations/Japanese_translation.txt b/Results/DontForgetToSubscribe_20241113_110753/Translations/Japanese_translation.txt
new file mode 100644
index 0000000..ffae6fa
--- /dev/null
+++ b/Results/DontForgetToSubscribe_20241113_110753/Translations/Japanese_translation.txt
@@ -0,0 +1 @@
+購読することを忘れないでください。
\ No newline at end of file
diff --git a/Results/DontForgetToSubscribe_20241113_110753/transcription.txt b/Results/DontForgetToSubscribe_20241113_110753/transcription.txt
new file mode 100644
index 0000000..85a9031
--- /dev/null
+++ b/Results/DontForgetToSubscribe_20241113_110753/transcription.txt
@@ -0,0 +1 @@
+ Don't forget to subscribe.
\ No newline at end of file
diff --git a/Results/DontForgetToSubscribe_20241113_112347/Translations/Turkish_translation.txt b/Results/DontForgetToSubscribe_20241113_112347/Translations/Turkish_translation.txt
new file mode 100644
index 0000000..366b439
--- /dev/null
+++ b/Results/DontForgetToSubscribe_20241113_112347/Translations/Turkish_translation.txt
@@ -0,0 +1 @@
+Abone olmayı unutmayın.
\ No newline at end of file
diff --git a/Results/DontForgetToSubscribe_20241113_112347/transcription.txt b/Results/DontForgetToSubscribe_20241113_112347/transcription.txt
new file mode 100644
index 0000000..85a9031
--- /dev/null
+++ b/Results/DontForgetToSubscribe_20241113_112347/transcription.txt
@@ -0,0 +1 @@
+ Don't forget to subscribe.
\ No newline at end of file
diff --git a/app.py b/app.py
index ed9839d..a3a2b06 100644
--- a/app.py
+++ b/app.py
@@ -2,10 +2,11 @@ import streamlit as st
 import os
 from datetime import datetime
 from whisper import load_model, transcribe
+from googletrans import Translator
 
 # Set up the app title and description
-st.title("Whisper Audio Transcription")
-st.write("Upload an audio file and choose a model to transcribe it using OpenAI's Whisper.")
+st.title("Whisper Audio Transcription and Translation")
+st.write("Upload an audio file, choose a model, and optionally translate the transcription.")
 
 # File uploader widget
 uploaded_file = st.file_uploader("Choose an audio file...", type=["mp3", "wav", "m4a", "mp4"])
@@ -13,29 +14,30 @@ uploaded_file = st.file_uploader("Choose an audio file...", type=["mp3", "wav",
 # Model selection widget
 model_size = st.selectbox("Choose model size:", ["tiny", "base", "small", "medium", "large"])
 
+# Translation selection
+target_language = st.selectbox("Translate Transcription To", ["None", "Spanish", "French", "German", "Chinese", "Japanese", "Turkish", "English"])
+
 # Define folders for temporary uploads and results
 temp_upload_folder = "TempUploads"
 results_folder = "Results"
-os.makedirs(temp_upload_folder, exist_ok=True) # Create TempUploads if it doesn't exist
-os.makedirs(results_folder, exist_ok=True) # Create Results if it doesn't exist
+os.makedirs(temp_upload_folder, exist_ok=True)
+os.makedirs(results_folder, exist_ok=True)
+
+# Initialize Google Translator
+translator = Translator()
 
 # Function to create a unique output folder for each transcription run
 def create_output_folder(audio_file):
-    # Use the audio file name (without extension) and a timestamp to create a unique folder name
     folder_name = os.path.splitext(os.path.basename(audio_file))[0]
     timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
     output_folder = os.path.join(results_folder, f"{folder_name}_{timestamp}")
-
-    # Create the output folder if it doesn’t exist
     os.makedirs(output_folder, exist_ok=True)
     return output_folder
 
 # Button to start transcription
 if st.button("Transcribe"):
     if uploaded_file is not None:
-        # Save the uploaded file temporarily with its original name in TempUploads
         temp_file_path = os.path.join(temp_upload_folder, uploaded_file.name)
-
         with open(temp_file_path, "wb") as f:
             f.write(uploaded_file.getbuffer())
 
@@ -47,18 +49,33 @@ if st.button("Transcribe"):
 
         # Run transcription
         try:
-            result = transcribe(model, temp_file_path)
+            result = transcribe(model, temp_file_path, task="transcribe")
 
-            # Save transcription to a text file in the output folder
+            # Save the English transcription to a text file
             output_file = os.path.join(output_folder, "transcription.txt")
             with open(output_file, "w") as f:
                 f.write(result["text"])
 
             # Display the transcription result in the app
-            st.write("### Transcription Result")
+            st.write("### Transcription Result (English)")
             st.write(result["text"])
             st.write(f"Transcription saved to {output_file}")
 
+            # Translate if a target language is selected
+            if target_language != "None":
+                translation = translator.translate(result["text"], dest=target_language.lower()).text
+                # Save the translation to a text file in a "Translations" subfolder
+                translations_folder = os.path.join(output_folder, "Translations")
+                os.makedirs(translations_folder, exist_ok=True)
+                translation_file = os.path.join(translations_folder, f"{target_language}_translation.txt")
+                with open(translation_file, "w") as f:
+                    f.write(translation)
+
+                # Display the translated result
+                st.write(f"### Translated Transcription ({target_language})")
+                st.write(translation)
+                st.write(f"Translation saved to {translation_file}")
+
         except Exception as e:
             st.write("An error occurred:", e)
     else:
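
Review note on the translation step in `app.py`: the last hunk passes the lowercased dropdown label directly as `dest` and writes the translation with the platform's default text encoding. The sketch below is one possible hardening and is not part of this commit. It maps each dropdown label to an explicit language code and writes the file as UTF-8. `LANGUAGE_CODES`, `resolve_language_code`, and `save_translation` are hypothetical names introduced here for illustration, and the codes assume the translation backend accepts values such as `es` and `zh-cn`.

```python
import os

# Hypothetical lookup table (not in the commit): dropdown label -> language code.
# Assumption: the translation backend accepts these codes as its `dest` value.
LANGUAGE_CODES = {
    "Spanish": "es",
    "French": "fr",
    "German": "de",
    "Chinese": "zh-cn",
    "Japanese": "ja",
    "Turkish": "tr",
    "English": "en",
}


def resolve_language_code(label):
    """Return the code for a dropdown label, or None when "None" is selected."""
    if label == "None":
        return None
    return LANGUAGE_CODES[label]


def save_translation(output_folder, label, text):
    """Write text to <output_folder>/Translations/<label>_translation.txt as UTF-8."""
    translations_folder = os.path.join(output_folder, "Translations")
    os.makedirs(translations_folder, exist_ok=True)
    translation_file = os.path.join(translations_folder, f"{label}_translation.txt")
    # Explicit UTF-8 keeps non-ASCII output (e.g. Japanese, Turkish) portable
    # across platforms whose default encoding is not UTF-8.
    with open(translation_file, "w", encoding="utf-8") as f:
        f.write(text)
    return translation_file
```

In `app.py`, the resolved code would replace `target_language.lower()` in the `translator.translate(...)` call, while the label would still be used for the file name and the heading shown in the UI.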