From a2905a02926f489b6d5a1f899180df24b06c7a06 Mon Sep 17 00:00:00 2001
From: martab0 <35269974+martab0@users.noreply.github.com>
Date: Sat, 11 Jan 2025 23:43:34 +0100
Subject: [PATCH] Improve structure

---
 README.md | 45 +++++++++++++++++++++++++++------------------
 1 file changed, 27 insertions(+), 18 deletions(-)

diff --git a/README.md b/README.md
index 00bd926..8d8774e 100644
--- a/README.md
+++ b/README.md
@@ -8,11 +8,11 @@
 
 ## Contents
 
-- [Introduction](#introduction)
-- [Approach](#approach)
-- [Prerequisites](#prerequisites)
-- [Installation](#installation)
-- [Installation troubleshooting](#installation-troubleshooting)
+- [What is Whisper](#what-is-whisper)
+- [Setup](#setup)
+  - [Prerequisites](#prerequisites)
+  - [Installation](#installation)
+  - [Installation troubleshooting](#installation-troubleshooting)
 - [Available models and languages](#available-models-and-languages)
 - [Performance](#performance)
 - [Command-line usage](#command-line-usage)
@@ -20,42 +20,50 @@
 - [More examples](#more-examples)
 - [License](#license)
 
-
-## Introduction
+## What is Whisper
 
 Whisper is a multilingual speech recognition model for general purposes, including speech translation and language identification. Whisper is trained on a large dataset of diverse audio.
 
-## Approach
-
 ![Approach](https://raw.githubusercontent.com/openai/whisper/main/approach.png)
 
 A Transformer sequence-to-sequence model is trained on various speech processing tasks. The tasks include multilingual speech recognition, speech translation, spoken language identification, and voice activity detection. These tasks are jointly represented as a sequence of tokens to be predicted by the decoder. As a result, a single model replaces many steps in traditional speech processing. The multitask training format uses a set of special tokens that serve as task specifiers or classification targets.
 
 We used Python 3.9.9 and [PyTorch](https://pytorch.org/) 1.10.1 to train and test our models. The codebase should be compatible with Python 3.8-3.11 and recent PyTorch versions. The codebase also depends on a few Python packages, most notably [OpenAI's tiktoken](https://github.com/openai/tiktoken) for their fast tokenizer implementation.
 
-## Prerequisites
+## Setup
+
+### Prerequisites
 
 * Whisper requires the command-line tool [`ffmpeg`](https://ffmpeg.org/) to be installed on your system. The command-line tool is available from most package managers. To install [`ffmpeg`](https://ffmpeg.org/), use one of the following commands for your operating system:
 
+**on Ubuntu or Debian**
 ```bash
-# on Ubuntu or Debian
 sudo apt update && sudo apt install ffmpeg
+```
 
-# on Arch Linux
+**on Arch Linux**
+```bash
 sudo pacman -S ffmpeg
+```
 
-# on MacOS using Homebrew (https://brew.sh/)
+**on MacOS using Homebrew (https://brew.sh/)**
+```bash
 brew install ffmpeg
+```
 
-# on Windows using Chocolatey (https://chocolatey.org/)
+**on Windows using Chocolatey (https://chocolatey.org/)**
+```bash
 choco install ffmpeg
+```
 
-# on Windows using Scoop (https://scoop.sh/)
+**on Windows using Scoop (https://scoop.sh/)**
+```bash
 scoop install ffmpeg
 ```
+
 * If [tiktoken](https://github.com/openai/tiktoken) does not provide a pre-built wheel for your platform, install [`rust`](http://rust-lang.org). Follow the [Getting started page](https://www.rust-lang.org/learn/get-started) to install the Rust development environment.
 
-## Installation
+### Installation
 
 * You can download and install (or update to) the latest release of Whisper with the following command:
 
@@ -74,7 +82,8 @@ pip install git+https://github.com/openai/whisper.git
 ```bash
 pip install --upgrade --no-deps --force-reinstall git+https://github.com/openai/whisper.git
 ```
-## Installation troubleshooting
+
+### Installation troubleshooting
 
 If you see installation errors during the installation of Whisper, follow these steps:
 * Check if you have [`rust`](http://rust-lang.org) installed on your system. If not, follow the [Getting started page](https://www.rust-lang.org/learn/get-started) to install the Rust development environment.
@@ -178,4 +187,4 @@ Use the [🙌 Show and tell](https://github.com/openai/whisper/discussions/categ
 
 ## License
 
-Whisper's code and model weights are released under the Massachusetts Institute of Technology (MIT) License. See [LICENSE](https://github.com/openai/whisper/blob/main/LICENSE) for further details.
\ No newline at end of file
+Whisper's code and model weights are released under the Massachusetts Institute of Technology (MIT) License. See [LICENSE](https://github.com/openai/whisper/blob/main/LICENSE) for further details.
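The Setup section touched by this patch lists `ffmpeg` on the `PATH` and the `tiktoken` package as prerequisites. The following is a minimal sketch, not part of the patch or of Whisper itself, for checking them after installation. The helper name `check_whisper_prerequisites` is hypothetical, and the sketch assumes `tiktoken` and `whisper` are the import names of the corresponding pip packages.

```python
import importlib.util
import shutil
from typing import Dict


def check_whisper_prerequisites() -> Dict[str, bool]:
    """Hypothetical helper: report which Setup prerequisites are available."""
    return {
        # shutil.which returns None when the executable is not on PATH
        "ffmpeg": shutil.which("ffmpeg") is not None,
        # find_spec returns None when a top-level module cannot be imported
        "tiktoken": importlib.util.find_spec("tiktoken") is not None,
        # assumed import name of the Whisper package installed via pip
        "whisper": importlib.util.find_spec("whisper") is not None,
    }


if __name__ == "__main__":
    for name, available in check_whisper_prerequisites().items():
        print(f"{name}: {'found' if available else 'missing'}")
```

A `missing` entry for `ffmpeg` points to the package-manager commands under Prerequisites; a missing `tiktoken` after a failed install points to the Rust step under Installation troubleshooting.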