Merge pull request #2 from KomunikacjaTechnicznaVistula/main

Improve structure
kalarp 2025-01-15 22:10:30 +01:00 committed by GitHub
commit 6d2c1f1ab3


@@ -8,11 +8,11 @@
## Contents <!-- omit in toc --> ## Contents <!-- omit in toc -->
- [Introduction](#introduction) - [What is Whisper](#what-is-whisper)
- [Approach](#approach) - [Setup](#setup)
- [Prerequisites](#prerequisites) - [Prerequisites](#prerequisites)
- [Installation](#installation) - [Installation](#installation)
- [Installation troubleshooting](#installation-troubleshooting) - [Installation troubleshooting](#installation-troubleshooting)
- [Available models and languages](#available-models-and-languages) - [Available models and languages](#available-models-and-languages)
- [Performance](#performance) - [Performance](#performance)
- [Command-line usage](#command-line-usage) - [Command-line usage](#command-line-usage)
@@ -20,42 +20,50 @@
- [More examples](#more-examples) - [More examples](#more-examples)
- [License](#license) - [License](#license)
## What is Whisper
## Introduction
Whisper is a multilingual speech recognition model for general purposes, including speech translation and language identification. Whisper is trained on a large dataset of diverse audio. Whisper is a multilingual speech recognition model for general purposes, including speech translation and language identification. Whisper is trained on a large dataset of diverse audio.
## Approach
![Approach](https://raw.githubusercontent.com/openai/whisper/main/approach.png) ![Approach](https://raw.githubusercontent.com/openai/whisper/main/approach.png)
A Transformer sequence-to-sequence model is trained on various speech processing tasks. The tasks include multilingual speech recognition, speech translation, spoken language identification, and voice activity detection. These tasks are jointly represented as a sequence of tokens to be predicted by the decoder. As a result, a single model replaces many steps in traditional speech processing. The multitask training format uses a set of special tokens that serve as task specifiers or classification targets. A Transformer sequence-to-sequence model is trained on various speech processing tasks. The tasks include multilingual speech recognition, speech translation, spoken language identification, and voice activity detection. These tasks are jointly represented as a sequence of tokens to be predicted by the decoder. As a result, a single model replaces many steps in traditional speech processing. The multitask training format uses a set of special tokens that serve as task specifiers or classification targets.
We used Python 3.9.9 and [PyTorch](https://pytorch.org/) 1.10.1 to train and test our models. The codebase should be compatible with Python 3.8-3.11 and recent PyTorch versions. The codebase also depends on a few Python packages, most notably [OpenAI's tiktoken](https://github.com/openai/tiktoken) for their fast tokenizer implementation. We used Python 3.9.9 and [PyTorch](https://pytorch.org/) 1.10.1 to train and test our models. The codebase should be compatible with Python 3.8-3.11 and recent PyTorch versions. The codebase also depends on a few Python packages, most notably [OpenAI's tiktoken](https://github.com/openai/tiktoken) for their fast tokenizer implementation.
## Prerequisites ## Setup
### Prerequisites
* Whisper requires the command-line tool [`ffmpeg`](https://ffmpeg.org/) to be installed on your system. The command-line tool is available from most package managers. To install [`ffmpeg`](https://ffmpeg.org/), use one of the following commands for your operating system: * Whisper requires the command-line tool [`ffmpeg`](https://ffmpeg.org/) to be installed on your system. The command-line tool is available from most package managers. To install [`ffmpeg`](https://ffmpeg.org/), use one of the following commands for your operating system:
**on Ubuntu or Debian**
```bash ```bash
# on Ubuntu or Debian
sudo apt update && sudo apt install ffmpeg sudo apt update && sudo apt install ffmpeg
```
# on Arch Linux **on Arch Linux**
```bash
sudo pacman -S ffmpeg sudo pacman -S ffmpeg
```
# on macOS using Homebrew (https://brew.sh/) **on macOS using Homebrew (https://brew.sh/)**
```bash
brew install ffmpeg brew install ffmpeg
```
# on Windows using Chocolatey (https://chocolatey.org/) **on Windows using Chocolatey (https://chocolatey.org/)**
```bash
choco install ffmpeg choco install ffmpeg
```
# on Windows using Scoop (https://scoop.sh/) **on Windows using Scoop (https://scoop.sh/)**
```bash
scoop install ffmpeg scoop install ffmpeg
``` ```
* If [tiktoken](https://github.com/openai/tiktoken) does not provide a pre-built wheel for your platform, install [`rust`](http://rust-lang.org). Follow the [Getting started page](https://www.rust-lang.org/learn/get-started) to install the Rust development environment. * If [tiktoken](https://github.com/openai/tiktoken) does not provide a pre-built wheel for your platform, install [`rust`](http://rust-lang.org). Follow the [Getting started page](https://www.rust-lang.org/learn/get-started) to install the Rust development environment.
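
As a rough illustration of the two prerequisites above, the following sketch checks whether `ffmpeg` and `rustc` are reachable on your `PATH`. The helper name `require_tool` and the printed messages are assumptions made for this example; they are not part of Whisper, `ffmpeg`, or Rust.

```bash
# Sketch: check that the prerequisites are reachable on PATH.
# `require_tool` is a hypothetical helper name, not part of Whisper.
require_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: missing"
    return 1
  fi
}

require_tool ffmpeg || echo "Install ffmpeg with one of the commands above."
# rustc is only needed when tiktoken has no pre-built wheel for the platform.
require_tool rustc || echo "Install Rust only if the tiktoken build fails."
```

A missing `rustc` is not necessarily a problem: it only matters when `pip` has to build tiktoken from source.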
## Installation ### Installation
* You can download and install (or update to) the latest release of Whisper with the following command: * You can download and install (or update to) the latest release of Whisper with the following command:
@@ -74,7 +82,8 @@ pip install git+https://github.com/openai/whisper.git
```bash ```bash
pip install --upgrade --no-deps --force-reinstall git+https://github.com/openai/whisper.git pip install --upgrade --no-deps --force-reinstall git+https://github.com/openai/whisper.git
``` ```
## Installation troubleshooting
### Installation troubleshooting
If you see installation errors during the installation of Whisper, follow these steps: If you see installation errors during the installation of Whisper, follow these steps:
* Check if you have [`rust`](http://rust-lang.org) installed on your system. If not, follow the [Getting started page](https://www.rust-lang.org/learn/get-started) to install the Rust development environment. * Check if you have [`rust`](http://rust-lang.org) installed on your system. If not, follow the [Getting started page](https://www.rust-lang.org/learn/get-started) to install the Rust development environment.