Merge pull request #2 from KomunikacjaTechnicznaVistula/main

Improve structure
2025-11-26 15:35:57 +00:00 · 2025-01-15 22:10:30 +01:00 · 2025-01-15 22:10:30 +01:00 · 6d2c1f1ab3
commit 6d2c1f1ab3
parent f2a5241af3 a2905a0292
1 changed file with 27 additions and 18 deletions
--- a/README.md
+++ b/README.md
@@ -8,11 +8,11 @@
 ## Contents <!-- omit in toc -->
- [Introduction](#introduction)
+- [What is Whisper](#what-is-whisper)
- [Approach](#approach)
+- [Setup](#setup)
- [Prerequisites](#prerequisites)
+  - [Prerequisites](#prerequisites)
- [Installation](#installation)
+  - [Installation](#installation)
- [Installation troubleshooting](#installation-troubleshooting)
+  - [Installation troubleshooting](#installation-troubleshooting)
 - [Available models and languages](#available-models-and-languages)
 - [Performance](#performance)
 - [Command-line usage](#command-line-usage)
@@ -20,42 +20,50 @@
 - [More examples](#more-examples)
 - [License](#license)
-
+## What is Whisper
 ## Introduction
 Whisper is a multilingual speech recognition model for general purposes, including speech translation and language identification. Whisper is trained on a large dataset of diverse audio.
 ## Approach
 ![Approach](https://raw.githubusercontent.com/openai/whisper/main/approach.png)
 A Transformer sequence-to-sequence model is trained on various speech processing tasks. The tasks include multilingual speech recognition, speech translation, spoken language identification, and voice activity detection. These tasks are jointly represented as a sequence of tokens to be predicted by the decoder. As a result, a single model replaces many steps in traditional speech processing. The multitask training format uses a set of special tokens that serve as task specifiers or classification targets.
 We used Python 3.9.9 and [PyTorch](https://pytorch.org/) 1.10.1 to train and test our models. The codebase should be compatible with Python 3.8-3.11 and recent PyTorch versions. The codebase also depends on a few Python packages, most notably [OpenAI's tiktoken](https://github.com/openai/tiktoken) for their fast tokenizer implementation.
-## Prerequisites
+## Setup
 ### Prerequisites
 * Whisper requires the command-line tool [`ffmpeg`](https://ffmpeg.org/) to be installed on your system. The command-line tool is available from most package managers. To install [`ffmpeg`](https://ffmpeg.org/), use one of the following commands for your operating system:
 **on Ubuntu or Debian**
 ```bash
 # on Ubuntu or Debian
 sudo apt update && sudo apt install ffmpeg
 ```
-# on Arch Linux
+**on Arch Linux**
 ```bash
 sudo pacman -S ffmpeg
 ```
-# on MacOS using Homebrew (https://brew.sh/)
+**on macOS using Homebrew (https://brew.sh/)**
 ```bash
 brew install ffmpeg
 ```
-# on Windows using Chocolatey (https://chocolatey.org/)
+**on Windows using Chocolatey (https://chocolatey.org/)**
 ```bash
 choco install ffmpeg
 ```
-# on Windows using Scoop (https://scoop.sh/)
+**on Windows using Scoop (https://scoop.sh/)**
 ```bash
 scoop install ffmpeg
 ```
 * If [tiktoken](https://github.com/openai/tiktoken) does not provide a pre-built wheel for your platform, install [`rust`](http://rust-lang.org). Follow the [Getting started page](https://www.rust-lang.org/learn/get-started) to install the Rust development environment. 
-## Installation
+### Installation
 * You can download and install (or update to) the latest release of Whisper with the following command:
@@ -74,7 +82,8 @@ pip install git+https://github.com/openai/whisper.git
 ```bash
 pip install --upgrade --no-deps --force-reinstall git+https://github.com/openai/whisper.git
 ```
-## Installation troubleshooting
+
 ### Installation troubleshooting
 If you see installation errors during the installation of Whisper, follow these steps:
 * Check if you have [`rust`](http://rust-lang.org) installed on your system. If not, follow the [Getting started page](https://www.rust-lang.org/learn/get-started) to install the Rust development environment.
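
The prerequisite and troubleshooting checks described in this README can be sketched as a small shell helper that reports which required tools are missing from `PATH`. This is only an illustrative sketch: the function name `missing_tools` is hypothetical and not part of Whisper, and the presence of `cargo` is assumed here to be a reasonable proxy for an installed Rust development environment.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of Whisper): print each named tool that
# cannot be found on PATH, one per line.
missing_tools() {
  local tool
  for tool in "$@"; do
    # `command -v` succeeds only when the shell can resolve the name.
    if ! command -v "$tool" >/dev/null 2>&1; then
      printf '%s\n' "$tool"
    fi
  done
}

# ffmpeg is always required. cargo (assumed proxy for a Rust toolchain)
# matters only when tiktoken has no pre-built wheel for the platform.
missing="$(missing_tools ffmpeg cargo)"
if [ -n "$missing" ]; then
  printf 'Missing tools:\n%s\n' "$missing"
else
  echo "ffmpeg and cargo are both available."
fi
```

If `cargo` is reported missing but the Whisper installation succeeds anyway, tiktoken most likely shipped a pre-built wheel and Rust is not needed.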