Drop ffmpeg-python dependency and call ffmpeg directly. (#1242)

* Drop ffmpeg-python dependency and call ffmpeg directly.

The last ffmpeg-python module release was in 2019[1], upstream seems to
be unavailable[2], and project development appears to have stagnated[3].
As the features it provides are trivial to replace with Python's native
subprocess module, drop the dependency.

 [1] <URL: https://github.com/kkroening/ffmpeg-python/tags >
 [2] <URL: https://github.com/kkroening/ffmpeg-python/issues/760 >
 [3] <URL: https://openhub.net/p/ffmpeg-python >
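
For illustration, a minimal sketch of what the replacement amounts to (a
simplified stand-in, not the PR's code: it assumes the ffmpeg CLI is on
PATH, and decode_pcm is a hypothetical helper name). The chained
ffmpeg-python call becomes a single subprocess.run() of an argv list:

    # Simplified sketch only; the real change is in whisper/audio.py below.
    # Assumes the ffmpeg CLI is installed and on PATH. decode_pcm is a
    # hypothetical helper name, not part of the commit.
    from subprocess import CalledProcessError, run

    def decode_pcm(path: str, sr: int = 16000) -> bytes:
        """Return raw 16-bit little-endian mono PCM decoded by the ffmpeg CLI."""
        cmd = [
            "ffmpeg", "-nostdin", "-i", path,
            "-f", "s16le", "-ac", "1", "-acodec", "pcm_s16le", "-ar", str(sr), "-",
        ]
        try:
            return run(cmd, capture_output=True, check=True).stdout
        except CalledProcessError as e:
            raise RuntimeError(f"Failed to load audio: {e.stderr.decode()}") from e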

* Rewrote to use subprocess.run() instead of subprocess.Popen() (see the
  brief run()-vs-Popen() sketch after these notes).

* formatting changes

* formatting update

* isort fix

* Error checking

* isort 🤦🏻

* flake8 fix

* minor spelling changes
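
For reference, an illustrative contrast between the Popen() approach and
the run() call that was kept. This is a standalone sketch, not code taken
from the PR's intermediate revisions; the trivial `ffmpeg -version`
command is a stand-in for the real decoding invocation:

    from subprocess import PIPE, CalledProcessError, Popen, run

    cmd = ["ffmpeg", "-version"]  # stand-in command; the commit decodes audio instead

    # Popen(): create the process, pump the pipes, and check the exit code yourself.
    proc = Popen(cmd, stdout=PIPE, stderr=PIPE)
    out, err = proc.communicate()
    if proc.returncode != 0:
        raise RuntimeError(err.decode())

    # run(): one call; check=True raises CalledProcessError that already
    # carries the captured stderr, so error handling collapses to one except.
    try:
        out = run(cmd, capture_output=True, check=True).stdout
    except CalledProcessError as e:
        raise RuntimeError(e.stderr.decode()) from e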

---------

Co-authored-by: Jong Wook Kim <jongwook@openai.com>
Commit: 8035e9ef48 (parent: e69930cb9c)
Author: petterreinholdtsen
Date:   2023-05-04 19:53:59 +02:00
Committed by: GitHub
3 changed files with 20 additions and 13 deletions

README.md

@@ -17,9 +17,7 @@ A Transformer sequence-to-sequence model is trained on various speech processing
 ## Setup
-We used Python 3.9.9 and [PyTorch](https://pytorch.org/) 1.10.1 to train and test our models, but the codebase is expected to be compatible with Python 3.8-3.11 and recent PyTorch versions. The codebase also depends on a few Python packages, most notably [OpenAI's tiktoken](https://github.com/openai/tiktoken) for their fast tokenizer implementation and [ffmpeg-python](https://github.com/kkroening/ffmpeg-python) for reading audio files. You can download and install (or update to) the latest release of Whisper with the following command:
+We used Python 3.9.9 and [PyTorch](https://pytorch.org/) 1.10.1 to train and test our models, but the codebase is expected to be compatible with Python 3.8-3.11 and recent PyTorch versions. The codebase also depends on a few Python packages, most notably [OpenAI's tiktoken](https://github.com/openai/tiktoken) for their fast tokenizer implementation. You can download and install (or update to) the latest release of Whisper with the following command:
     pip install -U openai-whisper

requirements.txt

@@ -4,4 +4,3 @@ torch
 tqdm
 more-itertools
 tiktoken==0.3.3
-ffmpeg-python==0.2.0

whisper/audio.py

@@ -1,8 +1,8 @@
 import os
 from functools import lru_cache
+from subprocess import CalledProcessError, run
 from typing import Optional, Union

-import ffmpeg
 import numpy as np
 import torch
 import torch.nn.functional as F
@@ -39,15 +39,25 @@ def load_audio(file: str, sr: int = SAMPLE_RATE):
     -------
     A NumPy array containing the audio waveform, in float32 dtype.
     """
+    # This launches a subprocess to decode audio while down-mixing
+    # and resampling as necessary. Requires the ffmpeg CLI in PATH.
+    # fmt: off
+    cmd = [
+        "ffmpeg",
+        "-nostdin",
+        "-threads", "0",
+        "-i", file,
+        "-f", "s16le",
+        "-ac", "1",
+        "-acodec", "pcm_s16le",
+        "-ar", str(sr),
+        "-"
+    ]
+    # fmt: on
     try:
-        # This launches a subprocess to decode audio while down-mixing and resampling as necessary.
-        # Requires the ffmpeg CLI and `ffmpeg-python` package to be installed.
-        out, _ = (
-            ffmpeg.input(file, threads=0)
-            .output("-", format="s16le", acodec="pcm_s16le", ac=1, ar=sr)
-            .run(cmd=["ffmpeg", "-nostdin"], capture_stdout=True, capture_stderr=True)
-        )
-    except ffmpeg.Error as e:
+        out = run(cmd, capture_output=True, check=True).stdout
+    except CalledProcessError as e:
         raise RuntimeError(f"Failed to load audio: {e.stderr.decode()}") from e

     return np.frombuffer(out, np.int16).flatten().astype(np.float32) / 32768.0
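
As a usage note, callers see no API change: a decode failure still
surfaces as a RuntimeError, now carrying the ffmpeg CLI's own stderr.
A hypothetical snippet (not part of the commit; "some_audio.mp3" is a
placeholder path):

    import whisper

    try:
        audio = whisper.load_audio("some_audio.mp3")  # placeholder path
    except RuntimeError as e:
        # ffmpeg's stderr is embedded in the message, e.g. a
        # "No such file or directory" complaint for a missing input file.
        print(f"decoding failed: {e}")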