Introducing Whisper

I love interviews. They’re a great way to get to know a person, and often a great way to learn. One of the most challenging aspects of interviews is capturing exactly what the subject said. I have used my mobile phone to record a subject’s voice, and I have also used Audacity. In both cases, I was left to transcribe that recording into written form by hand. Now the paradigm is changing with the advent of Whisper, an openly licensed program developed by OpenAI. According to OpenAI’s website introducing Whisper, “Whisper is an automatic speech recognition (ASR) system trained on 680,000 hours of multilingual and multitask supervised data collected from the web.”

It’s an amazing piece of software, and it’s easy to install on Linux, which is my daily driver. I used Pop!_OS, but you can install Whisper just as easily on Fedora-based distributions. First, make sure that Python is installed, which you can check by entering the following command:

$ python3 --version

In my case, the result was:

Python 3.10.6

Then install the Python virtual environment module:

$ sudo apt install python3.10-venv

Next, install pip, the Python package installer:

$ sudo apt install python3-pip

Initialize a Python virtual environment for Whisper with:

$ python3 -m venv whisper

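I changed into the ‘whisper’ directory and activated the environment, so that the packages installed next go into it rather than into the system Python:

$ cd whisper
$ source bin/activate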

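Whisper also relies on the ffmpeg command-line tool to decode audio, so install that as well:

$ sudo apt install ffmpeg

Finally, I installed Whisper itself. Note that its package on PyPI is named openai-whisper; pip3 install whisper fetches an unrelated package called whisper.

$ pip3 install openai-whisper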
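Before transcribing anything, it can help to confirm the setup. The sketch below is a hypothetical helper, not part of Whisper. It checks the Python version (assuming 3.8 as a minimum; consult the Whisper README for the supported range), whether a virtual environment is active, whether the ffmpeg executable that Whisper uses to decode audio is on the PATH, and whether the installed whisper module is OpenAI’s. The unrelated whisper package on PyPI also imports as whisper, but it has no load_model function.

```python
import importlib.util
import shutil
import sys


def check_whisper_setup() -> dict:
    """Report which prerequisites for running Whisper are in place (hypothetical helper)."""
    # Assumption: 3.8 as a floor; the Whisper README lists the supported versions.
    python_ok = sys.version_info >= (3, 8)
    # Inside an active venv, sys.prefix points at the venv rather than the base install.
    in_venv = sys.prefix != sys.base_prefix
    # Whisper runs the ffmpeg executable to decode MP3/MP4 audio.
    has_ffmpeg = shutil.which("ffmpeg") is not None
    # openai-whisper and the unrelated "whisper" package both import as "whisper";
    # only OpenAI's provides load_model().
    has_openai_whisper = False
    if importlib.util.find_spec("whisper") is not None:
        try:
            import whisper
            has_openai_whisper = hasattr(whisper, "load_model")
        except (ImportError, OSError):
            pass  # installed but broken, e.g. a missing PyTorch dependency
    return {
        "python_ok": python_ok,
        "in_venv": in_venv,
        "has_ffmpeg": has_ffmpeg,
        "has_openai_whisper": has_openai_whisper,
    }


if __name__ == "__main__":
    for check, ok in check_whisper_setup().items():
        print(f"{check}: {'ok' if ok else 'MISSING'}")
```

Run it with the environment’s python3; any entry reported as MISSING points back to the corresponding install step.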

Now I am ready to use this amazing new tool to transcribe MP3 and MP4 files into easily readable text. If you don’t have any recordings and would like to try out Whisper, you can point your web browser at LibriVox and download a free audiobook or part of one. I chose Robert Frost’s ‘Mending Wall’.

I can use ‘whisper’ from the command line to convert the ‘Mending Wall’ MP3 to text:

$ whisper 04_mending_wall_frost_bc.mp3 --model base

In a little over a minute, Whisper converted the MP3 to easily readable text. The conversion produces five files, one of which is a plain text file containing the transcript. Here are the first few lines of the transcript of 04_mending_wall_frost_bc.mp3:

“Mending Wall by Robert Frost, read for libravox.org by Becky Crackle, November 16, 2006, Canal Winchester, Ohio. Something there is that doesn’t love a wall that sends the frozen groundswell under it and spills the upper boulders in the sun, and makes gaps even too can pass abreast. The work of hunters is another thing. I have come after them and made repair where they have left not one stone on a stone, but they would have the rabbit out of hiding to please the yelping dogs.”

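As you can see, the results are largely accurate; the mistakes are small ones, such as ‘libravox.org’ for librivox.org and ‘too’ where the poem has ‘two’.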
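If you transcribe recordings regularly, you may prefer to drive the same command from Python. The sketch below wraps the whisper CLI with subprocess; the helper names are hypothetical. It assumes the CLI’s --output_dir and --output_format options, and a recent release that names the five outputs (txt, vtt, srt, tsv, json) after the audio file’s name without its extension. Older releases may name the files differently.

```python
import subprocess
from pathlib import Path

# Formats the CLI writes by default (--output_format all): five files in total.
WHISPER_FORMATS = ("txt", "vtt", "srt", "tsv", "json")


def whisper_command(audio: str, model: str = "base", output_dir: str = ".",
                    output_format: str = "all") -> list[str]:
    """Build the argument list for the same whisper CLI call used above."""
    return ["whisper", audio, "--model", model,
            "--output_dir", output_dir, "--output_format", output_format]


def expected_outputs(audio: str, output_dir: str = ".",
                     output_format: str = "all") -> list[Path]:
    """Predict the transcript files the CLI writes for one audio file.

    Assumes a recent openai-whisper release, which names each output after
    the audio file's stem; older releases may use a different scheme.
    """
    formats = WHISPER_FORMATS if output_format == "all" else (output_format,)
    stem = Path(audio).stem
    return [Path(output_dir) / f"{stem}.{ext}" for ext in formats]


def run_whisper(audio: str, **options) -> list[Path]:
    """Run the CLI and return the paths of the expected files that now exist."""
    # check=True raises CalledProcessError if whisper exits with an error.
    subprocess.run(whisper_command(audio, **options), check=True)
    outputs = expected_outputs(audio, options.get("output_dir", "."),
                               options.get("output_format", "all"))
    return [p for p in outputs if p.exists()]
```

For example, run_whisper('04_mending_wall_frost_bc.mp3', output_format='txt') should write and return only the plain-text transcript rather than all five files.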

You can also write a Python script to automate the process:

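import whisper

# Load the "base" model (the same one used on the command line); it is downloaded on first use
model = whisper.load_model("base")
# Transcribe the audio file and print the recognized text
result = model.transcribe("04_mending_wall_frost_bc.mp3")
print(result["text"])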

The Python script produces much cleaner output, printing the transcript as a single block of text:

“Mending Wall by Robert Frost, read for Librevox.org by Becky Crackle, November 16th, 2006, Canal Winchester, Ohio. Something there is that doesn’t love a wall that sends the frozen groundswell under it and spills the upper boulders in the sun, and makes gaps even too can pass abreast. The work of hunters is another thing. I have come after them and made repair where they have left not one stone on a stone, but they would have the rabbit out of hiding to please the yelping dogs. The gaps I mean, no one has seen them made or heard them made, but at spring-mending time we find them there. I let my neighbor know beyond the hill, and on a day we meet to walk the line and set the wall between us once again. ”
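The dictionary returned by model.transcribe holds more than the plain text. Its "segments" list carries each stretch of speech with "start" and "end" times in seconds, the same information the CLI’s subtitle files contain. For interviews, those timestamps make it easy to find a quote in the original recording. The sketch below uses hypothetical helper names: it formats segments as timestamped lines and applies a transcribe function to several files. It takes the transcribe function as a parameter so it can be tried without loading a model.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable


def format_timestamp(seconds: float) -> str:
    """Render a time in seconds as mm:ss.mmm."""
    total_ms = round(seconds * 1000)
    minutes, rem = divmod(total_ms, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{minutes:02d}:{secs:02d}.{ms:03d}"


def format_segments(result: dict) -> str:
    """Turn a transcribe() result into one timestamped line per segment."""
    lines = []
    for seg in result["segments"]:
        start, end = format_timestamp(seg["start"]), format_timestamp(seg["end"])
        # Segment text usually starts with a space, so strip it.
        lines.append(f"[{start} --> {end}] {seg['text'].strip()}")
    return "\n".join(lines)


def transcribe_all(paths: Iterable, transcribe: Callable[[str], dict]) -> Dict[str, str]:
    """Transcribe several files in name order; returns {file name: timestamped text}.

    transcribe should behave like model.transcribe from the script above:
    take a path and return a dict with a "segments" list.
    """
    files = sorted(Path(p) for p in paths)
    return {f.name: format_segments(transcribe(str(f))) for f in files}
```

With the model loaded as in the script, transcribe_all(Path('interviews').glob('*.mp3'), model.transcribe) would return one timestamped transcript per recording; ‘interviews’ is a placeholder folder name.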

Whisper is released under the MIT license.