Transcribe Voice Audio to Text with AI on your Windows or Linux PC - OpenAI Whisper Tutorial

Hi everyone! This video covers...
• OpenAI Whisper, a free, powerful AI-driven speech/audio-to-text tool.
• How to create searchable text files from your audio and video clips.
• 100% local transcription on your PC. No Internet connection required after the AI model is downloaded.
• How to install/use on Windows and Linux (Ubuntu 22.04 LTS used as an example).
• Install for either CPU- or GPU-driven AI.
• Compare performance of CPU vs GPU-driven OpenAI Whisper transcription.
• How to automatically transcribe entire directories of video and audio files to text (.mp4, .mov, .wav, .mp3, and many other formats); a minimal command sketch follows this list.
• Transcribe audio/speech to text using an AMD GPU with ROCm.
• Transcribe audio/speech to text using an AMD GPU with ROCm inside a Docker container.
• Transcribe audio/speech to text using an NVIDIA GPU.
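
For reference, here is a minimal command-line sketch of the core workflow the video demonstrates (the file names, directory paths, and the "base" model choice are illustrative, not necessarily the exact ones used on screen):

    # Install Whisper into an active Python environment (ffmpeg must already be on PATH)
    pip install -U openai-whisper

    # Transcribe a single clip; the selected model is downloaded on first use
    whisper clip1.mp4 --model base --language en

    # Transcribe every .mp4 under a directory tree (bash; the video also shows a PowerShell variant)
    find ./media -name "*.mp4" -exec whisper {} --model base --output_dir ./transcripts \;

Whisper writes .txt, .srt, .vtt, and other output files alongside each clip, which is what makes the transcriptions searchable.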

Look for "Walkthrough" in the table of contents below to find the start of each of this video's 7 main sections...

Clickable Table of Contents
00:00 Start
00:30 Overview
05:55 Test Clips Overview
06:20 Test Clip #1 Demo
06:45 Test Clip #2 Demo
07:21 Test Clip #3 Demo
07:42 Test Clip #4 Demo (aka the "long" clip)
08:55 Walkthrough #1: Windows simple setup.
09:46 Windows ffmpeg setup
10:22 Create Python environment on Windows
10:50 Activate Python environment on Windows
11:06 Install OpenAI-Whisper on Windows
11:18 Check Whisper setup on Windows
11:38 Transcribe test clip #1 on Windows
12:13 Examine transcription output
12:53 Transcribe directory tree on Windows
13:15 Transcribe dir PowerShell script
15:01 Walkthrough #2: Linux simple setup.
15:13 Check Whisper setup on Linux
15:25 Transcribe clip #1 on Linux
15:57 Model download on first use
16:17 Transcribe directory tree on Linux
16:38 Transcribe dir bash script
17:30 Run bash script, transcribe dir
17:49 Open .srt file
18:14 Walkthrough #3: Linux/ROCm/Docker/Whisper setup.
18:26 Docker/Ubuntu (Debian) setup
19:37 docker image ls permission issue
20:03 docker "hello world" test
20:16 Examine downloaded image
20:27 docker container ls
20:41 ROCm/Pytorch docker container docs/setup
21:37 AMD/ROCm Docker Hub
21:50 AMD ROCm/Pytorch Container
22:01 docker pull rocm/pytorch
22:29 docker rocm/pytorch command line
22:47 docker run rocm/pytorch
23:21 prep container for openai-whisper
23:33 rocminfo command
23:46 setup ffmpeg (rocm/pytorch container)
23:58 openai-whisper setup (rocm/pytorch container)
24:09 Verify container Pytorch is using GPU
24:24 Run whisper in container
24:36 List stopped containers
25:07 Restarting prepped container
25:37 Sharing files with a container
26:51 Prep container for openai-whisper #2
27:21 Transcribe shared file within container
28:28 Examine transcribed output (container shared folder)
29:27 Creating openai-whisper image (commit container)
30:25 Testing our whisper docker image
31:32 Transcribe in whisper docker image (no model download required)
32:08 Transcribe with image in one step
33:13 Examine output from host
33:29 Docker/Whisper recap #1
35:11 Transcribe directory with docker image
36:05 Transcribe dir bash script, docker
40:56 run bash script, transcribe dir, docker
41:57 docker container transcription recap
42:26 Walkthrough #4: Linux/ROCm/Pytorch/Whisper native setup.
42:40 VS Code setup
43:30 ROCm/Pytorch native setup
44:36 ROCm GPG key setup
45:01 ROCm repo setup
45:57 Ubuntu 22.04 quick install copy/paste
46:44 Native rocm/pytorch setup
47:47 Verify Pytorch is using the GPU
48:02 Install openai-whisper, native
48:31 Transcribe clip #1, native
49:08 Adding language
49:21 Examine transcription output, native setup
49:46 Using wildcard, multiple clips at once
50:20 Walkthrough #5: Linux/ROCm/Pytorch/Whisper CPU vs GPU perf test.
51:02 Verify GPU is in use, ROCm GPU vs CPU
51:18 time transcription using GPU
52:36 Whisper ROCm GPU result
52:48 Remove ROCm, prep for CPU perf run
53:26 Setup CPU Pytorch
54:24 Verify Pytorch is using CPU
54:45 time transcription using CPU, ROCm GPU vs CPU
55:04 ROCm GPU vs CPU results
56:38 Walkthrough #6: Windows/CUDA/Pytorch/Whisper CPU vs GPU perf test.
56:54 Setup Python on Windows
57:28 Create Python environment
59:20 Activate Python environment
59:58 Install openai-whisper
01:00:31 Verify Pytorch is using CPU, not GPU
01:01:23 Initial "burn-in" transcription
01:01:44 Burn-in pre-test run
01:03:21 Examine transcription output, srt file with time stamps
01:04:08 Time CPU transcription with PowerShell Measure-Command
01:06:23 Whisper CPU result
01:06:54 Walkthrough #7: Windows/CUDA/Pytorch native setup.
01:09:19 Verify Pytorch using NVIDIA GPU
01:10:10 CUDA GPU vs CPU results
01:10:30 CPU and Python 3.11, IMPORTANT
01:12:28 Outro
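
As a companion to Walkthroughs #3-#5, here is a hedged sketch of the Docker/ROCm path (the device flags are the ones AMD's ROCm container documentation commonly lists; the video's exact invocation may differ slightly):

    # Pull AMD's ROCm/PyTorch image and start a container with GPU device access
    docker pull rocm/pytorch
    docker run -it --device=/dev/kfd --device=/dev/dri --group-add video \
        -v "$PWD:/data" rocm/pytorch

    # Inside the container: add ffmpeg and Whisper, then confirm PyTorch sees the GPU
    apt-get update && apt-get install -y ffmpeg
    pip install -U openai-whisper
    python3 -c "import torch; print(torch.cuda.is_available())"   # prints True when the GPU is usable

    # Transcribe a clip shared from the host and time it, as in the CPU-vs-GPU comparisons
    time whisper /data/clip1.mp4 --model base

Note that PyTorch's ROCm builds report the GPU through the torch.cuda API, so the same one-line check works for both the AMD and NVIDIA walkthroughs.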

Errata:
1. At 33:57, the "--rm" switch should have been used but was not. See 38:06 for an example of the "--rm" switch in use.
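
For readers following along: "--rm" tells Docker to delete the container as soon as it exits, so one-off transcription runs don't accumulate as stopped containers in "docker container ls -a". A minimal sketch (the image name "whisper-rocm" is hypothetical; the video commits its own image around 29:27):

    # --rm removes the container on exit; -v shares the current host directory as /data
    docker run --rm -v "$PWD:/data" whisper-rocm whisper /data/clip1.mp4 --model base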

Tutorial source code:

Buy Me a Coffee

Subscribe to the RicochetTech email list:
Comments

Thank you for the tutorial!
I wonder how much time it would take to create subtitles for a movie or a TV series episode? And how good a job would it do in that case?

quodpipax

Subscribed. I've spent 3 weeks trying to get this crap to work, and I guess Docker Desktop is just not going to work. Docker Engine, on the other hand, runs fine. My main issue was that Docker Desktop couldn't find the /dev/kfd device.

Kaleidoveritas

Doesn't my PC already turn speech into text? I'm pretty sure they all do. Anything made by Samsung does it whether you want it or not, so I'm sure Windows and Android do as well. Because if one company commits a crime, they all feel they have to commit the same crime. Apple records everything, including eye movement, analyzes everything, and can force people to throw away the phone they love so much even though it's almost new, and they know how to get people to make these phones for free. They might need something to eat, but that doesn't stop them from showing up to work. They just wanna help Apple and not take from their record profits.

richiebricker