Python Tutorial : Introduction to audio data in Python
---
Hello and welcome to the course! My name is Daniel Bourke and I'll be your instructor. To get started, we're first going to see how speech and audio processing are different from other kinds of data processing.
Much like other data types, audio files come in many different formats, such as mp3, wav, m4a, and flac. But all of these formats share a standard measure of frequency.
This frequency is called the sampling rate, and it is measured in kilohertz, or kHz. Much like how a movie shows 30 pictures per second, which our brains register as moving pictures, the sampling rate of an audio file is a measure of the number of data chunks per second used to represent a digital sound.
One kilohertz equals one thousand pieces of information per second.
For example, a song you stream will usually have a 32 kHz sampling rate. This means 32,000 pieces of information per second. Speech and audiobooks are usually between 8 and 16 kHz. We'll look at some of these later.
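To make that arithmetic concrete, here's a minimal sketch. The helper name samples_in is made up for illustration and isn't part of any library.

```python
def samples_in(sampling_rate_khz, seconds):
    """Return how many pieces of information (samples) a clip contains."""
    # 1 kHz = 1,000 samples per second
    return round(sampling_rate_khz * 1000 * seconds)

song_per_second = samples_in(32, 1)   # streamed song at 32 kHz
speech_low = samples_in(8, 1)         # low end for speech
speech_high = samples_in(16, 1)       # high end for speech

print(song_per_second, speech_low, speech_high)  # 32000 8000 16000
```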
And as you might've guessed, audio files are different from tabular or text data because you can't immediately see the data you're working with.
To get spoken language audio files into something we can see and manipulate, we first have to open the audio file with Python's built-in wave module.
We can get started with the wave module by running the command import wave.
Now, we have an audio file, good morning dot wav ready to go. It contains a person saying the words good morning.
To import it, we'll use the wave module's open function.
Now we've saved the good morning dot wav audio file to the variable good_morning in the format of a wave_object. However, in this state it's not very useful to us.
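Here's a rough sketch of what importing and opening might look like. Since the actual recording isn't available here, the sketch first writes a stand-in file of silence; the file name good_morning.wav and its contents are assumptions made for this example.

```python
import os
import tempfile
import wave

# Stand-in for the lesson's recording (an assumption for this sketch):
# one second of 16-bit mono silence at 48 kHz.
path = os.path.join(tempfile.mkdtemp(), "good_morning.wav")
with wave.open(path, "wb") as writer:
    writer.setnchannels(1)       # mono
    writer.setsampwidth(2)       # 2 bytes = 16 bits per sample
    writer.setframerate(48000)   # 48 kHz
    writer.writeframes(b"\x00\x00" * 48000)

# Open the file for reading and save it to a variable
good_morning = wave.open(path, "rb")
object_type = type(good_morning).__name__
frame_rate = good_morning.getframerate()
good_morning.close()

print(object_type)  # Wave_read
print(frame_rate)   # 48000
```

In read mode, the object that comes back gives access to the file's properties, but not yet to the sound itself in a form we can inspect.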
To manipulate it further, we'll use the readframes method to convert the wave_object to bytes. The -1 means we want to read in all of the pieces of information within the wave_object.
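As a minimal sketch of that step, again using a stand-in file of silence rather than the real recording (the file name and the variable name signal_gm are assumptions for illustration):

```python
import os
import tempfile
import wave

# Stand-in recording (an assumption for this sketch): half a second of
# 16-bit mono silence at 48 kHz.
path = os.path.join(tempfile.mkdtemp(), "good_morning.wav")
with wave.open(path, "wb") as writer:
    writer.setnchannels(1)
    writer.setsampwidth(2)
    writer.setframerate(48000)
    writer.writeframes(b"\x00\x00" * 24000)

good_morning = wave.open(path, "rb")
n_frames = good_morning.getnframes()
bytes_per_frame = good_morning.getsampwidth() * good_morning.getnchannels()

# -1 asks for every frame in the file
signal_gm = good_morning.readframes(-1)
good_morning.close()

print(type(signal_gm).__name__)                      # bytes
print(len(signal_gm) == n_frames * bytes_per_frame)  # True
```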
Now that we've converted the audio file to bytes, what do they look like?
Okay, we can see a snippet of the entire soundwave in byte form.
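To peek at a snippet like this yourself, you can slice the bytes. The sketch below assumes a stand-in recording, a short generated tone instead of real speech, so the exact bytes printed will differ from the ones in the lesson.

```python
import math
import os
import struct
import tempfile
import wave

# Stand-in recording (an assumption): a 0.1 second 440 Hz tone,
# 16-bit mono at 48 kHz.
sampling_rate = 48000
tone = b"".join(
    struct.pack("<h", int(10000 * math.sin(2 * math.pi * 440 * i / sampling_rate)))
    for i in range(4800)
)
path = os.path.join(tempfile.mkdtemp(), "good_morning.wav")
with wave.open(path, "wb") as writer:
    writer.setnchannels(1)
    writer.setsampwidth(2)
    writer.setframerate(sampling_rate)
    writer.writeframes(tone)

with wave.open(path, "rb") as good_morning:
    signal_gm = good_morning.readframes(-1)

# Look at only the first 10 bytes instead of the whole soundwave
snippet = signal_gm[:10]
print(snippet)
```

With 16-bit audio, every sample takes up two bytes, so these 10 bytes cover just 5 samples.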
But remember how kilohertz means thousands of pieces of information per second? The good morning dot wav audio file is 48 kilohertz and 2 seconds long. 48,000 pieces of information per second for 2 seconds equals 96,000 chunks of data, all for only two words.
So if we printed out the entire soundwave in byte form we'd see 96,000 of these combinations of letters and numbers.
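We can check that count with the wave module. This is a sketch under assumptions: the stand-in file below is 2 seconds of 16-bit mono silence at 48 kHz, which may not match the real recording's sample width or channel count.

```python
import os
import tempfile
import wave

sampling_rate = 48000  # 48 kHz
seconds = 2

# Stand-in recording (an assumption): 16-bit mono silence.
path = os.path.join(tempfile.mkdtemp(), "good_morning.wav")
with wave.open(path, "wb") as writer:
    writer.setnchannels(1)
    writer.setsampwidth(2)
    writer.setframerate(sampling_rate)
    writer.writeframes(b"\x00\x00" * (sampling_rate * seconds))

with wave.open(path, "rb") as good_morning:
    n_frames = good_morning.getnframes()
    duration = n_frames / good_morning.getframerate()
    signal_gm = good_morning.readframes(-1)

print(n_frames)        # 96000
print(duration)        # 2.0
print(len(signal_gm))  # 192000
```

Note that the number of bytes can be larger than the number of chunks: here each of the 96,000 samples is stored in two bytes, and a stereo file would double that again.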
Don't worry if the output looks confusing for now; we'll learn how to convert these bytes into something more useful shortly.
Now you can start to see how working with audio and spoken language files is different from other kinds of data.
First of all, unlike text or tabular data, you can't immediately see what you're working with. So audio files often require a conversion step before you can begin working with them.
And because of the sampling rate, even a few seconds of audio can contain large amounts of data. Add in background noise, other sounds, and more speakers, and the number of pieces of information grows even more. We'll look into this later on.
Alright, it's time to get hands-on and practice importing your first audio file!
#DataCamp #PythonTutorial #SpokenLanguageProcessinginPython #SpokenLanguageProcessing #audiodatainPython