How to Implement a CNN for Sound Classification

Learn how to implement a deep learning (CNN) sound classifier using PyTorch and torchaudio.

Code:

===============================

Interested in hiring me as a consultant/freelancer?

Join The Sound Of AI Slack community:

Connect with Valerio on Linkedin:

Follow Valerio on Facebook:

Follow Valerio on Twitter:

===============================

Content:
0:00 Intro
0:31 Implementing CNNNetwork class
9:55 Implementing the forward method
12:43 Network summary with torchsummary
17:19 What's up next?
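
The CNNNetwork built in the video is a small VGG-style stack of convolutional blocks followed by a linear classifier. The sketch below is a minimal version for reference, not a verbatim copy of the video code: the (1, 64, 44) mel-spectrogram input shape and the channel counts are assumptions from the series, and (following a note in the comments) it returns raw logits instead of ending with a Softmax layer.

import torch
from torch import nn


class CNNNetwork(nn.Module):
    """Four conv blocks -> flatten -> linear classifier (10 classes assumed)."""

    def __init__(self):
        super().__init__()
        # Each block: Conv2d -> ReLU -> MaxPool2d, doubling the channel count.
        self.conv1 = self._block(1, 16)
        self.conv2 = self._block(16, 32)
        self.conv3 = self._block(32, 64)
        self.conv4 = self._block(64, 128)
        self.flatten = nn.Flatten()
        # 128 * 5 * 4 is the flattened feature-map size for a (1, 64, 44) input.
        self.linear = nn.Linear(128 * 5 * 4, 10)

    @staticmethod
    def _block(in_channels, out_channels):
        return nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=1, padding=2),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2),
        )

    def forward(self, x):
        x = self.conv1(x)
        x = self.conv2(x)
        x = self.conv3(x)
        x = self.conv4(x)
        x = self.flatten(x)
        return self.linear(x)  # raw logits, to be paired with nn.CrossEntropyLoss


if __name__ == "__main__":
    cnn = CNNNetwork()
    dummy = torch.rand(1, 1, 64, 44)  # (batch, channel, n_mels, time_frames)
    print(cnn(dummy).shape)           # torch.Size([1, 10])
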
Comments

I have recently discovered your channel and have been watching your videos from the very beginning. Now I see you are still uploading, and I am so excited about all the knowledge you can give to your audience. This is so nice! Could you please show some of your projects as a "before and after", so we can see the evolution of the networks and what we are capable of doing with your courses? This is pure knowledge and I am sure of your channel... Thank you very much for your content!

George.English

Thank you so much for the amazing tutorials! Just a note (per the PyTorch docs): when using nn.CrossEntropyLoss() as the loss_fn, it is important to keep the model output as raw logits (i.e., do not include softmax() as the model's final output layer). I read in a discussion that this matters for numerical stability because of the log-sum-exp computation performed inside the loss. This might be new to the current version of PyTorch (2.0.1).
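
For example, a minimal sketch of the pairing (the tensor shapes are illustrative):

import torch
from torch import nn

# nn.CrossEntropyLoss applies log-softmax internally, so the model should
# return raw logits rather than softmax probabilities.
loss_fn = nn.CrossEntropyLoss()

logits = torch.randn(8, 10)           # e.g. model(inputs): (batch, n_classes) raw scores
targets = torch.randint(0, 10, (8,))  # class indices, not one-hot vectors

loss = loss_fn(logits, targets)

# If probabilities are needed at inference time, apply softmax outside the model:
probs = torch.softmax(logits, dim=1)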

deemo

Why do you multiply 128 by 5 and 4? Where do the latter two numbers come from?

Thanks for all the videos; they're fantastic.
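
One way to see where the 5 and 4 come from is to pass a dummy input through the conv blocks and print the shape just before flattening. A minimal check, assuming the (1, 64, 44) mel-spectrogram input from the series and the attribute names used in the sketch above the comments:

import torch

cnn = CNNNetwork()                 # the network sketched above the comments
x = torch.rand(1, 1, 64, 44)       # (batch, channel, n_mels, time_frames)

for block in (cnn.conv1, cnn.conv2, cnn.conv3, cnn.conv4):
    x = block(x)
    print(x.shape)

# The final feature map is (1, 128, 5, 4): 128 channels with a 5 x 4 spatial
# grid left after four conv + max-pool blocks, hence nn.Linear(128 * 5 * 4, 10).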

ericdemattos

torchsummary is now torchinfo, if I'm not mistaken!
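
If so, the usage is essentially the same; a minimal sketch with torchinfo (the (1, 1, 64, 44) input size is an assumption matching the mel-spectrogram shape above, and it includes the batch dimension):

# pip install torchinfo
from torchinfo import summary

cnn = CNNNetwork()                        # the network sketched above the comments
summary(cnn, input_size=(1, 1, 64, 44))   # prints a layer-by-layer summary table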

michk

At 8:28, in 128 * 5 * 4, where did the 5 and 4 come from?

peterkanini

Thank you for these videos! I follow along step by step, but I get an error in the train_single_epoch function because the targets are not tensor objects but a tuple. Why?

11 def train_single_epoch(model, data_loader, loss_fn, optimiser, device):
12 for inputs, targets in data_loader:
---> 13 inputs, targets = inputs.to(device), targets.to(device)
AttributeError: 'tuple' object has no attribute 'to'
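
One common cause is that the Dataset's __getitem__ returns the label in a form the default collate function cannot stack into a tensor (for example a string, or a tuple left by a stray trailing comma), so the batch of targets stays a plain tuple. A minimal sketch of the fix with a stand-in dataset (DemoDataset and its shapes are illustrative):

import torch
from torch.utils.data import Dataset, DataLoader


class DemoDataset(Dataset):
    """Stand-in dataset; the real one would load audio and annotations."""

    def __len__(self):
        return 4

    def __getitem__(self, index):
        signal = torch.rand(1, 64, 44)
        # Return the label as an int (not a string such as "dog_bark") so the
        # default collate function can stack the targets into a tensor.
        label = index % 2
        return signal, label


loader = DataLoader(DemoDataset(), batch_size=2)
for inputs, targets in loader:
    print(type(targets))   # <class 'torch.Tensor'>, so targets.to(device) works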

saragiovannini

Sir, please consider covering multichannel raw files (like ULA & UCA microphone array recordings) and processing them with CNNs.

amruthgadag

Thanks, sir, for this great video. Just one question: why didn't you normalize the mel-frequency images before feeding them to VGGNet? As far as I know, the values should be between 0 and 1.
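
If per-example min-max scaling is wanted, it can be added as a small step after the mel-spectrogram transform; a minimal sketch (the function name and the dummy dB-scaled values are illustrative):

import torch


def min_max_normalise(spec, eps=1e-8):
    """Scale a (mel) spectrogram tensor to the [0, 1] range, per example."""
    return (spec - spec.min()) / (spec.max() - spec.min() + eps)


spec = torch.rand(1, 64, 44) * 80 - 80   # dummy dB-scaled mel spectrogram
normalised = min_max_normalise(spec)
print(normalised.min().item(), normalised.max().item())   # ~0.0 and 1.0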

EngRiadAlmadani

I have checked everything twice and my model is running on CUDA, but the accuracy is zero from the beginning. Can someone help me?

jaypadia

CNNs are typically used for images. Why are we using CNNs for audio and how does that work?
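
The trick is that the waveform is first converted into a mel spectrogram, a 2D frequency-vs-time array that the CNN treats like a one-channel image. A minimal sketch of that conversion (the parameter values are assumptions matching the series):

import torch
import torchaudio

# A mel spectrogram turns a 1D waveform into a 2D (n_mels x time_frames) array,
# which a Conv2d-based network can process like a grayscale image.
mel_spectrogram = torchaudio.transforms.MelSpectrogram(
    sample_rate=22050,
    n_fft=1024,
    hop_length=512,
    n_mels=64,
)

waveform = torch.rand(1, 22050)   # dummy mono signal, one second at 22.05 kHz
spec = mel_spectrogram(waveform)
print(spec.shape)                 # torch.Size([1, 64, 44]): channel, mels, frames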

Dygit

I got an error:

RuntimeError: stft input and window must be on the same device but got self on cpu and window on cuda:0

How can I solve that?
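
That error means the waveform and the MelSpectrogram transform (whose internal STFT window follows the transform's device) are on different devices; moving both to the same device before applying the transform resolves it. A minimal sketch (parameter values and names are illustrative):

import torch
import torchaudio

device = "cuda" if torch.cuda.is_available() else "cpu"

mel_spectrogram = torchaudio.transforms.MelSpectrogram(
    sample_rate=22050, n_fft=1024, hop_length=512, n_mels=64
).to(device)                       # the transform's window buffer now lives on `device`

waveform = torch.rand(1, 22050)    # e.g. loaded on CPU by torchaudio.load
waveform = waveform.to(device)     # move the signal to the same device as the transform

spec = mel_spectrogram(waveform)   # no device mismatch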

cemayar

Hello, and thanks for your channel. I would like to know if I can use TSFRESH to extract features from sound files. My problem is that I do not know how to do it: I have .OGG files (not .WAV files) at my disposal. Librosa can read them without a problem, and I managed to extract features with it, but I only get 60% precision in sound recognition with Random Forests and 55% with an ANN built from scratch. I was told that TSFRESH can extract hundreds of features from a time series, and that is true, but I would like to know how to make it work with my sound files in .OGG format.
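
A possible route is to load the .OGG files with Librosa and reshape each signal into the long-format DataFrame that TSFRESH expects; the sketch below is only an outline (the file name, the aggressive resampling to keep the series short, and the column names are all assumptions):

import librosa
import pandas as pd
from tsfresh import extract_features

# Librosa reads .OGG directly; resampling to a low rate keeps the time series
# short enough for TSFRESH, at the cost of discarding high-frequency content.
signal, sr = librosa.load("example.ogg", sr=2000, mono=True)

# TSFRESH expects a long-format DataFrame with an id column and a sort column.
df = pd.DataFrame({
    "id": 0,                      # one id per audio clip
    "time": range(len(signal)),
    "value": signal,
})

features = extract_features(df, column_id="id", column_sort="time")
print(features.shape)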

alchimiste

Sir, can you please upload an audio classification tutorial using PyTorch in Google Colab?

saleemjamali

Could you cite other architectures considered "better" for audio classification? Are they always based on image-style processing (i.e., conv layers on mel spectrograms)?

luigibcdefg

Thanks so much for the videos! They're super useful. I was wondering if there is any significant difference between implementing a VGG architecture in PyTorch vs TensorFlow/Keras? I'm relatively new to machine learning, so hopefully my question makes sense.

seewai

Is there a reason why you decided to use a Conv2d over a Conv1d, or is it a matter of preference?
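
The choice follows the input representation: Conv2d slides its kernel over both the frequency and time axes of the mel spectrogram, while Conv1d slides only along time (e.g. over the raw waveform). A minimal sketch of the contrast (shapes are illustrative):

import torch
from torch import nn

spec = torch.rand(1, 1, 64, 44)   # (batch, channel, n_mels, time) mel spectrogram
wave = torch.rand(1, 1, 22050)    # (batch, channel, samples) raw waveform

# Conv2d convolves over frequency and time together.
conv2d = nn.Conv2d(in_channels=1, out_channels=16, kernel_size=3, padding=1)
print(conv2d(spec).shape)         # torch.Size([1, 16, 64, 44])

# Conv1d convolves along the time axis only.
conv1d = nn.Conv1d(in_channels=1, out_channels=16, kernel_size=9, padding=4)
print(conv1d(wave).shape)         # torch.Size([1, 16, 22050])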

stevenhoang

Great!
Can you implement a Continuous Speech Recognition app? I'm having a hard time doing it :D

rog

I love your videos. Are there any good tutorials you recommend on building CNNs in Python?

bobmeyers