How to Do Speech Recognition with Arduino | Digi-Key Electronics

Speech recognition is the process of using computers to recognize and understand human speech. Being able to understand full sentences or questions requires a lot of processing power, as it often relies on the complex algorithms found in natural language processing (NLP).

Most microcontrollers (and Arduino boards) cannot run NLP due to their limited resources. However, we can train a neural network to perform basic keyword spotting, which still has many uses (such as enabling a smart speaker by saying “Alexa” or shouting “stop” to halt a machine).

In this video, we will use Edge Impulse to train a neural network to identify and classify a few custom keywords. We will then deploy this trained model to an Arduino Nano 33 BLE Sense to perform keyword spotting in real time.

To begin, we collect samples of the keywords we wish to identify. These can be collected on any number of recording devices and then edited using Audacity to create 1-second snippets. We recommend collecting at least 50 samples to start.

Next, we run a custom Python script that mixes the samples with random snippets of background noise and combines our custom keywords with keywords from the Google Speech Commands dataset to build a curated dataset.
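
The script itself is Python, but its core mixing step is simple: overlay each 1-second keyword snippet on a random background-noise snippet at reduced volume and clamp the result to the 16-bit PCM range. A minimal sketch of that idea (shown here in C++; the function and buffer names and the 0.3 noise gain are illustrative, not taken from the actual script):

    #include <algorithm>
    #include <cstddef>
    #include <cstdint>

    // Mix a keyword snippet with background noise at reduced volume.
    // All buffers hold n samples of 16-bit PCM audio.
    void mix_with_noise(const int16_t *keyword, const int16_t *noise,
                        int16_t *out, size_t n, float noise_gain = 0.3f) {
        for (size_t i = 0; i < n; i++) {
            int32_t mixed = keyword[i] + (int32_t)(noise_gain * noise[i]);
            // Clamp to the valid int16 range to avoid wrap-around distortion.
            out[i] = (int16_t)std::clamp<int32_t>(mixed, -32768, 32767);
        }
    }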

From there, we upload the curated dataset to Edge Impulse. We use Edge Impulse to extract features from the audio samples in the form of Mel-frequency cepstral coefficients (MFCCs) and then to train a neural network to identify our target keywords. Once done, we can test the model and download it as part of an Arduino library.
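
Once the library is downloaded, its generated metadata header records exactly what the feature-extraction stage expects as input. As a quick sanity check (a sketch, assuming a project named "my_keywords"; Edge Impulse names the header after your own project):

    #include <my_keywords_inferencing.h>  // placeholder: named after your project

    void setup() {
        Serial.begin(115200);
        while (!Serial);
        // Constants generated by Edge Impulse in model_metadata.h:
        Serial.print("Sample rate (Hz): ");
        Serial.println(EI_CLASSIFIER_FREQUENCY);         // e.g. 16000
        Serial.print("Samples per inference: ");
        Serial.println(EI_CLASSIFIER_RAW_SAMPLE_COUNT);  // e.g. 16000 = 1 second
        Serial.print("Number of labels: ");
        Serial.println(EI_CLASSIFIER_LABEL_COUNT);
    }

    void loop() {}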

We load the library into the Arduino IDE and use it to perform inference in real time. The Arduino example code continually captures audio data, extracts features (computes MFCCs), and uses those MFCCs as inputs to the trained model. The model returns what are essentially probabilities that it heard each of our target keywords.

We can compare those output values to thresholds to take action whenever a desired keyword is heard! To start, we’ll blink a simple LED (because who doesn’t love an overly complicated blinky program?).
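
Stripped of the microphone plumbing, the heart of that example looks roughly like this (a sketch based on Edge Impulse's exported Arduino SDK; the header name, the keyword index, and the 0.8 threshold are assumptions you would adapt to your own project):

    #include <my_keywords_inferencing.h>  // placeholder: named after your project

    // Assume the PDM microphone callback fills this with 1 second of audio
    // (see the library's nano_ble33_sense_microphone example).
    static int16_t raw_audio[EI_CLASSIFIER_RAW_SAMPLE_COUNT];

    // Callback the SDK uses to pull audio samples and convert them to float.
    static int get_audio_data(size_t offset, size_t length, float *out_ptr) {
        numpy::int16_to_float(&raw_audio[offset], out_ptr, length);
        return 0;
    }

    // Call pinMode(LED_BUILTIN, OUTPUT) in setup() before using this.
    void classify_and_blink() {
        signal_t signal;
        signal.total_length = EI_CLASSIFIER_RAW_SAMPLE_COUNT;
        signal.get_data = &get_audio_data;

        ei_impulse_result_t result = {0};
        // run_classifier() computes the MFCCs and runs the neural network.
        if (run_classifier(&signal, &result, false) != EI_IMPULSE_OK) return;

        // Assumption: label index 1 is our target keyword; 0.8 is a starting
        // threshold to tune against your own test results.
        if (result.classification[1].value > 0.8f) {
            digitalWrite(LED_BUILTIN, HIGH);  // keyword heard: LED on
        } else {
            digitalWrite(LED_BUILTIN, LOW);
        }
    }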

Product Links:

Related Videos:
What is Edge AI?

Intro to TensorFlow Lite Part 1: Wake Word Feature Extraction

Intro to TensorFlow Lite Part 2: Speech Recognition Model Training

Intro to TensorFlow Lite Part 3: Speech Recognition on Raspberry Pi

Getting Started with TensorFlow Lite for Microcontrollers

Related Project Links:

Related Articles:

Getting Started with TensorFlow Lite for Microcontrollers
Comments

I have come to greatly appreciate Digi’s dedication to education, not to mention what an amazing teacher Shawn is! Keep up the good work, and cheers.

VAAYG

With so many "uncertain" samples in the test set, lowering the minimum confidence rating to 0.6 gives much better results.

janjongboom

Thank you! Great and timely content, with great pacing and information density (at 2x).

userou-igze

It'd be nice if you could show how to run an audio-classifying TFLite model on an Arduino Nano / Raspberry Pi Pico *using an analog microphone*. There's no proper video that I could find on the web that does that, or even remotely resembles this concept.

rohanmanchanda

I need some help. While setting things up in Anaconda, I'm getting this error:

dataset-curation.py: error: the following arguments are required: d

I really don't know what this could be, and I would really appreciate any help. Thanks!

eonoire

This is really good content! Thanks very much.

harrytsai

Can the method explained in the video be used to recognize a specific sound rather than specific speech, e.g., a clap?

adhamelrouby

Thank you, dude. Very cool tutorial. Please make an STM32F4 speech recognition example.

resatyigen

Can I use the data curation script on my own dataset and profit from the model I develop?

mri

Hi Shawn, thanks for the great tutorial. I dusted off my BLE, went through every one of your steps, and got it all working in near-flawless fashion, six hours later. I am now in a good place to start experimenting. My keyword was 'shut-down' (only 48 samples), which turns the LED on, and 'go' turns the LED off. This could be a light bulb controller or a TV power switch. A lot of sounds get confused for 'go', so increasing the threshold for 'go' to 85% worked well.
Perhaps it's just my OS, but I had to replace all your '\' with '/' in the curation script command line. Oddly, my feature extraction took less than half the time of yours (123 ms), but I have the same BLE Sense as you.
I want to try expanding the number of wake words. What do you think held you back: processor speed, RAM, Edge Impulse, data...?
Looking forward to your next videos.

bertbrecht

Thank you so much.
I have added the library as per Edge Impulse but am still facing an issue: "The filename or extension is too long."

Error compiling for board Arduino Nano 33 BLE.

ashwinis

I want to use this method with an ESP32. How can I make the program use the audio data coming in via I2S?

sureshtiwari

Hi, thanks for the tutorial. I want to use the method with an ESP32. How can I make the program use the audio data coming in via I2C?

hamishgrant

Good day, is there a guide like this one but using only a Raspberry Pi Pico and its analog channels connected to a microphone?

brayanaquino

I could have told you a much faster way to do the audio samples, but you already did it, so...

nikonissinen

"hand me my patching trowel, boy!"

djtomoy

Is this using TFLite Micro in the backend?

akkutyagi

Why do I not have AppData under my user folder? :(

MilSimVipers

Hi! I want to convert the speech to text and then work with the text in Python. Would this module work for me for this?

imsteven

"I've got 68, which should work for this prototype"

Ckckck, I'm disappointed Shawn

dhupee