Sound Recognition - Computerphile


How do you go about making a device recognise individual sounds? Audio Analytic's Dr Chris Mitchell on how they approached the problem.

Audio Analytic is a sound recognition software company based in Cambridge, UK, and Palo Alto, USA.

Dodgy sound effects courtesy of Sean's Roland electronic drum kit...

This video was filmed and edited by Sean Riley.

Comments

My CS Thesis is on Sound Recognition. This stuff gets so abstract so quickly it gets really hard to explain to people. He did a better job than I could!

iau

3:13
Honestly, the door squeak sound effect you have is great.

Architector_

7:27 “Last time I checked my window for breaking it didn't speak to me”

You just didn't understand it.

stensoft

Never thought I would hear Tyler, the Creator in a Computerphile video.

guyman

Slipknot music in a Computerphile video? I never thought I'd see that.

AmxCsifier

I would really prefer a video on speech recognition.

realeques

Very nice collection of vulnerable devices in the back.

Ivo--

Summary of the whole 15 minutes: "we have to cut the sound precisely and feed it to the computer". Wrong title, maybe.
And I'm no expert in sound recognition, but this makes it sound like most of the work happens in the time domain rather than the frequency domain, which sounds doubtful to me.

anothergol
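To make the time-domain vs frequency-domain distinction in the comment above concrete, here is a minimal sketch using only the Python standard library, with a naive DFT for clarity rather than speed. It is a generic illustration, not Audio Analytic's method, which the video does not spell out; the sample rate, frame length, and tone frequencies are arbitrary assumptions. Two equally loud tones have the same RMS energy (a time-domain feature) but different spectral peaks.

```python
import cmath
import math

SAMPLE_RATE = 8000  # assumed sample rate (Hz) for this toy example
FRAME_LEN = 256     # assumed frame length; gives 31.25 Hz per DFT bin


def make_tone(freq_hz, n=FRAME_LEN, sample_rate=SAMPLE_RATE, amplitude=0.5):
    """Synthesize n samples of a pure sine tone."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


def rms(frame):
    """Time-domain feature: root-mean-square energy of a frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def magnitude_spectrum(frame):
    """Frequency-domain view: naive O(n^2) DFT magnitudes for bins 0..n/2."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]


def peak_frequency(frame, sample_rate=SAMPLE_RATE):
    """Frequency (Hz) of the strongest DFT bin."""
    spec = magnitude_spectrum(frame)
    k = max(range(len(spec)), key=spec.__getitem__)
    return k * sample_rate / len(frame)


low = make_tone(500)
high = make_tone(2000)
print(round(rms(low), 3), round(rms(high), 3))  # prints: 0.354 0.354
print(peak_frequency(low), peak_frequency(high))  # prints: 500.0 2000.0
```

A practical system would use an FFT over windowed, overlapping frames; the naive DFT here only keeps the example self-contained.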

I am a hobbyist music maker, and I have kind of created my own dictionary for describing sounds, for example "smooth, deep, dark blue, hollow, rubbery" or "smooth, sharp, narrow, high, reddish/brown" :-D
They would be highly insufficient for sound recognition purposes, but I would not be surprised if they at least partially overlapped with the features they used. As far as I know, many people who make music, at least those who make it on a computer and so have to think about these things consciously, have a similar, more or less developed dictionary to describe features of sound, melody, and rhythm.

MidnightSt
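The "bright/dark" style descriptors in this comment loosely correspond to a standard audio feature, the spectral centroid: the magnitude-weighted mean frequency of a frame. The sketch below assumes that mapping purely for illustration; the video does not say which features Audio Analytic uses. The test signals are hypothetical sums of sine tones, and the sample rate and frame length are arbitrary.

```python
import cmath
import math

SAMPLE_RATE = 8000  # assumed sample rate (Hz) for this toy example


def magnitude_spectrum(frame):
    """Naive O(n^2) DFT magnitudes for bins 0..n/2."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]


def spectral_centroid(frame, sample_rate=SAMPLE_RATE):
    """Magnitude-weighted mean frequency in Hz; higher values tend to sound 'brighter'."""
    spec = magnitude_spectrum(frame)
    total = sum(spec)
    if total == 0:
        return 0.0
    bin_hz = sample_rate / len(frame)
    return sum(k * bin_hz * m for k, m in enumerate(spec)) / total


def mix(freqs, n=256, sample_rate=SAMPLE_RATE):
    """Hypothetical test signal: equal-amplitude sine tones summed together."""
    return [sum(math.sin(2 * math.pi * f * i / sample_rate) for f in freqs) for i in range(n)]


dark = mix([250, 500])
bright = mix([250, 500, 2000, 3000])
print(spectral_centroid(dark), spectral_centroid(bright))
```

Adding high-frequency partials pulls the centroid upward (roughly 375 Hz versus roughly 1437.5 Hz for these two mixes), which is one plausible reading of "bright". Colour words like "dark blue" or texture words like "rubbery" have no such single-number counterpart.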

Great video and great explanation.
It would get tricky when you need to create a database of glass breaking, because you actually have to break all these different types of glass. That could get expensive very quickly.

lindascoon

What's easier for a computer, sound recognition or image recognition?

Cesariono

I was disappointed by this video. This is an interesting topic, for which there were really two viable approaches for an entertaining 15-minute video: a high-level overview of the technology this guy developed, with diagrams of the parts of a specific use case or device, "how it works" style; or the first video in a series, like you've done with Dr. Pound, that covers prerequisite fundamental concepts first and builds up to a more in-depth examination of an end-to-end example. This video is neither; it's 15 minutes of a guy speaking at such a high level of abstraction that nothing is actually explained, because he ostensibly thinks that Computerphile viewers aren't sophisticated enough to understand his work, as is made evident by his condescending remark at 8:33 - 8:40. How about an example of a feature that IS useful, some examples of what exactly the metrics are, and what sounds they can be used to differentiate?

shiphorns

Any chance we could get links to papers/blogs/projects on the subject? I would really like to try hacking on sound recognition, especially on a low-power device.

DeathTickle

A bit more detail about speech recognition would have been useful. Perhaps a future video?

Lolwutdesu

I worked on early DSP-based speech recognition used in telephony. It has come a long way; I never really imagined talking to your TV remote would be a common thing.

realcygnus

Will there be more videos like this? I'd love to see some more in depth stuff, I jibbed out of going to university to do this sort of thing, bags of regret around that now!

ipg

What do I see on the desk? 132-column LINE PRINTER PAPER? Does that still exist in 2017?

jlinkels

15 minutes of virtually nothing, and it's the edited version?

Dima-htrb

I would think designing a security system based on audio would be challenging as an attacker could simply jam the classifier with extremely loud, rather short duration wideband noise to temporarily raise the noise floor. I can think of a few mitigations, but I'd imagine it'd be pretty hard to design a system with audio as the only sensor.

kylegreen
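One possible mitigation for the jamming attack described in this comment, sketched here as an assumption (the commenter does not name theirs), is to track the noise floor with minimum statistics rather than a running mean. The frame energies and the 20-frame window below are made-up values for illustration.

```python
from collections import deque


def mean_floor(energies, window=20):
    """Naive noise floor: running mean of the last `window` frame energies."""
    recent = deque(maxlen=window)
    out = []
    for e in energies:
        recent.append(e)
        out.append(sum(recent) / len(recent))
    return out


def min_floor(energies, window=20):
    """Minimum-statistics style floor: minimum of the last `window` frame energies.
    A burst shorter than the window cannot raise this estimate."""
    recent = deque(maxlen=window)
    out = []
    for e in energies:
        recent.append(e)
        out.append(min(recent))
    return out


# Hypothetical frame energies: quiet room (1.0) with a 5-frame loud jamming burst (1000.0).
energies = [1.0] * 30 + [1000.0] * 5 + [1.0] * 30
naive = mean_floor(energies)
robust = min_floor(energies)
print(max(naive), max(robust))  # prints: 250.75 1.0
```

The trade-off is visible in the design: a burst longer than the window would still raise the minimum-based floor, so this only blunts short jamming bursts. It supports the commenter's point that audio alone is hard to secure.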

It is incredibly meta to be watching this with Google-generated subtitles.

kowalityjesus
