DAFx17 Keynote 2: Avery Wang - Robust Indexing and Search
Presented at the 20th International Conference on Digital Audio Effects (DAFx17)
Tuesday 5th September 2017, Edinburgh
Tutorial Abstract:
In this talk I will give an overview of the Shazam audio recognition technology. The Shazam service takes a query consisting of a short sample of ambient audio (as little as 2 seconds) from a microphone and searches a massive database of more than 40 million recordings. The query may be degraded by significant additive noise (less than 0 dB SNR), environmental acoustics, and nonlinear distortions. The computational scaling is such that a query may cost as little as a millisecond of processing time. Previous algorithms could index hundreds of items, required seconds of processing time, and were 20-30 dB less tolerant of noise and distortion. In aggregate, the Shazam algorithm represents a leap of more than a factor of 1E+10 in efficiency over prior art. I will discuss the various innovations leading to this result.
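One way to read the aggregate figure is as a product of the three improvements the abstract lists: catalog size, query time, and noise tolerance. The abstract does not say how the 1E+10 figure was derived, so the sketch below is only a rough check under stated assumptions: "hundreds of items" is taken as 100 to 1000, "seconds" as 1 second, and the 20-30 dB gain is converted to a linear power ratio and treated as a multiplicative factor. The function names are illustrative.

```python
# Rough, illustrative reading of the "factor of 1E+10" claim.
# Assumptions (not from the abstract): "hundreds" = 100..1000 items,
# "seconds" = 1 s, and the dB gain counts as a multiplicative factor.

def db_to_power_ratio(db: float) -> float:
    """Convert a decibel difference to a linear power ratio."""
    return 10 ** (db / 10)

def aggregate_gain(prior_catalog: float, prior_seconds: float, db_gain: float) -> float:
    catalog_gain = 40e6 / prior_catalog    # 40 million tracks vs. prior index size
    speed_gain = prior_seconds / 1e-3      # prior query time vs. one millisecond
    noise_gain = db_to_power_ratio(db_gain)
    return catalog_gain * speed_gain * noise_gain

low = aggregate_gain(prior_catalog=1000, prior_seconds=1.0, db_gain=20)
high = aggregate_gain(prior_catalog=100, prior_seconds=1.0, db_gain=30)
print(f"aggregate gain between {low:.1e} and {high:.1e}")
```

Under these assumptions the product falls between roughly 4E+9 and 4E+11, which brackets the order of magnitude quoted in the abstract.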
Speaker Bio:
Avery Wang is co-founder and Chief Scientist at Shazam Entertainment, and principal inventor of the Shazam search algorithm. He holds BS and MS degrees in Mathematics and MS and PhD degrees in Electrical Engineering, all from Stanford University. As a graduate student he received an NSF Graduate Fellowship to study computational neuroscience. He also received a Fulbright Scholarship to study at the Institut für Neuroinformatik at the Ruhr-Universität Bochum under Christoph von der Malsburg, focusing on auditory perception and the cocktail party effect. Upon returning to Stanford, he studied under Julius O. Smith, III at CCRMA, with a thesis titled "Instantaneous and Frequency-Warped Signal Processing Techniques for Auditory Source Separation". He was about to begin a post-doc in auditory neuroscience at UCSF when he was recruited by Chromatic Research to work on high-performance multimedia DSP algorithms and hardware. He has over 40 issued patents.