High Fidelity Neural Audio Compression | Paper & Code Explained

❤️ Become The AI Epiphany Patreon ❤️

👨‍👩‍👧‍👦 Join our Discord community 👨‍👩‍👧‍👦

In this video I cover the "High Fidelity Neural Audio Compression" paper and code.

At 6 kbps they already match the audio quality of MP3 at 64 kbps (as measured by the subjective MUSHRA test)! That is roughly a 10x lower bitrate! This is super important, as streaming video+audio accounts for ~82% of total internet traffic!
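
The bitrate arithmetic above can be sketched in a few lines. This is only an illustration: the function names are hypothetical, and the quantizer settings (75 latent frames per second, 8 codebooks of 1024 entries each) are an assumption based on the 24 kHz configuration described in the paper, not something stated in this description.

```python
import math


def compression_ratio(baseline_kbps: float, codec_kbps: float) -> float:
    """How many times smaller the codec bitrate is than the baseline."""
    return baseline_kbps / codec_kbps


def rvq_bitrate_bps(frames_per_second: float, num_codebooks: int, codebook_size: int) -> float:
    """Bitrate of a residual vector quantizer before any entropy coding."""
    # Each codebook index costs log2(codebook_size) bits per frame.
    bits_per_index = math.log2(codebook_size)
    return frames_per_second * num_codebooks * bits_per_index


# MP3 at 64 kbps vs. the neural codec at 6 kbps.
print(round(compression_ratio(64, 6), 2))  # 10.67

# Assumed settings: 75 frames/s, 8 codebooks, 1024 entries each.
print(rvq_bitrate_bps(75, 8, 1024))  # 6000.0
```

Under these assumptions, dropping codebooks lowers the bitrate proportionally, which is how a single residual quantizer can serve several target bandwidths.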

Lots of ideas we've already seen in previous paper overview videos such as VQ-VAE, VQ-GAN, and AudioGen applied to the problem of audio compression.

▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬

⌚️ Timetable:
00:00 Intro
02:37 Paper walk-through: high level overview
12:05 Residual Vector Quantization
18:05 Reducing the BW using arithmetic coding and transformers
20:05 Loss formulations and results
23:40 Code walk-through
26:00 EnCodec architecture
28:20 Residual Vector Quantizer module
32:55 Loading the audio signal
34:35 Compression - a forward pass through the encoder
38:00 Quantization forward pass
42:35 Efficiently packing the bits
45:25 Using LM to further compress audio
57:50 Outro

▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
💰 BECOME A PATREON OF THE AI EPIPHANY ❤️

If these videos, GitHub projects, and blogs help you,
consider helping me out by supporting me on Patreon!

Huge thank you to these AI Epiphany patreons:
Eli Mahler
Petar Veličković

▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬


#neural #audio #compression
Comments
Author

I'm back! :)) Murphy's law is real, folks. I had a series of events happen to me over the past 2-3 weeks.

Also - going forward I'll be focusing much more on practical ML projects than on papers.

TheAIEpiphany
Author

Yo Aleksa, I just finished reading your blog post on Medium about getting into DeepMind. Loved how detailed you went when it comes to learning the nuts and bolts of ML. Just bought the Mathematics for ML book on your recommendation. Keep up the awesome work, you're smashing it!!

NicholasRenotte
Author

Hi, guys. The paper mentions two different setups, "non-streamable" and "streamable". They seem to be two different CNN padding schemes? Do you know which parts of the code implement them? Thanks.

Nova-mtks
Author

I want to ask you for some advice. If a person wants to do research in CV, would you recommend that they learn classic CV first or go straight to reading research papers?

convolutionalnn
Author

Please tell me what the phrase "single multiscale spectrogram adversary" means.

dfnyfzGfkjxrf
Author

Please send the specific code for this project.

chinthangu
Author

I'm a stupid liberal arts major,
so I don't understand those technical explanations,
so I will ask you directly:
1) Is EnCodec lossless?

2) Can it replace DSD or FLAC for super high quality audio?

3) Its GitHub description mentions that the 'non-causal 48 kHz model' was trained on music data.
Then
can it reproduce some new kind of sound that it hasn't learned yet,
like a conventional audio format that uses a non-neural algorithm?

musajonestagiraena