High Fidelity Neural Audio Compression | Paper & Code Explained

❤️ Become The AI Epiphany Patreon ❤️

👨‍👩‍👧‍👦 Join our Discord community 👨‍👩‍👧‍👦

In this video I cover the "High Fidelity Neural Audio Compression" paper and code.

At 6 kbps they already match the audio quality (as measured by the subjective MUSHRA test) of MP3 at 64 kbps! That's roughly a 10x lower bitrate! This is super important, as streaming video and audio accounts for ~82% of total internet traffic!
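To make the "10x" figure concrete, here is a small back-of-the-envelope sketch. It only uses the two bitrates quoted above (6 kbps and 64 kbps); the one-minute clip length and the helper names `compression_ratio` and `size_kilobytes` are assumptions made for illustration, not anything from the paper or its code.

```python
def compression_ratio(baseline_kbps, codec_kbps):
    """How many times smaller the bitrate is compared to the baseline."""
    return baseline_kbps / codec_kbps


def size_kilobytes(bitrate_kbps, seconds):
    """Payload size in kilobytes (1 kbps = 1000 bits/s, 8 bits per byte)."""
    return bitrate_kbps * 1000 * seconds / 8 / 1000


# Bitrates quoted in the description above.
ratio = compression_ratio(64, 6)

# Hypothetical one-minute clip, ignoring container/header overhead.
mp3_kb = size_kilobytes(64, 60)
encodec_kb = size_kilobytes(6, 60)

print(f"ratio: {ratio:.2f}x, mp3: {mp3_kb} kB, 6 kbps codec: {encodec_kb} kB")
```

So the ratio is a bit above 10x (64 / 6), and a minute of audio shrinks from 480 kB to 45 kB under these assumptions.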

Lots of ideas we've already seen in previous paper overview videos, such as VQ-VAE, VQ-GAN, and AudioGen, are applied here to the problem of audio compression.

▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬

⌚️ Timetable:
00:00 Intro
02:37 Paper walk-through: high level overview
12:05 Residual Vector Quantization
18:05 Reducing the BW using arithmetic coding and transformers
20:05 Loss formulations and results
23:40 Code walk-through
26:00 EnCodec architecture
28:20 Residual Vector Quantizer module
32:55 Loading the audio signal
34:35 Compression - a forward pass through the encoder
38:00 Quantization forward pass
42:35 Efficiently packing the bits
45:25 Using LM to further compress audio
57:50 Outro
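
For readers who want a feel for the "Residual Vector Quantization" part of the timetable before watching, below is a minimal pure-Python sketch of the general idea: each stage quantizes whatever error the previous stage left behind, and the decoder sums the chosen entries. The tiny hand-made codebooks and the function names (`nearest`, `rvq_encode`, `rvq_decode`, `bits_per_frame`) are hypothetical and are not taken from the EnCodec repository; real codebooks are learned and much larger.

```python
import math


def nearest(codebook, vec):
    """Index of the codebook entry closest to vec (squared Euclidean distance)."""
    def dist(entry):
        return sum((a - b) ** 2 for a, b in zip(entry, vec))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))


def rvq_encode(codebooks, vec):
    """Quantize vec stage by stage; each stage sees the previous stage's residual."""
    residual = list(vec)
    indices = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        indices.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return indices


def rvq_decode(codebooks, indices):
    """Reconstruct the vector by summing the selected entry from every stage."""
    out = [0.0] * len(codebooks[0][0])
    for codebook, idx in zip(codebooks, indices):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


def bits_per_frame(codebooks):
    """Bits needed to store one index per codebook (before any entropy coding)."""
    return sum(math.log2(len(cb)) for cb in codebooks)


# Toy codebooks, assumed for illustration only: a coarse stage and a fine stage.
coarse = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fine = [[0.0, 0.0], [0.25, 0.0], [0.0, 0.25], [0.25, 0.25]]
codebooks = [coarse, fine]

x = [1.2, 0.3]
codes = rvq_encode(codebooks, x)
x_hat = rvq_decode(codebooks, codes)
print(codes, x_hat, bits_per_frame(codebooks))
```

Adding more stages lowers the reconstruction error at the cost of more bits per frame, which is why this kind of quantizer lends itself to a single model serving several bitrates.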

▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
💰 BECOME A PATREON OF THE AI EPIPHANY ❤️

If these videos, GitHub projects, and blogs help you,
consider helping me out by supporting me on Patreon!

Huge thank you to these AI Epiphany patreons:
Eli Mahler
Petar Veličković

▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬

#neural #audio #compression

Related videos

Comments

I'm back! :)) Murphy's law is real, folks - I had a series of events happen to me over the past 2-3 weeks.

Also - going forward I'll be focusing much more on practical ML projects than on papers.

TheAIEpiphany

Yo Aleksa, I just finished reading your blog post on Medium about getting into DeepMind. Loved how detailed you went when it comes to learning the nuts and bolts of ML. Just bought the Mathematics for ML book on your recommendation. Keep up the awesome work, you're smashing it!!

NicholasRenotte

Hi, guys. The paper mentions two different setups, "non-streamable" and "streamable". They seem to be two different CNN padding schemes? Do you know which parts of the code implement them? Thanks

Nova-mtks

I wanna ask you for some advice... If a person wants to do research in CV, would you recommend they learn classic CV first or go straight to reading research papers?

convolutionalnn

Please tell me, what does the phrase "single multiscale spectrogram adversary" mean?

dfnyfzGfkjxrf

Please share the code for this project

chinthangu

I'm a stupid liberal arts major,
so I don't understand these technical explanations,
so I'll ask you directly:
1) Is EnCodec lossless?

2) Can it replace DSD or FLAC for super high quality audio?

3) Its GitHub description mentions that the "non-causal 48 kHz model" was trained on music data.
Then
can it reproduce some new kind of sound that it hasn't learned yet,
like a conventional audio format that uses a non-neural algorithm?

musajonestagiraena

Encodec: High Fidelity Neural Audio Compression Explained

Neural Audio Codec - Encodec and SoundStream

[ICASSP2023] [Highlight] AudioDec: An Open-Source Streaming High-Fidelity Neural Audio Codec

BOTNOI AI reading group: High Fidelity Neural Audio Compression

MobileNVC: Real-Time 1080p Neural Video Compression on a Mobile Device

Audio Codecs & the AI Revolution - An Introduction to Machine Learning-Enhanced Audio Codecs

Neil Zeghidour: SoundStream: an end-to-end neural audio codec

Boosting neural video codecs by exploiting hierarchical redundancy

FLAN-T5, DreamFusion, Neural Audio Compression — Trends in AI November 2022

AMAAI Webinar - nnAudio by Cheuk Kin Wai (Raven)

In-depth Review of Google's SoundStream: An End-to-End Neural Audio Codec

Can AI Disrupt Speech Compression? | Jan Skoglund

Neural Video Compression With Diverse Contexts (CVPR 2023)

Scaling Transformers for Low-Bitrate High-Quality Speech Coding

NSDI '24 - Gemino: Practical and Robust Neural Compression for Video Conferencing

Transformer Neural Net makes music! (JukeboxAI)

Advances in Neural Compression with Auke Wiggers - #570

Neural and Non-Neural AI, Reasoning, Transformers, and LSTMs

Secrets of Compression and Normalize in Electronic Music Production Class 6-1 #OnlineTrainingsWorld

AudioGen: Textually Guided Audio Generation

ENCODEC Figure Evolution

Secrets of Compression and Normalize in Electronic Music Production Class 6-6 #OnlineTrainingsWorld

Secrets of Compression and Normalize in Electronic Music Production Class 6-2 #OnlineTrainingsWorld