Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution – Paper Explained

We've combed through the complex mathematics and dense pages of the “Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution” research paper to bring you the essential insights and key takeaways. Learn how diffusion models can finally generate good-quality text. The paper won the #ICML2024 best paper award! 👏

Thanks to our Patrons who support us in Tier 2, 3, 4: 🙏
Dres. Trost GbR, Siltax, Vignesh Valliappan, Michael, Sunny Dhiana, Andy Ma

Outline:
00:00 Intro
01:17 Simplilearn (Sponsor)
02:29 Impossible for GPT, but possible with diffusion
04:26 Discrete Diffusion
05:33 Similarity to BERT
06:50 Forward diffusion
07:48 Backward diffusion
08:44 Score Entropy for learning
09:17 Inference: Generating a sample
09:52 Results

▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
🔥 Optionally, pay us a coffee to help with our Coffee Bean production! ☕
Join this channel to get access to perks:
▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀

🔗 Links:

#AICoffeeBreak #MsCoffeeBean #MachineLearning #AI #research

Video editing: Nils Trost
Music 🎵 : Space Navigator - Sarah, The Illstrumentalist
Comments
Author

Dr. Coffee Bean is spoiling us with two videos this month!

DerPylz
Author

Please never stop making videos. Love your work!

dhrubajyotipaul
Author

The content is so helpful on so many levels! The explanation is great, the curation of your selected papers is great, and your enthusiasm is also contagious to beginners and advanced ML folks! At least, IMO :)

Thank you!

borismeinardus
Author

This was great, thanks! Love the channel, keep up the good work.

GpWUrbs
Author

Excellent explanation, you make your ideas very easy to understand with how you lay them out and communicate them! First time finding this channel and it's exactly the kind of content I've been looking for! Joined and looking forward to future teachings.

jonbarrett
Author

It was so cool to see you at ACL 2024! Also, awesome video!

Schaelpy
Author

A tutorial on your hanging sound absorbers soon? They look very nice.

cipritom
Author

6:40 So... can we pretrain BERT on inputs with anywhere from 0% to 99% [MASK] tokens? Oh well, it can be tested locally on tinyshakespeare/tinystories I guess.

Also, unless the discrete diffusion model can learn to move or insert tokens, putting words in different positions seems sus: consider
"It was [MASK][MASK] experience". For example, if "a cool" is 2 tokens, it can be inserted. But "an extravagant" can easily be more than 2 tokens and then can never fit. Similarly, "Today morning was beautiful: [MASK][MASK]....[MASK]. Yet still" - here the model will be in an awkward spot if it has 20 masks to fill but 19 would be more than enough.

I guess it can learn to move items around as some form of text "inpainting".

AM-ykyd
Author

You are absolutely amazing, thanks for this explanation. I read the paper and understood nothing!

Ali-wfef
Author

I have been wondering about diffusion in text as well. Thanks for the vid! Do you think text diffusion models will still be relevant if LLM context windows are on the order of millions of tokens, which looks like the direction they are going?

lempira
Author

Could using quantum superposition for the probability distribution in text-generating diffusion models be a thing?

AhmetTungaBayrak
Author

Hmm, I'm skeptical whether we want to be able to generate sequences in a non-sequential way. It's quite a strong prior.

Chrnalis
Author

0:49 "It's much more coherent"
"the health benefits of alzetti's disease"
"the condition is brain"

What am I reading lol

amber
Author

I'm not really understanding why a transformer is needed / preferable (9:24) to generate that backward probability. It seems similarly computationally inefficient to existing transformer-based LLMs? Is S_theta predicting the probability that each word is changed? Or is there a different S_theta for each word?

nikilragav
Author

P_t+1 and P_t are vectors, right? @8:32

nikilragav
Author

Diffusion alone is too expensive and inefficient for training on complicated, unstructured data like text. We need a DiT (Diffusion Transformer)-like architecture that combines the strengths of both diffusion and transformers, though it's still expensive.

bass
Author

Why are we settling for diffusion models when generative models are theoretically superior? Yeah, diffusion's faster, but that's just because it's parallel and we're impatient. Moore's law will eventually close the speed gap anyway. Rather than text via diffusion, I'll be waiting to see images/videos made by generative models. Great video though!

duytdl
Author

LOVE IT. Dr., any idea on research in this area?

MariaM-pufx