Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution – Paper Explained

We've combed through the complex mathematics and dense pages of the “Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution” research paper to bring you the essential insights and key takeaways. Learn how diffusion models can finally generate good-quality text. The paper won the #ICML2024 best paper award! 👏

Thanks to our Patrons who support us in Tier 2, 3, 4: 🙏
Dres. Trost GbR, Siltax, Vignesh Valliappan, Michael, Sunny Dhiana, Andy Ma

Outline:
00:00 Intro
01:17 Simplilearn (Sponsor)
02:29 Impossible for GPT, but possible with diffusion
04:26 Discrete Diffusion
05:33 Similarity to BERT
06:50 Forward diffusion
07:48 Backward diffusion
08:44 Score Entropy for learning
09:17 Inference: Generating a sample
09:52 Results

▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
🔥 Optionally, pay us a coffee to help with our Coffee Bean production! ☕
Join this channel to get access to perks:
▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀

🔗 Links:

#AICoffeeBreak #MsCoffeeBean #MachineLearning #AI #research

Video editing: Nils Trost
Music 🎵 : Space Navigator - Sarah, The Illstrumentalist
Comments
Author

Dr. Coffee Bean is spoiling us with two videos this month!

DerPylz
Author

Please never stop making videos. Love your work!

dhrubajyotipaul
Author

The content is so helpful on so many levels! The explanation is great, the curation of your selected papers is great, and your enthusiasm is also contagious to beginners and advanced ML folks! At least, IMO :)

Thank you!

borismeinardus
Author

This was great, thanks! Love the channel, keep up the good work.

GpWUrbs
Author

Excellent explanation, you make your ideas very easy to understand with how you lay them out and communicate them! First time finding this channel and it's exactly the kind of content I've been looking for! Joined and looking forward to future teachings.

jonbarrett
Author

It was so cool to see you at ACL 2024! Also, awesome video!

Schaelpy
Author

A tutorial on your hanging sound absorbers soon? They look very nice.

cipritom
Author

6:40 So... can we pretrain BERT on inputs with anywhere from 0% to 99% [MASK] tokens? Oh well, it can be tested locally on tinyshakespeare/tinystories I guess.

Also, unless the discrete diffusion model can learn to move or insert tokens, putting words in different positions seems sus: consider
"It was [MASK][MASK] experience". For example, if "a cool" is 2 tokens, it can be inserted. But "an extravagant" can easily be more than 2 tokens and then can never fit. Similarly, "Today morning was beautiful: [MASK][MASK]....[MASK]. Yet still" - here the model will be in an awkward spot if it has 20 masks to fill but 19 would be more than enough.

I guess it can learn to move items around as some form of text "inpainting".

AM-ykyd
Author

You are absolutely amazing, thanks for this explanation. I read the paper and understood nothing!

Ali-wfef
Author

I have been wondering about diffusion in text as well. Thanks for the vid! Do you think text diffusion models will still be relevant if LLM context windows are on the order of millions of tokens, which looks like the direction they are going?

lempira
Author

Could using quantum superposition for the probability distribution in text-generating diffusion models be a thing?

AhmetTungaBayrak
Author

Hmm, I'm skeptical whether we want to be able to generate sequences in a non-sequential way. It's quite a strong prior.

Chrnalis
Author

0:49 "It's much more coherent"
"the health benefits of alzetti's disease"
"the condition is brain"

What am I reading lol

amber
Author

I'm not really understanding why a transformer is needed / preferable (9:24) to generate that backward probability. It seems similarly computationally inefficient to existing transformer-based LLMs? Is S_theta predicting the probability that each word is changed? Or is there a different S_theta for each word?

nikilragav
Author

P_t+1 and P_t are vectors, right? @8:32

nikilragav
Author

Diffusion alone is too expensive and inefficient for training on complicated, unstructured data like text. We need a DiT (Diffusion Transformer)-like architecture that combines the strengths of both diffusion and transformers, though it's still expensive.

bass
Author

Why are we settling for diffusion models when generative models are theoretically superior? Yeah, diffusion's faster, but that's just because it's parallel and we're impatient. Moore's law will eventually close the speed gap anyway. Rather than text via diffusion, I'll be waiting to see images/videos made by generative models. Great video though!

duytdl
Author

LOVE IT. Dr., any idea on research in this area?

MariaM-pufx