Simple Diffusion Language Models

Short tutorial on text diffusion.

* Simplified and Generalized Masked Diffusion for Discrete Data

Errata:

* 7:32: I say q is ‘denoising’ but I meant ‘noising’ (a sketch of the masking process follows the errata).
* 9:16 - 10:03: There’s a term missing in the loss. See the paper for the full version, which uses slightly different notation.
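
To make the first erratum concrete, here is a rough Python sketch of the absorbing-state (masking) forward process q: each token is independently replaced by a mask token with a probability that grows with the noise level t. The linear schedule and the MASK_ID placeholder are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

MASK_ID = -1  # hypothetical id for the [MASK] token

def q_noise(tokens: np.ndarray, t: float, rng: np.random.Generator) -> np.ndarray:
    """Noise a sequence by independently masking each token.

    At t = 0 nothing is masked; at t = 1 everything is masked.
    Uses a linear schedule alpha_t = 1 - t (an assumption for this sketch).
    """
    keep = rng.random(tokens.shape) < (1.0 - t)  # keep each token with prob alpha_t
    return np.where(keep, tokens, MASK_ID)

rng = np.random.default_rng(0)
x0 = np.array([12, 7, 99, 3, 41])
print(q_noise(x0, t=0.5, rng=rng))  # roughly half the tokens become MASK_ID
```

The denoising model is then trained to predict the original tokens at the masked positions, which is where the loss discussed in the second erratum comes in.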
Comments

@srush_nlp Great explanation! How do you think discrete diffusion models should be modified to enable long-context sequence generation comparable to LLMs?

ASarkar-ML

Really cool stuff! It’s a shame it’s not quite at the level of autoregressive models (especially for DNA), but I’m excited about future work in the field. Love the explanation; it made the paper much more digestible.

sarthak-ti

I made a retrieval-based chatbot from scratch (I'm not a professional). The main component was compressing the vocabulary with synonyms and training the model on the compressed vocabulary so it groks faster. I have a feeling that approach would allow for very small and intelligent models. What do you think about compressing the vocabulary?

MagusArtStudios
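
A minimal sketch of the vocabulary-compression idea from the comment above; the synonym map and tokens are made up for illustration, and a real system would build the map from a thesaurus or embedding clusters.

```python
# Hypothetical synonym map: every surface form points to one canonical token.
synonym_to_canonical = {
    "large": "big", "huge": "big",
    "quick": "fast", "rapid": "fast",
}

def compress(tokens: list[str]) -> list[str]:
    """Map each token to its canonical synonym, shrinking the effective vocabulary."""
    return [synonym_to_canonical.get(tok, tok) for tok in tokens]

print(compress(["a", "huge", "and", "rapid", "model"]))
# ['a', 'big', 'and', 'fast', 'model']
```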

Why do this by masking and unmasking whole tokens or words? Why not pretrain some kind of latent space for each token/word and then do the diffusion in the latent space? Then the diffusion becomes much simpler. Of course you still need to convert from the latent space into the best token/word after that, but that should be relatively straightforward as well.

jrkirby

Making this process discrete seems very strange to me. Why not noise the token embeddings themselves? E.g., at pure noise levels a given token embedding is a mixture of the embeddings of all tokens, and at zero noise it is a one-hot vector as usual. And as you do diffusion you can update this token-embedding probability space, since you have the logits.

After n inference steps you will probably end up with positions that don't converge to a single token but instead map to some subset of tokens that are roughly equivalent in semantic space, so you can just sample from that distribution based on the final logits. Tokens you've already generated will be one-hot; noised tokens will be blended as described.

marinepower
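
A rough sketch of the embedding-space idea from the comment above: each position holds a probability distribution over the vocabulary, its embedding is the probability-weighted mixture of token embeddings, and the final output is sampled from that distribution. All sizes and values below are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, dim, seq_len = 50, 16, 4
embedding_table = rng.normal(size=(vocab_size, dim))

# Per-position distributions: pure noise is uniform, a decoded token is one-hot.
probs = np.full((seq_len, vocab_size), 1.0 / vocab_size)
probs[0] = np.eye(vocab_size)[7]  # position 0 already resolved to token 7

# Blended embeddings fed to the model at this noise level.
blended = probs @ embedding_table  # shape (seq_len, dim)

# After the last step, sample each still-blended position from its distribution.
samples = [int(rng.choice(vocab_size, p=p)) for p in probs]
print(blended.shape, samples)
```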

Really good video. I have to improve my math, but I get the general idea. Will try to implement it.

john_olu

Really liked it! This could work better on ARC.

wwkk