DiT: The Secret Sauce of OpenAI's Sora & Stable Diffusion 3


Diffusion Transformers (DiT) might be the next meta in media synthesis. They excel not only at text-to-image but also at text-to-video, as shown by OpenAI's Sora. The results that have come out using them are mind-blowingly good. If only we could obtain more information about them, though :( Both are currently closed source.

DiT: Diffusion Transformers

Sora

Stable Diffusion 3

DiffiT

HDiT

This video is supported by the kind Patrons & YouTube Members:
🙏Andrew Lescelius, alex j, Chris LeDoux, Alex Maurice, Miguilim, Deagan, FiFaŁ, Daddy Wen, Tony Jimenez, Panther Modern, Jake Disco, Demilson Quintao, Shuhong Chen, Hongbo Men, happi nyuu nyaa, Carol Lo, Mose Sakashita, Miguel, Bandera, Gennaro Schiano, gunwoo, Ravid Freedman, Mert Seftali, Mrityunjay, Richárd Nagyfi, Timo Steiner, Henrik G Sundt, projectAnthony, Brigham Hall, Kyle Hudson, Kalila, Jef Come, Jvari Williams, Tien Tien, BIll Mangrum, owned, Janne Kytölä, SO, Richárd Nagyfi, Hector, Drexon, Claxvii 177th, Inferencer

[BGM] massobeats - glisten
Comments

Don't miss out on these exciting upgrades designed to elevate your content creation experience with DomoAI! Go try out: discord.gg/sPEqFUTn7n

bycloudAI

Are you telling me Stable Diffusion had ADHD 💀

imerence

The hell with the Frieren thumbnail lmao

ovum

5:43, OpenAI itself said that the more compute they threw at Sora, the better the results got, so you are right that the compute here absolutely does matter!

amortalbeing

Those facebook users are probably bots, because we are reaching the dead internet theory

cybertruck

In the interview, the developers mentioned that Sora will still need quite a while until its release. In the subsequent interview, however, the chief technology officer stated that there might be expanded access this year, possibly even in the coming months. I believe this is yet another instance of the conflict between scientific caution and the realization that, if high prices are initially charged as announced, there can be significant earnings.

manuffls

Hey, great video as always. Would love to see a deep-dive video into the Stable Diffusion 3 architecture and other DiT methods!

mehulagarwal

Just letting you know: yes, I would like to see your video on the DiffiT and HDiT architectures if you make one! Love your videos!!

adityajoshi

The only vid someone needs to understand the current AI situation

vedforeal

The "Sigmoid Curve" is FAR too misleading a representation to actually give people any reliable indication of where we are, unless their goal is a skeptical narrative. It's impossible to tell where we are on the curve, or what the curve truly looks like at this time, as it's significantly different with every advance, and how we arbitrarily classify advances also changes the appearance of the curves. Such representations can ONLY be applied retrospectively, when analyzing the past, they have no value for reliable prediction. Maybe the curve has multiple slow spots along it, or maybe multiple sigmoid curves chain together.
Somewhere I saw an explanation that overlaid a whole bunch of sigmoid curves related to recent technological advances, and when you average out all their overlaps, you end up with the same exponential singularity curve that guys like Kurzweil predicted. Tho I can't seem to find that.

NikoKun

Definitely want to see an in-depth look at DiT

slackstation

I definitely would like for you to explore the other DiTs

jghifiversveiws

yeah it would be nice to hear more about DiT architectures

gemstone

My exact reaction when I started using if ai in ComfyUI

claxviith

Domo AI can get expensive very fast. Went through a month's tokens from the basic subscription in an hour transforming my client's 30 sec ad into an anime. The results were good.

chrisfleitas

Love how AI took over even the sponsors

cdkw

You reached the peak of chill in this video. I like the vibe.

DrWne

"Can you even tell which one is the real and fake image?"
- Uh, yes, it was pretty obvious. I didn't even have to 'nitpick' about it, I could tell pretty much instantly.
I completely disagree with the notion that we are high up on the curve. If you actually work with AI rather than just using Midjourney (I'm talking about Krita and text completion models, for example to make agents), you will see there is still much progress day over day.
I generate a couple thousand images each and every day (work related), so I do have quite some experience from seeing so many of them.

I haven't even seen an upscaler yet that satisfies my requirements. The only way to get better upscales right now is more VRAM on a single device. Which isn't practical.

My current workflow is pretty convoluted. I have multiple GPUs all rendering different image layers, each with its own focus (like attention). I have to use multiple GPUs, one for each layer, to make it viable; rendering each layer one after another on a single GPU would not be practical. So as an alternative to a GPU with tons of VRAM or a very complex multi-GPU workflow, I'm still seeing great progress being made in efficiency, for example SD Cascade.

I think your focus on what progress is, is too narrow. Don't just look at what is possible; you have to take into account how much progress is being made in what is possible with limited resources. That isn't just the number of GPUs either, but also the amount and quality of the required input.

rakly

+1 for the follow-up video on the more technical details of DiT.

Kuchenrolle

"The only thing they added was space-time relation" is such an understatement.

TimeLordRaps