Stable Diffusion 3


Scaling Rectified Flow Transformers for High-Resolution Image Synthesis
Comments

It's just crazy to see Hu-po understand all the concepts in this paper; what an insane guy!

wzyjoseph

parti prompts are all about "two dogs on the left the dog on the left is black and the one on the right is white, and a cat on the right holding up its right paw, with 12 squares on the carpet and a triangle on the wall"

fast_harmonic_psychedelic

super interesting to listen to someone with actual understanding of how all that magic works ;)!

HolidayAtHome

Great explanations, I can't wait to test the multimodal inputs

bause

broo this is such an informative video man. kudos to you on making such complicated equations so easy and intuitive to understand for beginners

siddharthmenon

Thank you, helps a lot!! Next the SD3-Turbo Paper please :)

erikhommen

The VAE scaling isn't new, it was shown in the EMU paper. One thing neither paper discusses is that there's an issue with scaling the channels: they have diminishing returns in terms of information density. For example, with d=8, if you sort the channels by PCA variance, the first 4 channels have the most information, then the next 2 have high-frequency detail, and the last two are mostly noisy. There's still valid information in that "noise", but it may not be worth the increased computational complexity. Alternatively, this could be a property of the KL regularization, where it concentrates information density in a few channels.
The idea of shifting the timestep distribution was proposed in the UViT paper (Simple Diffusion), I'm surprised they did not reference it directly. Although, the UViT paper provided a theoretical perspective, which does not necessarily align with human preferences.
I wish they had done a more comprehensive search with the diffusion methods... It's missing some of the other training objectives (likely due to author bias and not compute), which means it's not quite as useful as claimed.

hjups
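
The channel analysis hjups describes can be sketched roughly as follows. This is a minimal illustration, not the commenter's actual procedure: the function name `channel_pca_variance` is hypothetical, the latents are synthetic random data standing in for real VAE latents, and it ranks principal components of the channel axis (which are linear mixes of channels) by explained variance.

```python
import numpy as np

def channel_pca_variance(latents: np.ndarray) -> np.ndarray:
    """Explained-variance ratio of each principal component along the
    channel axis, sorted from largest to smallest.

    latents: array of shape (N, C, H, W).
    """
    n, c, h, w = latents.shape
    # Treat every spatial position of every sample as one C-dimensional observation.
    x = latents.transpose(0, 2, 3, 1).reshape(-1, c)
    x = x - x.mean(axis=0, keepdims=True)
    cov = (x.T @ x) / (x.shape[0] - 1)
    # eigvalsh returns eigenvalues in ascending order, so reverse them.
    eigvals = np.linalg.eigvalsh(cov)[::-1]
    eigvals = np.clip(eigvals, 0.0, None)
    return eigvals / eigvals.sum()

# Synthetic stand-in for d=8 latents (assumption: NOT real SD3 latents).
rng = np.random.default_rng(0)
scales = np.array([4.0, 3.0, 2.5, 2.0, 1.0, 0.8, 0.3, 0.2])
latents = rng.standard_normal((16, 8, 32, 32)) * scales[None, :, None, None]

ratios = channel_pca_variance(latents)
print(np.round(ratios, 3))
```

On real latents, a steep drop-off in these ratios after the first few components would be consistent with the diminishing-returns observation above; whether that happens depends on the VAE and the data.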

Talking about signal/noise ratio on a mic input that is clipping. Nice 😂

MultiMam

Intel does have a big investment.
Personally, I think they should sell the model.
Keep it open source, but sell rights to use the high end models.
That way, they have a solid business plan.

jeffg

That's not what parti prompts are for. It's not about visually pleasing images. It's about accurately captioned images

fast_harmonic_psychedelic

great video. do you know if rectified flow is in diffusers?

chickenp

What pdf reader do you use for annotation?

KunalSwami

How many convergence points does the vector field have?

jkpesonen