Parti - Scaling Autoregressive Models for Content-Rich Text-to-Image Generation (Paper Explained)

#parti #ai #aiart

Parti is a new autoregressive text-to-image model that shows just how much scale can achieve. This model's outputs are crisp, accurate, and realistic, and it can combine arbitrary styles and concepts and fulfil even challenging requests.

OUTLINE:
0:00 - Introduction
2:40 - Example Outputs
6:00 - Model Architecture
17:15 - Datasets (incl. PartiPrompts)
21:45 - Experimental Results
27:00 - Picking a cherry tree
29:30 - Failure cases
33:20 - Final comments

If you want to support me, the best thing to do is to share out the content :)

If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n
Comments

AI art generation has progressed extremely rapidly these last few years - it's ridiculous. 2018 generative art looked like trash, and now illustrators are, with good reason, concerned about their job security. It's been wild to witness. This has to be one of the most exciting times to read scientific papers.

Mutual_Information

In a way, it makes a lot of sense, since written language is a form of lossy compression of the things it actually represents. You can compress a picture of a cat in a tree to "cat in tree"; decompression just generates a picture of a cat in a tree. The loss is that it may be a different cat in a different tree, but it's still cool.

DrSulikSquirrel
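
A back-of-the-envelope check of the compression analogy above (editor's sketch in Python; the image size matches Parti's base resolution, but the byte counts are purely illustrative):

image_bits = 256 * 256 * 3 * 8            # a raw 256x256 RGB image, in bits
caption = "cat in tree"
caption_bits = len(caption.encode()) * 8  # 11 ASCII characters = 88 bits

print(f"{image_bits / caption_bits:,.0f}x compression")  # ~17,873x

Everything the caption fails to pin down (which cat, which tree, the lighting) is exactly the loss that the "decompressor", i.e. the generator, has to invent.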

Another breakthrough in image generation in such short order? I can't believe it!

XOPOIIIO

Failure cases are appreciated :)

It's like a sandwich, with examples of what it finds easy, hard, and too hard.

alanhere

"Failure: improper handling of negation or indication of absence"

Yeah, they all have this problem; it'd be amazing to see that fixed :)

alanhere

As always, this is merely building on pioneering work done by Schmidhuber in his early childhood

Michsel

Always bet on scale! I hope people are starting to understand this. I'd imagine future models will be even more conceptually simple while simultaneously being far more general.

mgostIH

That was awesome! Normally I do not like the second, interview-style videos, but if you could score an informal interview with the major contributors of this paper, that would be a must-watch for sure.

AndrewRafas

And we're suddenly back to mainframe days where you need a computer the size of a room to run some program.
(That's not a sudden new thing, but I wish stuff like this fit on a GPU.)

..

Imagen and Parti both wisely steer well clear of the Uncanny Valley and stick to animals; when people are involved, they're in the shadows or in space suits, etc.

MrJaggy

The upscaler is cool as well if it can go from 256x256 to 1024x1024, since that is 4x4 = 16 times as many pixels, not 4 times.

ilikenicethings
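
A quick check of the arithmetic in the upscaler comment above (editor's sketch):

low_res = 256 * 256        # pixels before super-resolution
high_res = 1024 * 1024     # pixels after super-resolution

print(high_res / low_res)  # 16.0: 16x the pixel count, but only 4x per side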

That is absolutely amazing!
Keep up the great work, Yannic!
I love your content.
Your channel is my main source of ML news!

andregn

Growing the cherry tree reminds me of Artbreeder by Kenneth Stanley: greatness cannot be planned.

tapashyapanday

Wow, the effect of scale is undeniable at this point. Does this mean a bunch of companies with vast compute resources hold the key to squeezing innovation out of the current architectures? It must be hard to be an independent researcher these days without those sweet datacenters.

bharcooldude

This is exciting. Will definitely look into this more

ChocolateMilkCultLeader

It's like the smaller models are drunk and they sober up with scale. XD

EDIT: The Porsche 356 from the 20B-parameter model looks like the actual car. I would like to see many generated images of that car, just to figure out whether the model is memorizing the image verbatim or making up a proper novel rendition of the object.

EDIT2: The cap with the word "bonerz" written on it would be a success. XD

DamianReloaded
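
One way to probe the memorization question raised above would be a nearest-neighbor check in some embedding space (e.g. CLIP features). This is an editor's sketch, not anything from the paper; the embeddings are assumed to be computed elsewhere:

import numpy as np

def max_train_similarity(gen_embs: np.ndarray, train_embs: np.ndarray) -> np.ndarray:
    # Cosine similarity of each generated image to its closest training image;
    # values near 1.0 across many samples would suggest verbatim memorization,
    # while consistently lower values would point to novel renditions.
    g = gen_embs / np.linalg.norm(gen_embs, axis=1, keepdims=True)
    t = train_embs / np.linalg.norm(train_embs, axis=1, keepdims=True)
    return (g @ t.T).max(axis=1)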

Hmm, ViT-VQGAN does not seem to be specifically optimized for autoregressive processing - i.e. ordering does not seem to matter in ViT.

OTOH for autoregressive generation ordering matters a lot: The first token to be generated uses the least amount of computation (i.e. it cannot query other image tokens) while the last one uses the most.

So, could they improve results by making ViT-VQGAN part more autoregression-friendly?

killers
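
For readers unfamiliar with why ordering matters in the autoregressive decoder, a minimal illustration (editor's sketch with toy sizes): a causal mask means position i can only attend to positions 0..i, so the first generated image token sees almost no image context while the last one sees all of it.

import numpy as np

T = 6                            # toy sequence of 6 image tokens
mask = np.tril(np.ones((T, T)))  # causal (lower-triangular) attention mask

print(mask.sum(axis=1))          # visible context per position: [1. 2. 3. 4. 5. 6.]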

@yannic: Just noticed you have too much bass in your audio (hearing it on a full-range speaker system). A low cut below 120 Hz in your signal chain would be nice...

neuron
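
The suggested fix, sketched in software rather than a hardware signal chain (editor's illustration using SciPy; the 48 kHz sample rate is an assumption):

import numpy as np
from scipy.signal import butter, sosfilt

fs = 48_000                                                  # assumed sample rate in Hz
sos = butter(4, 120, btype="highpass", fs=fs, output="sos")  # low cut at 120 Hz

voice = np.random.randn(fs)    # stand-in for one second of recorded audio
cleaned = sosfilt(sos, voice)  # energy below ~120 Hz attenuated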

10:01 to 10:40

So you're saying that one of those squares of the image, let's say it depicts half a stoat's face where the face overlaps the edge of the square, some sand, and half a tamagotchi made of wood, is then reduced to a single token (like a word in a sentence), and the model only has a dictionary of 8k words?

alanhere
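
Roughly, yes: that is the vector-quantization idea the question above is getting at. A minimal sketch of the ViT-VQGAN quantization step (editor's illustration; the dimensions are toy values, and only the 8192-entry codebook size is from the paper):

import numpy as np

rng = np.random.default_rng(0)
codebook = rng.normal(size=(8192, 32))  # 8192 "visual words", toy feature dim 32
patch_feature = rng.normal(size=(32,))  # encoder output for one image patch

# Snap the patch feature to its nearest codebook entry; only this integer id
# (one "word" out of 8192) is what the transformer sees for the patch.
token_id = int(np.argmin(((codebook - patch_feature) ** 2).sum(axis=1)))
print(token_id)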