Text-to-video models explained

When it comes to text-to-video models, the way they create clips is very similar to the way text-to-image models create images from simple text prompts. In this episode of Hidden Layers, we take a look at how these models operate under the hood, understanding how they use Temporal Super Resolution (TSR) and Spatial Super Resolution (SSR) models to create high-resolution videos from frames of images. Moreover, you'll learn how text-to-video models, like Imagen, are an orchestration of various models working together to produce high-resolution videos from a single image and text prompt.
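
As a rough illustration of that cascade, here is a toy Python sketch, assuming a hypothetical pipeline in which a base text-to-video model produces a short, low-resolution clip and alternating Temporal (TSR) and Spatial (SSR) super-resolution stages then raise its frame rate and resolution. The function names, stage order, and scale factors are assumptions chosen for illustration, and naive interpolation stands in for the learned diffusion models; this is not Imagen's actual code or API.

# A minimal sketch of the cascaded text-to-video idea described in the episode.
# Stage order and scale factors are illustrative; naive interpolation stands in
# for the learned SSR/TSR diffusion models -- this is NOT Imagen's real pipeline.
import numpy as np

def base_model(prompt: str, frames=16, height=24, width=48) -> np.ndarray:
    """Stand-in for the base text-to-video model: returns a low-resolution,
    low-frame-rate clip as a (frames, height, width, 3) array."""
    rng = np.random.default_rng(abs(hash(prompt)) % (2**32))
    return rng.random((frames, height, width, 3)).astype(np.float32)

def ssr(video: np.ndarray, scale: int = 2) -> np.ndarray:
    """Spatial Super Resolution stand-in: upsample each frame (nearest neighbour)."""
    return video.repeat(scale, axis=1).repeat(scale, axis=2)

def tsr(video: np.ndarray) -> np.ndarray:
    """Temporal Super Resolution stand-in: insert an interpolated frame
    between every pair of neighbouring frames to raise the frame rate."""
    out = [video[0]]
    for prev, nxt in zip(video[:-1], video[1:]):
        out.append(0.5 * (prev + nxt))  # midpoint frame
        out.append(nxt)
    return np.stack(out)

def text_to_video(prompt: str) -> np.ndarray:
    """Orchestrate the cascade: base clip, then alternating TSR/SSR stages."""
    video = base_model(prompt)
    for stage in (tsr, ssr, tsr, ssr):  # order and count chosen for illustration
        video = stage(video)
    return video

clip = text_to_video("a teddy bear washing dishes")
print(clip.shape)  # e.g. (61, 96, 192, 3): more frames, higher resolution

Running the sketch shows the essence of the orchestration described above: the base clip comes out small and choppy, and each TSR or SSR stage only has to solve one sub-problem (more frames or more pixels) before handing the result to the next model.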

Resources:

Chapters:
0:00 - Intro
0:16 - What are text-to-video models?
0:34 - How do text-to-video models create videos?
1:40 - What are the complexities of modeling video?
2:04 - How do we get high-resolution videos from text-to-video models like Imagen?
3:38 - Recap of how Imagen works
3:47 - Leave us questions in the comments!

Comments

Awesome explanation - just the right length and good abstraction. The host did a really great job. I am subscribed now!

stefan-bayer

Thank you Laurence for making this accessible to dummies like me

sabaokangan

I would love to dive deeper into this to learn how it works!

asatorftw

The second episode of Hidden Layers, “Text-to-video models explained,” maintains the same high standard as the first episode. Many thanks once again to Laurence Moroney and Google Research!

Any chance we could cover Google’s LaMDA next? Perhaps there is another breakthrough conversation model you might touch upon as well. The whole idea of RLHF (Reinforcement Learning from Human Feedback) would be a great topic to dive into.

kevinbuehler

Awesome, crisp explanation ... even suitable for high schoolers & AI beginners

neelfun

Wonderful video! But why is the orchestration order two spatial then two temporal stages, instead of one spatial then one temporal?

xidchen

Awesome! I'm reacting to this live. I feel that these two Hidden Layers videos now beg the question: have we tried the autoregressive approach for text-to-video?

sotasearcher

REALLY AWESOME
almost Unbelievable

💯💯💯

SMASH_REVIEWS

Cool! Are the super resolution models also trained with text, or just labels?

tomoki-vo

That's pretty cool, though the last few upscaling and time-lengthening models sound very inefficient.
Like, it would be much better to have a single model that upscales the video to resolution X×Y @ Z FPS.

avi

What's confusing is how you get a sensible image when you denoise it. That part I don't quite see.

wryltxw

Does no one know the difference between "amount" and "number" anymore?

scottmiller