How To Build Generative AI Models Like OpenAI's Sora

If you read articles about companies like OpenAI and Anthropic training foundation models, it would be natural to assume that if you don’t have a billion dollars or the resources of a large company, you can’t train your own foundational models. But the opposite is true.

In this episode of the Lightcone Podcast, we discuss strategies to build a foundational model from scratch in less than 3 months, with examples of YC companies doing just that. We also get an exclusive look at OpenAI's Sora!

00:00 - Coming Up
01:13 - Sora Videos
05:05 - How does Sora work under the hood?
08:19 - How expensive is it to generate videos vs. text?
10:01 - Infinity AI
11:23 - Sync Labs
13:41 - Sonauto
15:44 - Metalware
17:40 - Guide Labs
19:29 - Phind
24:21 - Diffuse Bio
25:36 - Piramidal
27:15 - K-Scale Labs
28:58 - DraftAid
30:38 - Playground
33:20 - Outro
Comments

Chapters (Powered by ChapterMe) -
00:00 - Coming Up
00:49 - Intro: Generative AI for Video
01:13 - Sora Videos
05:05 - How does Sora work under the hood?
08:19 - How expensive is it to generate videos vs. text?
08:55 - How do YC companies build foundation models with just $500K?
10:01 - Demos: Infinity AI
11:23 - Sync Labs' hack to train a Lip Sync Model with a single A100 GPU
12:45 - YC deal with Azure
13:41 - How Sonauto Built a Text-to-Song Model
15:44 - Metalware: Hardware Co-Pilot
17:40 - Guide Labs: Explainable Foundation Model
18:20 - Building your own models vs. Using existing open source models
19:29 - Phind's Clever Hack: Synthetic Data
22:03 - Simulating real-world physics: Atmo (Foundational model for weather prediction)
24:21 - AI in Biology: Diffuse Bio
25:36 - Piramidal: Foundational model for the human brain
27:15 - AI in Robotics: K-Scale Labs
28:58 - DraftAid: AI Models for CAD Design
30:38 - Playground going up against giants, and Suhail Doshi's background
31:42 - Companies pivoting into AI
32:44 - Takeaway Message
33:20 - Outro

chapterme

All you're really saying here is that people can build foundational models as long as OpenAI doesn't also build them. That's not very reassuring to hear. We started with words, then pictures and videos; why would anyone not expect music, robotics, hardware, etc. down the line?

theniii

The lipsynching on Tim Ferriss looked way off. There was a bit of an uncanny valley with the deepfake switchover as well.

BrianMPrime

20:15 I personally find the concept of synthetic data to be a fascinating spur for more neuroscientific research.

People dream about what they study and are constantly reviewing problems they are working on in their head. In other words, I feel that humans use simulations in their own mind to build out the models they use to understand their world.

We might be able to think of this as "generating 1000x more data" than can be directly extracted from the real world.

Another example of this that was done to awesome effect is AlphaGo's self-play training.

jks
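The "simulate to multiply your data" idea in the comment above can be sketched with a deliberately trivial generator. This is an illustration of the concept only; `synth_examples` and the arithmetic task are invented for this sketch, not Phind's or DeepMind's actual pipeline:

```python
import random

# Toy sketch of synthetic data generation: instead of scraping real
# (question, answer) pairs, sample them from a programmatic "world model"
# (here, integer addition). A generator like this can emit arbitrarily
# many more training pairs than exist in the collected data.

def synth_examples(n, seed=0):
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        a, b = rng.randint(0, 99), rng.randint(0, 99)
        data.append((f"What is {a} + {b}?", str(a + b)))
    return data

dataset = synth_examples(1000)      # "1000x more data" on demand
print(len(dataset), dataset[0])
```

The same shape applies to self-play: the "world model" is the game engine, and each game played against yourself yields fresh labeled positions.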

They didn’t source robotics papers for Sora’s architecture. They combined Diffusion Transformers (developed by Peebles) with the video diffusion methods released by Stability/Google/Meta/Nvidia.

alicapwn
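The combination described above matches OpenAI's Sora technical report, which represents video as a sequence of "spacetime patches" fed to a Diffusion Transformer. A toy numpy sketch of that patchification step (the shapes and patch sizes here are illustrative, not Sora's actual configuration, which also operates in a learned latent space):

```python
import numpy as np

# Flatten a (T, H, W, C) video clip into a sequence of spacetime-patch
# tokens, the input representation a Diffusion Transformer denoises.

def to_spacetime_patches(video, pt=2, ph=4, pw=4):
    """Split a (T, H, W, C) video into a (N, pt*ph*pw*C) token sequence."""
    T, H, W, C = video.shape
    assert T % pt == 0 and H % ph == 0 and W % pw == 0
    v = video.reshape(T // pt, pt, H // ph, ph, W // pw, pw, C)
    v = v.transpose(0, 2, 4, 1, 3, 5, 6)      # group the patch axes together
    return v.reshape(-1, pt * ph * pw * C)    # one row per spacetime patch

video = np.random.rand(8, 16, 16, 3)          # 8 frames of 16x16 RGB
tokens = to_spacetime_patches(video)
print(tokens.shape)                           # (4*4*4, 2*4*4*3) = (64, 96)
```

Because the token count scales with duration times area, this also hints at why video generation is so much more expensive than text.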

The text/prompt for the video was quite detailed and informative. Even as a bad programmer, I was able to mentally construct an algorithm for a video on the fly... maybe I have to watch more than the first 5 minutes of this podcast to understand why Sora etc. is a big deal...

avi

Can anyone share the papers needed to get to an actionable level of understanding, as explained in the video? :)

jess-e

I appreciate the show and encouraging people to go for it, and I get hyping up the early YC-backed products, but the first couple weren't even super impressive by March 2024 standards, let alone "the best thing" on the market. I'm not bashing any of the products and I hope they do awesome; I'm just saying these are not at all good examples of "the best we have right now", and it's discouraging to hear that from you guys.

@11:42 The lip sync is completely off, even though near-perfect lip sync was already accomplished last year.
@15:40 Check out Suno AI v3. That's like GPT-4 compared to GPT-2 (what you showed here).

rezakn

11:40 The Hindi demo is perfect. My first language is not Hindi, but I can definitely tell it is a great translation.

atchutram

I think Sora is better positioned to imagine a new, totally different world than to simulate our perception of what the world is and what the world was.

fildworldcomo

Yes, but how do you find the datasets to train new foundational models? Like their EEG example: how did she acquire that data to train the models?

awesomeo

I've been waiting for a new episode for weeks!! Thanks for the content guys!

juanortega

It's mind-blowing how new graduates create AI-driven products without even 10 years of experience and research in ML, and without spending a decent amount of cash. Fascinating!

george_davituri

Alibaba is also doing some interesting things with AI video; we (the open source community) have almost deconstructed the process.

FunwithBlender

An idea that anyone can take (though it might already exist):

Use AI to help recreate crime scenes and make recommendations on what data might help better understand and solve cases.

The ideal solution would be able to use data from other cases in order to improve recommendations.

sgdfly

From 4:17 to 4:20, a ladder joint gets added in one of the columns.

vikalpjain

12:40 Really burying the lede here on the question "How are YC companies able to create these models with only $500k?"

The answer: they arranged for free compute with MSFT (she didn't say how much, but said hundreds of times more than they'd get otherwise).

JohnSmith-hexg

Interesting that ray tracing in games might be done for, and that games will be diffused rather than rendered.

raymond_luxury_yacht

There are better free open-source lip-sync solutions out there, but still cool.

FunwithBlender

Respectfully, Stable Diffusion is way better than anything else; to act like Midjourney or Playground is better is to not understand the flexibility and creativity you have with Stable Diffusion. It can be combined with ControlNet, there is a massive community on Civitai with LoRAs, textual inversion, etc., and there are a thousand things you can do, from Deforum to you name it. Stable Diffusion is the only model that can give you precision when needed, if you know how to use it. Yes, it's more complex, but it is the best model.

FunwithBlender
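One concrete example of the control knobs the comment above is alluding to is the classifier-free guidance scale exposed by prompt-conditioned diffusion models like Stable Diffusion. A minimal numpy sketch of just the guidance arithmetic (the arrays are placeholders, not real model outputs):

```python
import numpy as np

# Classifier-free guidance: the final noise estimate extrapolates from the
# unconditional prediction toward the text-conditioned one. Higher scales
# pull samples harder toward the prompt at the cost of diversity.

def cfg(eps_uncond, eps_cond, guidance_scale=7.5):
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

eps_u = np.zeros((4, 64, 64))   # stand-in unconditional prediction
eps_c = np.ones((4, 64, 64))    # stand-in text-conditioned prediction
eps = cfg(eps_u, eps_c)
print(eps.mean())               # 7.5: fully extrapolated toward the prompt
```

A scale of 1.0 reduces to the conditioned prediction alone; Stable Diffusion UIs typically default to somewhere around 7, and tuning it per image is part of the "precision" workflow.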