What is Spatial AI? 'The Next Frontier of AI Architecture'

Fei Fei's interview with a16z, plus my reaction.

Comments
Author

Matt playing that video at 1.5x while I am playing this video at 1.5x

Cine
Author

00:00 - Introduction to Spatial AI and Fei-Fei Li
00:35 - Fei-Fei Li's Contributions: ImageNet and AI Research
01:43 - Cambrian Explosion in AI: Pixels, Videos, and Language
02:09 - AI's Evolution Through Deep Learning and Data
03:12 - Early Days of Deep Learning: ImageNet and Commercial Applications
03:54 - Rise of Generative and Discriminative Models in AI
04:20 - Algorithmic Advances in Computer Vision and Language Modeling
05:16 - AI’s Power Unlocked by Large Data and Compute
06:14 - Key AI Papers: Attention and Transformers
07:28 - Breakthroughs in AI with AlexNet and GPU Power
09:57 - Supervised Learning Era and AI Data Utilization
11:22 - AI Research and Algorithmic Breakthroughs in Academia
13:54 - Evolution of Generative AI: Style Transfer and Image Generation
15:10 - Speed and Optimization in Generating AI Images
16:36 - Gradual Advancement of AI Towards AGI
18:05 - Fei-Fei Li’s North Star: Visual Intelligence and Storytelling
19:46 - Current AI Capabilities: Computing Power and Algorithm Depth
21:13 - Tesla’s Use of Real-World Data for AI Training
23:08 - Transitioning from 2D to 3D: Learning Structures in AI
24:27 - NeRF’s Breakthrough in 3D Computer Vision
25:33 - AI's Focus on Reconstruction and Generation Convergence
27:04 - AI Representation of the 3D World Through Physics and Structure
28:12 - Spatial Reasoning and Limitations of Multimodal AI Models
29:14 - Contrast Between 2D and 3D World Understanding in AI
30:12 - Processing and Representing 3D World Data in AI
31:35 - Human Perception of 3D World Through 2D Visuals
32:57 - Future Applications of AI in Virtual 3D World Creation
34:19 - 3D World Creation and its Economic Impact in Gaming
35:46 - Desire to Explore and Simulate 3D Virtual Worlds
36:58 - Impact of Spatial Intelligence on AR and VR Technologies
38:00 - Spatial Intelligence as the Operating System of 3D Future
39:02 - Ideal Hardware for Spatial AI: Glasses vs. Goggles
39:33 - Blending Digital and Physical Worlds: AR and Robots
40:06 - Conclusion: Spatial Intelligence’s Role in Robotics and AI

mendthedivide
Author

Please don't speed up the video. We can speed it up if we want to.

johnwilson
Author

I watched this interview on the a16z channel. It is amazing how much actual progress AI is making, even amid the bloated market economics that push AI into anything and everything and make many people wary. Well, if the internet and AI took away your critical thinking skills, AI and the internet aren't responsible for it. I'm so sick of channels interviewing "CS researchers and scientists" who spend like an hour decrying everything AI. Folks, use your own goddamned brains a little!

JustaSprigofMint
Author

You can't watch a technical interview at high speed.

hqcart
Author

Didn't feel like their call brought anything new to the table at all.

thend
Author

So many incredible use cases for this technology. People could create their own ‘happy places’ that they can just put on the headset and submerge into while on a long flight.

jbavar
Author

This 40 min long video was worth every sec of it. Thanks Matt.

zmeireles
Author

I've done a few tiny experiments just prompting an LLM having it think of real world descriptions in terms of spatial coordinates of objects.
It seemed to be a little better at solving real world 'puzzles', like what happens to a ball in a cup that gets turned upside down.
Not perfect - but it did uncover some of the model's vague understanding of what a 'cup' is, for example.

tomcraver
Author

How about a video series teaching us how to put our Nvidia gaming GPUs to use for AI? I know you already have tutorials on running LLMs locally, but I keep finding out about new features and use cases with my RTX 4080, and I can't find a YouTube channel that really focuses on getting the most out of these GPUs for AI. Chat with RTX is pretty cool for beginners. I've moved on to OpenWebUI and ComfyUI for my LLMs and diffusion models. Currently trying to figure out ComfyUI properly, instead of just using other people's workflows and dropping my own LoRAs in. I want to learn how to use 2-3 LoRAs: one for creating images of myself, one for a certain style of image, and one to improve photorealism, skin tone, etc.

henrismith
Author

Fei-Fei is great! She has the pioneer's DNA.

alexlanayt
Author

Fascinating. I would point out, though, that DNA codons are "words" in nature (not floating in the sky, but embedded in every living creature) and that they convey 3D intelligence.

RWilders
Author

Spatial AI is a thing for sure. Our world is dimensional. If an AI wants to understand everything that is happening in our world, it has to learn about all the dimensions and the physical connections. You said that, for example, Tesla has a lot of spatial data. That is correct for sure, but we have been collecting spatial information with our mobile devices for quite some time. Think back to 2014/2015, when Google showed Project Tango (the mobile AR that became ARCore in the end). What they wanted was to capture the insides of all the buildings they capture from outside with Google street maps. And we should not forget that audio is also spatial. You can capture a lot of data just by analyzing audio (submarines have done this for a long time). I am very excited about the spatial AI approach and I hope it will be available for everyone (open source). Thanks for the video, Matthew.

chriswatts
Author

It seems that the practical applications of these developments are still in the process of being fully realized. I hope they are finding ways to apply these innovations beyond just gaming. With that in mind, here are a few possible areas where they might have significant impact:

1. Generating a comprehensive set of architectural and engineering designs based on site parameters and design preferences.
2. Creating 3D product designs, such as furniture or wearable technology, that adapt to environmental factors and surroundings.
3. Offering emergency assistance through augmented reality, such as using smart goggles to guide someone through landing a plane in a critical situation.
4. Enabling underwater robotic welding to facilitate complex repairs in challenging environments.
5. Utilizing autonomous drones that can navigate hostile environments and selectively target designated individuals. It might sound harsh, but it’s likely similar technology to what they would be using for shooting games.

honkiemonkey
Author

Interesting to know Lua was used for AI at the time, as Justin mentioned it

martialg
Author

I feel like our ancient ancestors - pre-language - had a non-communicated version of language that language sits on top of but doesn't necessarily describe, even with huge language sets. We can hard-code physics and things of that nature to almost literally ground AI in the real world, and then probe its answers for a quite material-based explanation of why it is stating something. One of the most captivating use cases for VR is just exploring and screwing around in photoreal 3D environments, "telling stories / pretending", BUT there is a huge lack of quality ones at a low price - I have to dig through about 80% detritus for the good stuff. If this suddenly wasn't the case, it would mean endless novel worlds to explore, personalized and responsive to speech, hand motions, etc. This drives mass adoption of VR, lowering the cost of comfortable high-res, high-FOV headsets, which in turn drives spatial AI forward.

joshkar
Author

Very interesting topic, as I work in virtual construction. Obviously it goes way beyond this, but it just reaffirms my opinion that people won't be designing and coordinating much for construction for much longer.

marcuss
Author

I would agree that training data based on real-world interaction will be required to make an AI that can operate without bias.

patrickmchargue
Author

Regarding image generation being more of a continuum from previous milestones while the public regarded it as an abrupt new thing, it made me think of the quote:
"It takes 10 years to become an overnight success."

solidreactor
Author

This is the future, in my opinion. I see this modality taking over from the 1D and 2D image modalities in the near future. It'll give the LLM much more detail in color and nuance to process. It might be a little more computationally expensive, but the benefits will outweigh the risks and "costs", I think. The three modalities LLMs have now will reach their thresholds at some point. They'll need to lean on more advanced technologies like this one to process more nuanced representations of the 3D or real world.

Another one is video reading. Current LLMs can't do that. They read only the transcripts, not the visuals.

With the camera shutter mode it's possible for this tech to read videos in 3D. Those will enrich the LLM's representation algorithm. With a richer context, it'll reduce cases of hallucination even more.

Anyway, this is exciting stuff if you ask me.

h.c