“LLAMA2 supercharged with vision & hearing?!” | Multimodal 101 tutorial

Explore multimodal language models like LLaVA, which let you reach GPT-4-level multimodal abilities and unlock use cases like chatting with images.

⏱️ Timestamps
0:00 Intro
1:03 What is multimodal?
1:23 LLaVA model
2:08 Demo
3:35 Use case: Product development
5:17 Use case: Content curation
6:27 Use case: Medical
7:07 Use case: Captcha
8:09 Use case: Robots

#gpt #autogpt #ai #artificialintelligence #tutorial #stepbystep #openai #llm #largelanguagemodels #largelanguagemodel #chatgpt #multimodality #gpt4 #multimodal #llama2 #llama #llava #machinelearning
Comments

Thank you AI Jason for sharing valuable AI developments. Would love to see in the future how to train the model on our own photos. Nice..

nessandroduyan

Woah, this is probably the best multimodal model I've tried. It definitely opens up lots of possibilities!

Jim-eyry

Jason, your videos are next level!! Loved the agent that you made for research. I made a similar one using your video and I've been using it to research my work, and it's pretty awesome! Saved me tons of time already!!!

exaliber

Great videos dude! Love the content and how compressed the info is!

danmustlearn

Absolute banger dude your content is actually top tier

SaminYasar_

i need more content from this channel!

juancasas

Thank you big man for such amazing videos, Thank you !

MWKING

OK, this is crazy! So now you can add more context. It's like us using our five senses to interpret information. But this part here @3:42... if it becomes possible for it to build full-stack apps easily, say goodbye to junior developers. At that point anyone can sketch an app with the entire workflow and show the image to the AI along with a description like "Build this app you see with React on the front end and Node.js/Express for the back end, create the APIs, and connect them to the front end." GAME OVER!!!

Camxlare

I like the example use cases. Indeed, it seems LLaVA is not that good at rich-text OCR. Definitely an area for improvement. Still promising anyway. I would love a second episode on Fuyu-8B or a tutorial on how to further fine-tune LLaVA for a specific use case. Thanks a lot for sharing!

redfield

LLaVA has been out there for a long time already. Great that the project is not dead and has added support for Llama 2.

eck

It's my understanding that PaLM 2 is hooked up to Bard. Gemini is the future. Google has to figure out how to mesh Gemini into PaLM 2 and PaLM 2 into Gemini. Gemini has all the new multimodal features, which I assume PaLM 2 will pick up if they can work out how to sync the two.

hope

excellent content bro, keep up the good work

amandamate

Best AI content on YouTube. Learned so much from you. Is it feasible to run this on a consumer-grade gaming machine with, for instance, an RTX 4090? Will you do an install/setup video?

preben

Great video; the 13B multimodal model is doing amazingly well. I'd love to see a video on the following use case: say I am an HR manager and have two job positions, JOB-A and JOB-B. Can an LLM filter job résumés based on the requirements of JOB-A and JOB-B with few-shot training or fine-tuning? It's a prediction task, like sentiment analysis...

henkhbit

You used non-square images with the crop option, so what the model saw was cropped.

vitalysumin

Nice introduction, thank you for your effort

JavArButt

This is insane... Because it's just a first experimental version of only a mere 13b parameter size model... And it can identify a pretty convoluted colourless picture and make the story out of it... Not to mention correctly rate a picture on an arbitrary score and identify what app you're gunning for without telling it the kind of app... The future looks pretty scary...

BlackTakGolD

Do you think the choice of vector database matters for storing this multimodal data? For example, does Weaviate vs. Chroma offer certain features that might make one of them optimal for these multimodal vectors?

adamhughes

I wonder how these multimodal models will affect robotics and self-driving.

SloanMosley

I wonder why it failed the captcha? There’s already AI out there that can crack the captcha easily.

Jump--the-moon