“LLAMA2 supercharged with vision & hearing?!” | Multimodal 101 tutorial

Explore multimodal language models like LLaVA, which let you reach GPT-4-level multimodal abilities and unlock use cases like chatting with images.

⏱️ Timestamps
0:00 Intro
1:03 What is multimodal?
1:23 LLaVA model
2:08 Demo
3:35 Use case: Product development
5:17 Use case: Content curation
6:27 Use case: Medical
7:07 Use case: Captcha
8:09 Use case: Robots

#gpt #autogpt #ai #artificialintelligence #tutorial #stepbystep #openai #llm #largelanguagemodels #largelanguagemodel #chatgpt #multimodality #gpt4 #multimodal #llama2 #llama #llava #machinelearning

Comments

Thank you AI Jason for sharing valuable AI developments. Would love to see in the future how to train the model on our own photos. Nice..

nessandroduyan

Woah, this is probably the best multimodal model I've tried. It definitely opens up lots of imagination!

Jim-eyry

Jason, your videos are next level!! Loved the agent that you made for research. I made a similar one using your video and I've been using it to research my work, and it's pretty awesome! Saved me tons of time already!!!

exaliber

Great videos dude! Love the content and how compressed the info is!

danmustlearn

Absolute banger dude your content is actually top tier

SaminYasar_

i need more content from this channel!

juancasas

Thank you big man for such amazing videos, Thank you !

MWKING

OK, this is crazy! So now you can add more context. It's like us using our 5 senses to interpret information. But this part here @3:42... if it becomes possible for it to build full-stack apps easily, say goodbye to junior developers. At that point anyone can sketch an app with the entire workflow and show the image to the AI along with a description like "Build this app you see with React in the front end, Node.js/Express for the backend, create the APIs and connect them to the front end". GAME OVER!!!

Camxlare

Like the example use cases. Indeed, it seems LLaVA is not that good at rich-text OCR. Definitely an area for improvement. Still promising anyway. I would love a second episode on Fuyu-8B or a tutorial on how to further fine-tune LLaVA for a specific use case. Thanks a lot for sharing!

redfield

LLaVA has been out there for a long time already. Great that they are not dead and added support for Llama 2.

eck

It's my understanding that PaLM 2 is hooked up to Bard. Gemini is the future. Google has to figure out how to mesh Gemini into PaLM 2 and PaLM 2 into Gemini. Gemini has all the new multimodal features, which I assume PaLM 2 will pick up if they can learn how to sync them.

hope

excellent content bro, keep up the good work

amandamate

Best AI content on YouTube. Learned so much from you. Is it plausible to run this on a consumer-grade gaming machine with, for instance, an RTX 4090? Will you do an install/setup video?

preben

Great video, the 13B multimodal model is doing amazingly well. I'd love to see a video on the following use case: say I am an HR manager and have 2 job positions, JOB-A and JOB-B. Can an LLM filter job resumes based on the requirements of JOB-A and JOB-B with few-shot training or fine-tuning? It's a prediction task similar to sentiment analysis...

henkhbit

You used non-square images with the crop option, so what it saw was cropped.

vitalysumin

Nice introduction, thank you for your effort

JavArButt

This is insane... Because it's just a first experimental version of only a mere 13b parameter size model... And it can identify a pretty convoluted colourless picture and make the story out of it... Not to mention correctly rate a picture on an arbitrary score and identify what app you're gunning for without telling it the kind of app... The future looks pretty scary...

BlackTakGolD

Do you think the choice of vector database matters for storing this multimodal data? For example, does Weaviate vs. Chroma offer certain features that might make it optimal for these multimodal vectors?

adamhughes

I wonder how these multimodal models will affect robotics and self-driving.

SloanMosley

I wonder why it failed the captcha? There’s already AI out there that can crack the captcha easily.

Jump--the-moon

Related videos

🌋 LLaVA: Vision LLM based on LLama2

Llama | ChatGPT as OCR Vision document AI

Computer vision with LLM!

Supercharging LLama-2: Enhancing Performance on Any Task with ChatGPT Dataset | LLM Finetuning

Build Anything with Llama 3 Agents, Here’s How

[ML News] LLaMA2 Released | LLMs for Robots | Multimodality on the Rise

How To Install LLaVA 👀 Open-Source and FREE 'ChatGPT Vision'

LLaVA - This Open Source Model Can SEE Just like GPT-4-V

StreamingLLM - Extend Llama2 to 4 million token & 22x faster inference?

ThursdAI July 20 - LLaMa 2, Vision and multimodality for all, and is GPT-4 getting dumber?

Read a paper: Enhancing LLMs with vision

Best Model of LLama 2 | Live Performance Comparison

Video LLaMA: An Instruction tuned Audio Visual Language Model for Video Understanding

I Tested Meta's NEW AI: Llama 2

Meta Llama 2: The Beginner's Guide! (Trained on 2 TRILLION Words 😱)

How can LLMs improve Vision AI? OCR, Image & Video Analysis

Unleash the power of Local LLM's with Ollama x AnythingLLM

This 'Video LLama' AI Is DISRUPTING The Industry!

LLaVA LLM: Visual and Language Multimodal Model Chatbot

Why wait for KOSMOS-1? Code a VISION - LLM w/ ViT, Flan-T5 LLM and BLIP-2: Multimodal LLMs (MLLM)

Llama Adapter

New LLaVA AI explained: GPT-4 VISION's Little Brother

Llama 101