LLaVA: A large multi-modal language model

In this video, we'll learn about LLaVA (Large Language and Vision Assistant), a multimodal model that integrates a CLIP vision encoder with the Vicuna LLM.
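If you want to try something similar yourself, here is a minimal sketch of asking LLaVA to describe an image via the Hugging Face transformers port of the model. The llava-hf/llava-1.5-7b-hf checkpoint, the prompt template, and the image URL are assumptions for illustration, not necessarily what the video uses; check the model card for exact details.

```python
# Minimal sketch: ask a LLaVA checkpoint to describe an image.
# Assumes the Hugging Face "llava-hf/llava-1.5-7b-hf" port; adjust the
# model id, prompt format, and image to whatever you actually use.
import requests
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(model_id, device_map="auto")

# Any local or remote image works; this URL is just a placeholder.
image = Image.open(requests.get("https://example.com/cartoon_cat.png", stream=True).raw)
prompt = "USER: <image>\nDescribe this image in one sentence. ASSISTANT:"

inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=100)
print(processor.decode(output[0], skip_special_tokens=True))
```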

We'll see how well it does at describing a cartoon cat, a photo of me with AI-generated parrots, and a bunch of images created with the Midjourney generative AI tool.

And most importantly, we'll find out whether it knows who Cristiano Ronaldo is!

#AI #MultimodalModels #llava #GPT4 #ImageRecognition #Streamlit #MachineLearning #AndrewNg #llms

Comments

So cool. GenAI is a never-ending stream of fun.

aragaodan

Can you do a video on fine-tuning a multimodal LLM (Video-LLaMA, LLaVA, or CLIP) with a custom multimodal dataset of images and text for relation extraction or another specific task? Could you use an open-source multimodal LLM and open multimodal datasets, like Video-LLaMA's, so anyone can build on your tutorial for their own experiments? Could you also cover how to boost the performance of the fine-tuned model with prompt tuning in the same video?

thisurawz

Thanks for posting. I have it working, but when I run it in Cygwin I see an error about a missing cl.exe, even though the file exists; it seems to work anyway.

kenbajema

Me, immediately homing in on the misspelling of "instruction" at the 17-second mark. 🫠

PeterCorless