LLaVA - the first instruction-following multi-modal model (paper explained)

There is growing interest in developing multimodal foundation models, analogous to the large language models (LLMs) that serve as foundation models for text. LLaVA, which stands for Large Language and Vision Assistant, is the first paper to apply instruction tuning to visual data, thereby pushing the possibilities of Large Multimodal Models (LMMs). This video explains the first paper in the LLaVA series, which also includes LLaVA-RLHF, LLaVA-Med, and the latest, LLaVA 1.5.
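As a rough illustration of the idea covered in the video: LLaVA connects a pretrained CLIP vision encoder to an LLM (Vicuna) through a learned projection, so image features become tokens the LLM can attend to alongside the instruction text. Below is a minimal PyTorch-style sketch of that projection step, not the official implementation; the class name, variable names, and feature dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

class VisualProjector(nn.Module):
    """Sketch of LLaVA's projection idea: map vision-encoder patch features
    into the LLM's token-embedding space (a single linear layer in LLaVA 1.0)."""
    def __init__(self, vision_dim: int = 1024, llm_dim: int = 4096):
        super().__init__()
        self.proj = nn.Linear(vision_dim, llm_dim)

    def forward(self, patch_features: torch.Tensor) -> torch.Tensor:
        # patch_features: (batch, num_patches, vision_dim) from a frozen CLIP encoder
        return self.proj(patch_features)  # (batch, num_patches, llm_dim)

# Illustrative usage with dummy tensors standing in for real encoder outputs
projector = VisualProjector()
image_tokens = projector(torch.randn(1, 256, 1024))       # projected visual "tokens"
text_tokens = torch.randn(1, 32, 4096)                     # embedded instruction text
llm_input = torch.cat([image_tokens, text_tokens], dim=1)  # sequence fed to the LLM
```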

RELATED LINKS

🛠 🛠 🛠 MY SOFTWARE TOOLS 🛠 🛠 🛠

📚 📚 📚 BOOKS I HAVE READ, REFER AND RECOMMEND 📚 📚 📚

MY KEY LINKS

WHO AM I?
I am a Machine Learning Researcher / Practitioner who has seen the grind of academia and start-ups equally. I started my career as a software engineer 15 years ago. Because of my love for Mathematics (coupled with a glimmer of luck), I graduated with a Master's in Computer Vision and Robotics in 2016, just as the current AI revolution was getting started. Life has changed for the better ever since.

#machinelearning #deeplearning #aibites
Comments

This was a super helpful video - high level, but still detailed enough for me to understand and feel confident that I can try to reproduce their work. Thanks!

Hello-txug

this is great, I was waiting for research like this to come out!

IntrospectiveMinds

Can you please explain the SpeechX paper from Microsoft?

MohamedEmad-td