LLaVA - This Open Source Model Can SEE Just like GPT-4-V

In this video, we look at the newly released LLaVA-1.5-13B which is the latest Open Source Multi-Modal model that can see images.

LLaVA is a novel end-to-end trained large multimodal model that connects a vision encoder with Vicuna for general-purpose visual and language understanding. It achieves impressive chat capabilities in the spirit of the multimodal GPT-4 and sets a new state-of-the-art accuracy on Science QA.
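The core design behind that description is simple: the vision encoder produces patch features, a small trained projector maps them into the language model's embedding space, and the projected image tokens are concatenated with the text embeddings before being fed to Vicuna. Here is a toy NumPy sketch of that projector idea — the dimensions are made up for illustration (real LLaVA-1.5 uses CLIP ViT-L/14 features and Vicuna's hidden size), and the random weights stand in for the trained projector:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions for illustration only (real LLaVA-1.5: 576 image
# patch tokens, CLIP features of width 1024, Vicuna-13B hidden 5120).
d_v, d_t = 8, 16          # vision-feature / text-embedding widths
n_patches, n_text = 4, 3  # number of image tokens and text tokens

vision_feats = rng.normal(size=(n_patches, d_v))  # from the vision encoder
text_embeds = rng.normal(size=(n_text, d_t))      # from the LLM's embedding table

# LLaVA-1.5 upgraded the projector from a single linear layer to a
# small MLP (linear -> GELU -> linear); weights here are random stand-ins.
W1, b1 = rng.normal(size=(d_v, d_t)), np.zeros(d_t)
W2, b2 = rng.normal(size=(d_t, d_t)), np.zeros(d_t)
gelu = lambda x: 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

img_tokens = gelu(vision_feats @ W1 + b1) @ W2 + b2

# The projected image tokens are simply concatenated with the text
# embeddings and processed by the language model as one sequence.
sequence = np.concatenate([img_tokens, text_embeds], axis=0)
print(sequence.shape)  # (7, 16): 4 image tokens + 3 text tokens
```

Because the projector is the only new piece glued between two pretrained models, training it (plus light fine-tuning) is far cheaper than training a multimodal model from scratch — which is why open-source replications like this one appeared so quickly.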

Comments

Thank you for the video. Can you please create a follow-up video on how to run it locally, and especially show how it would work with Oobabooga :)

tamera

Superb… but when I run it locally following the steps in their original Git repo, the model doesn't download properly and I'm facing some other issues too. We would appreciate it if you could make a video on running LLaVA locally without issues. ❣

tamilil-

I feel so behind watching this after having spent the entire day deriving the Stanford Alpaca model weight diffs, converting them to Hugging Face format, and finally converting the weights to GGUF, all with nothing more than a CPU and some RAM. 😅

Don't get me wrong, it felt good watching those tokens print out to the screen after setback after setback (and there were so many setbacks!).

But this is awesome. We have local models that can see, hear, talk, and write. We really are in the future now.

teleprint-me

Niiice, Please make a video on how to run it locally.

moh

How would the critic system work in such an environment? (multi-model)

Hasi

Yes, please show how to set it up and use it.

Sulayman.

It would be great if you could explain how to build a Docker image with it.

loicbaconnier

Will it be good for generating medical reports? Plus can I fine-tune it on medical data?

taheralipatrawala

Please show how to use Autogen to run PaLM using a Colab example

AI-Wire

I love your videos, but I'm confused 🤔… you already made a video about LLaVA 5 months ago, called 'LLaVA: Now You Can Chat with Your Images'. Anyway, thanks for the refresher, and I would love to install this locally as well.

johannesdeboeck

I would go into more detail on how it actually works rather than going through all the examples.

jirikosek

LLaVA is great at understanding pictures, but its text comprehension is abysmal.
I spent 30 minutes trying to explain to it that Tesco is a supermarket. It just couldn't understand.

beri