How LLaVA works 🌋 A Multimodal Open-Source LLM for image recognition and chat.


This week we cover the LLaVA paper, which introduces a multimodal model that combines image recognition with an LLM through a chat-like interface, lowering the barrier to entry for many computer vision tasks.
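At a high level, LLaVA connects a pretrained vision encoder to an LLM through a learned projection that maps image features into the LLM's token-embedding space. Below is a minimal NumPy sketch of that idea; the dimensions and the random stand-in tensors are illustrative assumptions, not LLaVA's actual sizes or weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions for illustration only (not LLaVA's real config).
NUM_PATCHES = 576   # image patches from a ViT-style vision encoder
VISION_DIM = 1024   # vision encoder feature width
LLM_DIM = 4096      # LLM token-embedding width

# Stand-ins for the frozen vision encoder's output and the prompt embeddings.
image_features = rng.standard_normal((NUM_PATCHES, VISION_DIM))
text_embeddings = rng.standard_normal((10, LLM_DIM))  # 10 prompt tokens

# The learned projection: maps vision features into the LLM embedding space.
W = rng.standard_normal((VISION_DIM, LLM_DIM)) * 0.01
image_tokens = image_features @ W  # shape (576, 4096)

# The projected image tokens are prepended to the text tokens, so the LLM
# attends over the image patches as if they were extra context tokens.
llm_input = np.concatenate([image_tokens, text_embeddings], axis=0)
print(llm_input.shape)  # (586, 4096)
```

The key design point is that only the projection (and later the LLM) is trained; the vision encoder stays frozen, which keeps visual-instruction tuning cheap.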

Comments

How can we fine-tune LLaVA on a custom image-caption dataset?

Thank you for uploading this video :)

Pingu_astrocat
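On the fine-tuning question above: the LLaVA repo expects instruction data as a JSON list of records, each pairing an image file with a `conversations` turn list. A hedged sketch of converting captions into that shape is below; check the exact schema your LLaVA version's fine-tuning docs expect, as field names may differ between releases.

```python
import json

def caption_to_record(idx, image_file, caption):
    """Build one LLaVA-style instruction record from a plain caption.

    The "<image>" placeholder marks where the image tokens are inserted
    into the prompt during training (an assumption based on the common
    LLaVA data format; verify against your repo version).
    """
    return {
        "id": str(idx),
        "image": image_file,
        "conversations": [
            {"from": "human", "value": "<image>\nDescribe this image."},
            {"from": "gpt", "value": caption},
        ],
    }

captions = ["a red bicycle leaning on a wall", "two cats on a sofa"]
dataset = [caption_to_record(i, f"img_{i}.jpg", c)
           for i, c in enumerate(captions)]
print(json.dumps(dataset[0], indent=2))
```

Writing `dataset` out with `json.dump` produces a file you can point the fine-tuning script's data argument at.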

That man is stressed. Give him a vacation

Akshatgiri

I have PDF files of handwritten data that I'd like to OCR, perform calculations on, and finally edit or append the PDF with the results.

I like the idea of using a Custom GPT, but only GPT-4 Plus subscribers can use those. So I'd prefer a standalone browser or desktop solution that anyone can drag and drop a file into. However, I'm not sure whether the GPT-4 Assistants API has all the Vision / AI PDF plugin support.

If using LLaVA + Ollama, would anyone who wants to use my application also need to install the ~20 GB Ollama setup?

bennguyen
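On the distribution question above: end users would need Ollama plus the pulled llava model weights locally (the weights, not the Ollama runtime itself, account for the multi-gigabyte download), unless the application talks to one shared Ollama server over its HTTP API, in which case users install nothing. A minimal sketch of building such an API request is below; it only constructs the body (sending it requires a running Ollama instance with `llava` pulled), and the demo file is a stand-in so the sketch runs without a real image.

```python
import base64
import json

def build_llava_request(prompt, image_path):
    """Build a request body for Ollama's generate endpoint
    (default http://localhost:11434/api/generate), which accepts
    base64-encoded images for multimodal models such as llava."""
    with open(image_path, "rb") as f:
        img_b64 = base64.b64encode(f.read()).decode("ascii")
    return {
        "model": "llava",
        "prompt": prompt,
        "images": [img_b64],
        "stream": False,
    }

# Stand-in bytes so the example runs without an actual image on disk.
with open("demo.bin", "wb") as f:
    f.write(b"\x89PNG fake bytes")

body = build_llava_request("What is in this picture?", "demo.bin")
print(json.dumps(body)[:60])
```

POSTing this JSON to the endpoint returns the model's text response; any thin web front end can do that, keeping the heavy model on the server side.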