Fine-Tuning Multimodal LLMs (LLAVA) for Image Data Parsing

In this video, we'll fine-tune LLaVA, an open-source multimodal LLM available on Hugging Face, to extract structured information from receipt images and output it as JSON. By the end, we'll deploy the model behind a Flask API and build a Streamlit dashboard for the task.
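The JSON-output step described above needs a small post-processing helper, since a generative model may wrap the JSON in extra tokens or prose. Here is a minimal sketch of that step; the `extract_json` helper and the receipt field names (`store`, `total`, `items`) are illustrative assumptions, not the video's actual code:

```python
import json
import re

def extract_json(model_output: str) -> dict:
    """Pull the first JSON object out of raw model text.

    A fine-tuned LLaVA may surround the JSON with role tags or
    commentary, so we locate the outermost braces before parsing.
    """
    match = re.search(r"\{.*\}", model_output, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in model output")
    return json.loads(match.group(0))

# Illustrative raw output for a parsed receipt (field names assumed)
raw = 'ASSISTANT: {"store": "ACME", "total": "12.50", "items": [{"name": "milk", "price": "3.20"}]}'
print(extract_json(raw)["total"])  # -> 12.50
```

In a real pipeline this would run on the decoded text returned by the fine-tuned model's `generate` call.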
00:00 Intro
00:42 Dashboard demo
01:55 LLaVA background
02:44 LLaVA playground
04:23 Fine-tuning pipeline schema
06:21 Hardware requirements (Hyperstack GPUs)
07:59 Sample datasets (CORD-v2 and DocVQA)
12:09 LLaVA architecture
15:07 Project code overview
15:57 Testing LLaVA 7B to 34B
23:38 This video's pipeline overview
25:12 Data preparation
37:29 Model preparation and training
45:33 Testing the fine-tuned model
48:18 Model deployment and dashboard design
#hyperstack #gpu #huggingface #pytorch #streamlit
#llm #python #llava
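The deployment step (48:18) can be sketched as a minimal Flask endpoint that accepts an uploaded receipt image and returns the parsed fields as JSON. This is an illustrative sketch only: the `/parse` route name is an assumption, and `run_model` is a stub standing in for the fine-tuned LLaVA inference call:

```python
import io

from flask import Flask, jsonify, request

app = Flask(__name__)

def run_model(image_bytes: bytes) -> dict:
    # Stub standing in for fine-tuned LLaVA inference;
    # the real version would preprocess the image, call
    # model.generate, and parse the JSON from the output.
    return {"store": "ACME", "total": "12.50"}

@app.route("/parse", methods=["POST"])
def parse_receipt():
    # Expect a multipart upload with an "image" file field.
    if "image" not in request.files:
        return jsonify({"error": "no image uploaded"}), 400
    fields = run_model(request.files["image"].read())
    return jsonify(fields)

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```

A Streamlit dashboard would then POST the user's uploaded image to this endpoint and render the returned fields.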
📚 Extra Resources:
Fine-tune Multi-modal LLaVA Vision and Language Models
How To Fine-tune LLaVA Model (From Your Laptop!)
Finetune MultiModal LLaVA
How LLaVA works 🌋 A Multimodal Open Source LLM for image recognition and chat.
LLaVA - the first instruction following multi-modal model (paper explained)
Fine Tune Vision Model LlaVa on Custom Dataset
LLaVA - This Open Source Model Can SEE Just like GPT-4-V
First open-source multimodal math dataset boosts MLLM performance - Podcast
Fine Tuning LLaVA
Visual Instruction Tuning using LLaVA
Fine Tune Multimodal LLM 'Idefics 2' using QLoRA
Fine Tune a Multimodal LLM 'IDEFICS 9B' for Visual Question Answering
Fine tuning Pixtral - Multi-modal Vision and Text Model
How do Multimodal AI models work? Simple explanation
Tiny Text + Vision Models - Fine tuning and API Setup
LLaVA: A large multi-modal language model
👑 LLaVA - The NEW Open Access MultiModal KING!!!
Building a Custom LLM for your domain based on LLaVA-Med
Fine-Tune Large LLMs with QLoRA (Free Colab Tutorial)
How To Install LLaVA 👀 Open-Source and FREE 'ChatGPT Vision'
LLaVA LLM: Visual and Language Multimodal Model Chatbot
Are LLaVA variants better than original?
“LLAMA2 supercharged with vision & hearing?!” | Multimodal 101 tutorial