Fine-Tuning Multimodal LLMs (LLAVA) for Image Data Parsing

In this video, we'll fine-tune LLAVA, an open-source multimodal LLM available on HuggingFace, to extract information from receipt images and output it as structured JSON. By the end, we'll deploy the model behind a Flask API and build a Streamlit dashboard on top of it.
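
As a rough preview of the core inference step, here is a minimal sketch using the HuggingFace transformers API (the model id, prompt format, and file name are illustrative assumptions, not taken from the video):

```python
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

# Minimal LLAVA inference sketch: ask the model to read a receipt
# image and answer in JSON. The fine-tuning in the video teaches the
# model to produce this JSON reliably.
model_id = "llava-hf/llava-1.5-7b-hf"  # assumed checkpoint
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

image = Image.open("receipt.jpg")  # placeholder file name
prompt = "USER: <image>\nExtract the items, prices, and total from this receipt as JSON.\nASSISTANT:"
inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=256)
print(processor.decode(output_ids[0], skip_special_tokens=True))
```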

00:00 Intro
00:42 Dashboard demo
01:55 LLAVA background
02:44 LLAVA playground
04:23 Fine-tuning pipeline schema
06:21 Hardware requirements (Hyperstack GPUs)
07:59 Sample datasets (cord-v2 and docvqa)
12:09 LLAVA architecture
15:07 Project code overview
15:57 Test LLAVA 7B to 34B
23:38 This video's pipeline overview
25:12 Data preparation
37:29 Model preparation and training
45:33 Testing the fine-tuned model
48:18 Model deployment and dashboard design
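
The final chapter (48:18) wraps the fine-tuned model behind an HTTP endpoint. A minimal Flask sketch of that shape (the route, port, and parse_receipt helper are hypothetical):

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

def parse_receipt(image_bytes: bytes) -> dict:
    # Hypothetical stand-in for the fine-tuned LLAVA inference call.
    return {"items": [], "total": None}

@app.route("/parse", methods=["POST"])
def parse():
    # Expects a multipart form upload with an "image" field.
    image_bytes = request.files["image"].read()
    return jsonify(parse_receipt(image_bytes))

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```

A Streamlit dashboard can then POST uploaded receipt images to this endpoint and render the returned JSON.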

#hyperstack #gpu #huggingface #pytorch #streamlit
#llm #python #llava

Comments

Just came here after seeing your post on LinkedIn (I follow you there). Going to try this over the weekend!

Interesting!

Thanks for this informative video. I have a question: how can we perform distributed model training on multiple GPUs? In this video, training runs on a single 80 GB GPU. If we want to train on multiple smaller GPUs instead (say, two 48 GB GPUs), what should we do?

MuhammadAdnan-tqfx
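
One common answer to the multi-GPU question above is to launch the same training script on every GPU with torchrun and let the HuggingFace Trainer shard the model with FSDP. A minimal sketch, assuming a Trainer-based script named train.py (all values are illustrative, not from the video):

```python
from transformers import TrainingArguments

# Launch across two GPUs on one node (train.py is a placeholder for
# the training script used in the video):
#   torchrun --nproc_per_node=2 train.py
args = TrainingArguments(
    output_dir="llava-receipts",
    per_device_train_batch_size=1,   # keep per-GPU memory low
    gradient_accumulation_steps=8,   # recover the effective batch size
    bf16=True,
    fsdp="full_shard",               # shard params, grads, optimizer state
)
```

With full sharding, two 48 GB cards together hold roughly what one 80 GB card did plus communication overhead, so batch size or precision may still need adjusting.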

What do you suggest for making a Python GUI app: Tkinter, or do you prefer another framework? Do you have a video on it? Thanks in advance! Big fan of your teaching!

PareshPawar-yw
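
For reference, the basic Tkinter pattern the comment above asks about looks like this (a minimal illustrative sketch, not from the video):

```python
import tkinter as tk

# Smallest useful Tkinter app: a window, a label, and a button
# whose callback updates the label.
root = tk.Tk()
root.title("Demo")
label = tk.Label(root, text="Hello")
label.pack(padx=20, pady=10)
tk.Button(root, text="Click me",
          command=lambda: label.config(text="Clicked!")).pack(pady=10)
root.mainloop()
```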