Florence-2: Fine-tune Microsoft’s Multimodal Model

Learn how to fine-tune Microsoft's Florence-2, a powerful open-source Vision Language Model, for custom object detection tasks. This in-depth tutorial guides you through setting up your environment in Google Colab, preparing datasets, and optimizing the model using LoRA.

Chapters:

- 00:00 Introduction: Unlock the Power of Florence-2
- 01:09 Getting Started: Prepare for VLM Fine-Tuning
- 03:55 Florence-2 in Action: Explore Pre-trained Capabilities
- 07:00 Dataset Deep Dive: PyTorch Data Loading for Florence-2
- 13:02 LoRA: Optimize Your VLM Training
- 14:21 Fine-Tuning: Unleash Florence-2's Custom Object Detection
- 17:30 Model Evaluation: Measure Your VLM's Success
- 21:37 Florence-2 vs Other Computer Vision Models
- 24:09 Conclusion and Next Steps

Comments
Author

I've been waiting for this tutorial for days.

Thank you again for being the first to comprehensively review this new model.

Super excited! 🎉🥳

abdshomad
Author

Thank you, Roboflow, for providing such nice and lovely tutorials for free, and with such nice instructions.

jk_c
Author

Thank you for this tutorial. I was working on this kind of setup for a couple of days. You definitely could have saved me a lot of time.

artem-ywkm
Author

Thanks a ton for this awesome video! Every single term is explained so clearly; it's super helpful.

I can't wait to dive into the code and start putting this knowledge to use!

SatyamKumar-cbmt
Author

Very informative video. Thanks for making such a valuable video free of cost. Just one request: when you make tutorials, if possible, try to do inference, training, or fine-tuning on agricultural or satellite-related data.

VLM
Author

How can I train this model on a custom dataset for OCR?

SridharanS-vzre
Author

Thanks for the video tutorial.

Though this model can handle multiple tasks, all the videos cover a single task.

Can you explain how we can tune the model for two different tasks, for example, OCR and OD?

NaveenKumarLaskari
Author

Hello bro! Thank you for your selfless sharing all along. When I was fine-tuning Florence-2, I encountered some issues, and now I would like to seek your advice.
Resolving Accuracy Issues in Chinese Output for Florence-2 Fine-Tuned with LoRA: Using the llava-instruct-chinese dataset, the image encoder weights are frozen, and the language part of Florence-2 is fine-tuned using the LoRA method. While performing the "CAPTION" task, the model is capable of outputting in Chinese, but the accuracy of the answers is zero. How can this issue be resolved?

yjrljjw
Author

Thank you for the video tutorial, you are 👏👏👏
I hope there will be a version of this tutorial using Jupyter Notebook 😁

arifahnurainia
Author

Thank you for the awesome tutorial! I wonder how the detection accuracy compares to a YOLO-based model?

kylewang
Author

Thanks, Sir. Please do fine-tuning for OCR, captioning, and segmentation tasks.

geniusxbyofejiroagbaduta
Author

9:35 How did you view this embedding vector projection for the Roboflow 100 datasets?

nikilragav
Author

I would really really really really really like to see how you train on multiple datasets for different tasks, like OD, OCR, REGION_PROPOSAL, and maybe something like OPEN_VOCABULARY on one set and MORE DETAILED CAPTION on another, and whether the knowledge transfers effectively. For example, do the captions start to include things that are not in the caption dataset but are in the other one, or does OCR improve in image descriptions?

barderino
Author

Hi, I'm looking to fine-tune Florence-2 for a segmentation task. Would appreciate your insights!

TheVarun
Author

Hey guys, do you have an example of fine-tuning Florence-2 for OCR?

hegalzhang
Author

Wonderful tutorial! Could you make a tutorial about how to fine-tune Florence-2 for the segmentation task?

sandrojunioraraujo
Author

Master, could you please tell me if Florence-2 can perform SER (Semantic Entity Recognition) and RE (Relation Extraction) tasks? If so, what should my dataset look like? 🤔

dabaizhang-xb
Author

Why are the Florence model's results different when you re-run the code?

indranilcool
Author

For the community session I have a couple of (beginner) questions:
- The Google Colabs on Roboflow seem to be Linux-based; is there an easy way to make them work on Windows?
- In general, how do I download a model (YOLO) to use in a Python app (on Windows)?
- Are there models that would run real-time video detection on a regular laptop with an integrated GPU?
- I am planning to use a YOLO model for a sports live stream, but only have a simple 3-year-old mid-range laptop with me. If the laptop is underpowered, would it be better to send the stream over to my desktop PC with an RTX 3060 Ti 8GB, let the model run there, and send the detections back to sync on the laptop?
- For simple applications, like your real-time sports detection, would it be better to run it on my own hardware or look into cloud servers for inference?

Thank you very much for your tutorials, they help a lot!

-P
Author

Hello Sir, thanks for all your videos and efforts. I am following your channel, and I request that you please upload a detailed video on how to fine-tune a YOLOv5 model for custom image classification.

mctgpfi