Florence 2 - The Best Small VLM Out There?

There is a new VLM on the scene, and it comes with a dataset of over 5 billion labels. The new model can handle a variety of classic vision tasks like bounding boxes and segmentation, alongside newer LLM-style tasks such as captioning.

For more tutorials on using LLMs and building Agents, check out my Patreon:

🕵️ Interested in building LLM Agents? Fill out the form below

👨‍💻Github:

⏱️Time Stamps:
00:00 Intro
00:13 Florence-2 Paper
02:19 Florence-2 Architecture
03:20 Florence-2 Detailed Image Captioning
03:41 Florence-2 Visual Grounding
04:09 Florence-2 Dense Region Caption
04:24 Florence-2 Open Vocab Detection
06:01 Hugging Face Spaces Demo
10:41 Colab Florence-2 Large Sample Usage
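The tasks time-stamped above all run through a single prompt interface in the Hugging Face transformers integration: you pick a special task token and optionally append free text. A minimal sketch of that convention (the task tokens are from the official microsoft/Florence-2-large model card; the `build_prompt` helper is my own naming):

```python
# Florence-2 drives every task with a special prompt token rather than
# separate heads; these tokens are from the microsoft/Florence-2-large
# model card on Hugging Face.
TASK_PROMPTS = {
    "caption": "<CAPTION>",
    "detailed_caption": "<MORE_DETAILED_CAPTION>",
    "object_detection": "<OD>",
    "dense_region_caption": "<DENSE_REGION_CAPTION>",
    "visual_grounding": "<CAPTION_TO_PHRASE_GROUNDING>",
    "open_vocab_detection": "<OPEN_VOCABULARY_DETECTION>",
    "ocr": "<OCR>",
}

def build_prompt(task: str, text_input: str = "") -> str:
    """A Florence-2 prompt is the task token plus optional free text;
    only grounding-style tasks use the extra text."""
    return TASK_PROMPTS[task] + text_input

# Inference then follows the standard transformers pattern (not run here,
# since it downloads the weights):
#
#   from transformers import AutoModelForCausalLM, AutoProcessor
#   from PIL import Image
#   model = AutoModelForCausalLM.from_pretrained(
#       "microsoft/Florence-2-large", trust_remote_code=True)
#   processor = AutoProcessor.from_pretrained(
#       "microsoft/Florence-2-large", trust_remote_code=True)
#   image = Image.open("photo.jpg")
#   inputs = processor(text=build_prompt("object_detection"),
#                      images=image, return_tensors="pt")
#   ids = model.generate(input_ids=inputs["input_ids"],
#                        pixel_values=inputs["pixel_values"],
#                        max_new_tokens=1024)
#   raw = processor.batch_decode(ids, skip_special_tokens=False)[0]
#   parsed = processor.post_process_generation(
#       raw, task="<OD>", image_size=(image.width, image.height))
```

Note that `trust_remote_code=True` is required because the modeling code ships in the model repo rather than in the transformers library itself.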
Comments

Thanks for your work on sharing this information. Much easier to watch your content than keep my ear to the ground all day trying to keep up. Much appreciated, sir.

parkerspitzer

Thanks for the great content. A video going through the fine-tuning process on this one would be amazing. I am not sure how this could scale to a video implementation (probably by passing one frame at a time).

danielmz

It's also good at OCR for handwritten documents.

IsxaaqAcademy

Thanks Sam!!
Please keep up the great work...

IanScrivener

I'd love to see a fine-tuning video, especially if it's not question answering, just so it's a different use case from the documentation. Maybe with a quick intro covering the scenarios where fine-tuning would be especially helpful.

GiovaniFerreiraS

Thanks, Sam! I always appreciate your videos.

I would love your take on how Florence-2 compares with Apple's 4M-21.

jefframpe

This is what people should call "small": anything below 1B! Thanks for your video. By the way, I played around with the quantized version, and the results are unbelievably good! I shared a post on Twitter mentioning you, along with the Colab; take a look at it. I tried 8-bit and 4-bit. It's odd how 4-bit is almost the same as the base model!

unclecode

Thanks for the information, this is great.
Can I fine-tune it on certain specific images, like few-shot learning? A tutorial on that would be great.

sohitshivhare

I've tried this model; the image descriptions are great. I've also tried DocVQA, but it gives only one-word answers and doesn't get even the simplest questions right. I had hoped to do some classification and compare it with other models.

ranu

@samwitteveenai please make a fine-tuning video about VLMs such as LLaVA and Florence-2, and if possible try to use Ollama so that we can run inference on a local device.

RishabhMathur

Thanks a lot for this. I wish you would consider covering the process of identifying authentic versus fake certificates 🙏🙏🙏

richardobiri

When will you release a demo on how to fine-tune such a model?

yassinebouchoucha

I think fine-tuning for OCR would be a good demo. OCR in the real world, with photographed documents, is much harder than OCR on electronic documents, so it would be cool to see how a small model like this does as an alternative to Claude/GPT-4.

ariramkilowan

What would you pick for fine-tuning?
Any specific application ideas?

ShravanKumar

Please do fine-tuning for object detection.

JustEmbraceTheChallenge

We request you to do fine-tuning for object detection, because all LLMs are only useful for generating text output. Thanks in advance.

srk

Where is the dataset? I couldn't find the release.

SinanAkkoyun

Is it really a good idea to use data created by another model to train your model? 1:54
Isn't it going to replicate the other models' errors?

xl

Hi Sam, thanks for the video. How do you think it compares with Phi-3-V? My take is that this one is more raw and better suited to fine-tuning; do you think so too?

AbhishekKotecha