Simple Token-Level Confidence Improves Caption Correctness

Authors: Suzanne Petryk; Spencer Whitehead; Joseph E. Gonzalez; Trevor Darrell; Anna Rohrbach; Marcus Rohrbach
Description: The ability to judge whether a caption correctly describes an image is a critical part of vision-language understanding. However, state-of-the-art models often misinterpret the correctness of fine-grained details, leading to errors in outputs such as hallucinating objects in generated captions or poor compositional reasoning. In this work, we explore Token-Level Confidence, or TLC, as a simple yet surprisingly effective method to assess caption correctness. Specifically, we fine-tune a vision-language model on image captioning, input an image and proposed caption to the model, and aggregate either algebraic or learned token confidences over words or sequences to estimate image-caption consistency. Compared to sequence-level scores from pretrained models, TLC with algebraic confidence more than doubles image and group scores for compositional reasoning on Winoground. When training data are available, a learned confidence estimator provides further improved performance, reducing object hallucination rates in MS COCO Captions by a relative 30% over the original model and setting a new state-of-the-art.
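The aggregation step described above can be illustrated with a minimal sketch of the algebraic variant, assuming the token confidence is the softmax probability the captioning model assigns to each proposed caption token. The function names, the toy logits, and the choice of `min`, `mean`, or `prod` as the aggregation are assumptions made for illustration; the description does not specify them, and the learned confidence estimator is not covered here.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one decoding step's vocabulary logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def token_confidences(step_logits, token_ids):
    """Probability assigned to each caption token at its own decoding step.

    step_logits: one list of vocabulary logits per caption token
    token_ids:   the proposed caption's token ids, same length as step_logits
    """
    if len(step_logits) != len(token_ids):
        raise ValueError("need exactly one logit vector per caption token")
    return [softmax(logits)[tok] for logits, tok in zip(step_logits, token_ids)]


def aggregate(confidences, how="min"):
    """Collapse several token confidences into one score (assumed options)."""
    if not confidences:
        raise ValueError("cannot aggregate an empty span")
    if how == "min":
        return min(confidences)
    if how == "mean":
        return sum(confidences) / len(confidences)
    if how == "prod":
        return math.prod(confidences)
    raise ValueError(f"unknown aggregation: {how}")


def word_confidences(confidences, word_spans, how="min"):
    """Aggregate token confidences per word.

    word_spans: (start, end) token index pairs, end exclusive, one per word
    """
    return [aggregate(confidences[start:end], how) for start, end in word_spans]


if __name__ == "__main__":
    # Toy example: a 3-token caption over a 3-entry vocabulary (hypothetical values).
    logits = [[2.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 3.0, 0.0]]
    tokens = [0, 1, 1]
    conf = token_confidences(logits, tokens)
    print("token confidences:", conf)
    print("per-word (min):", word_confidences(conf, [(0, 2), (2, 3)]))
    print("sequence-level (min):", aggregate(conf, "min"))
```

Under this sketch, a low per-word score flags a word the model finds unlikely given the image, and the sequence-level score can be used to compare candidate captions for the same image.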

ComputerVisionFoundation Videos
