The Visual Task Adaptation Benchmark

This paper presents a new benchmark for visual task adaptation (i.e., BERT for images) and investigates several baseline adaptation methods.

Abstract:
Representation learning promises to unlock deep learning for the long tail of vision tasks without expensive labelled datasets. Yet, the absence of a unified yardstick to evaluate general visual representations hinders progress. Many sub-fields promise representations, but each has different evaluation protocols that are either too constrained (linear classification), limited in scope (ImageNet, CIFAR, Pascal-VOC), or only loosely related to representation quality (generation). We present the Visual Task Adaptation Benchmark (VTAB): a diverse, realistic, and challenging benchmark to evaluate representations. VTAB embodies one principle: good representations adapt to unseen tasks with few examples. We run a large VTAB study of popular algorithms, answering questions like: How effective are ImageNet representations on non-standard datasets? Are generative models competitive? Is self-supervision useful if one already has labels?

Authors: Xiaohua Zhai, Joan Puigcerver, Alexander Kolesnikov, Pierre Ruyssen, Carlos Riquelme, Mario Lucic, Josip Djolonga, Andre Susano Pinto, Maxim Neumann, Alexey Dosovitskiy, Lucas Beyer, Olivier Bachem, Michael Tschannen, Marcin Michalski, Olivier Bousquet, Sylvain Gelly, Neil Houlsby
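
Below is a minimal sketch, assuming PyTorch and torchvision, of the few-example adaptation protocol the abstract alludes to: take a pretrained ImageNet representation, replace its classification head, and fine-tune on roughly 1000 labelled examples of a downstream task. CIFAR-100 is used here purely as a stand-in task; the actual benchmark spans 19 tasks with its own splits and preprocessing.

# Minimal sketch: adapt a pretrained representation to a downstream task
# using only a small labelled subset (the paper's VTAB-1k setting uses
# 1000 examples per task). Assumptions: torchvision's ImageNet ResNet-50
# as the representation and CIFAR-100 as a stand-in downstream task.
import torch
from torch import nn
from torch.utils.data import DataLoader, Subset
from torchvision import datasets, models, transforms

device = "cuda" if torch.cuda.is_available() else "cpu"

# Pretrained backbone; swap the classification head for the new task.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, 100)  # CIFAR-100 has 100 classes
model = model.to(device)

transform = transforms.Compose([
    transforms.Resize(224),  # match the backbone's expected input size
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
train_full = datasets.CIFAR100(root="data", train=True, download=True,
                               transform=transform)
train_small = Subset(train_full, range(1000))  # few-example regime
loader = DataLoader(train_small, batch_size=64, shuffle=True)

optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
loss_fn = nn.CrossEntropyLoss()

model.train()
for epoch in range(10):  # short fine-tuning schedule
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()
        optimizer.step()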

Comments:

I have no idea why this channel doesn't have more views.
I personally get so frustrated with research papers using overly complicated word salad instead of saying what it actually is.


Exaggerated Example:
advance recall intuitive short-term memory artificial intelligence spacious false impression identifier,
with automatic rectify attributes to reform artificial intelligence live memory.


Translation of the exaggerated example:
When a neural network makes a large error, the error gets stored in a Python list, then run through the network again to correct the weights.

happydays

What software do you use to record these videos where you highlight text and navigate the document like this? Maybe it is just screen and audio recording on Apple software, but that kind of highlighting and scribbling is not easy in Adobe Reader.

taylorsmurphy

Is there a difference between "task adaptation" and "transfer learning", or are those basically synonyms?

kevalan

A person who likes the method could just pretend it works and deliberately avoid information he knows could easily be found. He could even ignore information that is directly available. If he decides, before starting, that he will not circumvent the process, has enough willpower, and does not want to break promises to himself, he could realistically hold a paper in his hand and not read it. (Do not even read the abstract, just to be proud of yourself.)

vsiegel