ColPali: Document Retrieval with Vision-Language Models Only (with Manuel Faysse)

In this episode of Neural Search Talks, we're chatting with Manuel Faysse, a second-year PhD student at CentraleSupélec and Illuin Technology and the first author of the paper "ColPali: Efficient Document Retrieval with Vision Language Models". ColPali is making waves in the IR community as a simple but effective new take on embedding documents using their image patches and the late-interaction paradigm popularized by ColBERT. Tune in to learn how Manu conceptualized ColPali, his methodology for tackling new research ideas, and why this new approach outperforms all classic multimodal embedding models. A must-watch episode!

Check out ColPali / ColQwen2 & the ViDoRe benchmark:

Timestamps:
0:00 Introduction with Jakub & Manu
4:09 The "Aha!" moment that led to ColPali
7:06 Challenges that had to be solved
9:16 The main idea behind ColPali
13:20 How ColPali simplifies the IR pipeline
15:54 The ViDoRe benchmark
18:23 Why ColPali is superior to CLIP-based retrievers
20:41 The training setup used for ColPali
24:00 Optimizations to make ColPali more efficient
29:00 How ColPali could work with text-only datasets
31:21 Outro: The next steps for this line of research
Comments

A+, have been curious about ColPali and this was both insightful and easily understood by a non-technical viewer

abcthegreat