ColPali: Document Retrieval with Vision-Language Models only (with Manuel Faysse)

In this episode of Neural Search Talks, we're chatting with Manuel Faysse, a second-year PhD student at CentraleSupélec and Illuin Technology and first author of the paper "ColPali: Efficient Document Retrieval with Vision Language Models". ColPali is making waves in the IR community as a simple but effective new take on embedding documents: it embeds the image patches of each page directly and scores them with the late-interaction paradigm popularized by ColBERT. Tune in to learn how Manu conceptualized ColPali, his methodology for tackling new research ideas, and why this approach outperforms classic multimodal embedding models such as CLIP-based retrievers. A must-watch episode!
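For viewers new to late interaction, here is a minimal sketch of ColBERT-style MaxSim scoring. It assumes every query token and every page patch has already been turned into a vector by some model. The function names and the toy 2-D vectors are illustrative assumptions, not the ColPali library's API or the authors' implementation.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def maxsim_score(query_vecs, page_vecs):
    """Late-interaction score: for each query token vector, keep its best
    match among the page's patch vectors, then sum those maxima."""
    return sum(max(dot(q, p) for p in page_vecs) for q in query_vecs)


def rank_pages(query_vecs, pages):
    """Return page indices sorted from highest to lowest MaxSim score."""
    scores = [maxsim_score(query_vecs, page) for page in pages]
    return sorted(range(len(pages)), key=lambda i: scores[i], reverse=True)


# Toy data (made up): two query tokens, two pages with a few patches each.
query = [[1.0, 0.0], [0.0, 1.0]]
page_a = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # has a good match for both tokens
page_b = [[1.0, 0.0], [1.0, 0.0]]              # matches only the first token

print(maxsim_score(query, page_a))
print(maxsim_score(query, page_b))
print(rank_pages(query, [page_b, page_a]))
```

Because each query token picks its own best patch, a page is rewarded when different regions answer different parts of the query, which a single pooled embedding per page cannot express.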

Check out ColPali / ColQwen2 & the ViDoRe benchmark:

Timestamps:
0:00 Introduction with Jakub & Manu
4:09 The "Aha!" moment that led to ColPali
7:06 Challenges that had to be solved
9:16 The main idea behind ColPali
13:20 How ColPali simplifies the IR pipeline
15:54 The ViDoRe benchmark
18:23 Why ColPali is superior to CLIP-based retrievers
20:41 The training setup used for ColPali
24:00 Optimizations to make ColPali more efficient
29:00 How ColPali could work with text-only datasets
31:21 Outro: The next steps for this line of research

Zeta Alpha

Comments

A+, have been curious about ColPali and this was both insightful and easily understood by a non-technical

abcthegreat
