Lessons learned deploying an LLM document processing system at scale


Vendr and Extend built an LLM document processing system to analyze more than 3 million pages from 100,000 documents across 20+ categories, from highly unstructured sales contracts to 50-page legal agreements. This session shares lessons learned from transforming these unstructured documents into structured data:
· Different techniques for reliably using LLMs for accuracy-intensive use cases, including LLM confidence signals, logprobs, data validations, and human-in-the-loop tooling
· Using evals to determine the best model for the job, and when you should use OpenAI vs Anthropic vs open-source models
· How to improve performance over time via prompt optimizations, fine-tuning, and few-shot feedback loops
· Challenges overcome in mapping LLM outputs into a structured data catalog
· Employing text embeddings and targeted data reviews to build a trustworthy, high-quality dataset
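One of the techniques listed above, using LLM confidence signals derived from logprobs, can be sketched in a few lines. This is a hedged illustration, not the speakers' exact method: the geometric-mean aggregation, the `0.9` threshold, and the function names are all assumptions chosen for the example.

```python
import math

def field_confidence(token_logprobs):
    """Geometric-mean token probability for the tokens of one extracted field.

    Assumes the LLM API returned a logprob per output token (e.g. the
    logprobs option many completion APIs expose).
    """
    if not token_logprobs:
        return 0.0
    avg_logprob = sum(token_logprobs) / len(token_logprobs)
    return math.exp(avg_logprob)

def needs_human_review(token_logprobs, threshold=0.9):
    """Route low-confidence extractions to human-in-the-loop tooling.

    The 0.9 threshold is illustrative; in practice it would be tuned
    per field against labeled data.
    """
    return field_confidence(token_logprobs) < threshold

# Example: per-token logprobs for two extracted contract fields
confident = [-0.01, -0.02, -0.005]  # model is nearly certain
uncertain = [-0.8, -1.5, -0.3]      # model is hedging

print(round(field_confidence(confident), 3))  # → 0.988
print(needs_human_review(uncertain))          # → True
```

A pipeline like the one described would combine this signal with schema-level data validations, escalating to human review only when either check fails.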
By the end of this presentation, you’ll be armed with knowledge to deliver an LLM document processing system at scale – and get a glimpse into the future of unstructured data processing.

This session was presented by Mark Andersen, VP of Data Science & Analytics at Vendr; Stefan Jol, Director of Machine Learning at Vendr; and Kushal Byatnaik, CEO at Extend.

Connect with us:
Twitter: @johnsnowlabs