Meetup: Evaluating LLMs: Needle in a Haystack
LLM evaluation is a discipline where confusion reigns and foundation model builders are effectively grading their own homework.
Building on the viral threads on X/Twitter, Greg Kamradt, Robert Nishihara, and Jason Lopatecki discuss highlights from Arize AI's ongoing research into how major foundation models, from OpenAI's GPT-4 to Mistral and Anthropic's Claude, stack up against each other on important tasks and emerging LLM use cases. They walk through the results of Needle in a Haystack tests, explain why those results matter, and cover other eval results on hallucination detection over private data, question answering, code functionality, and more.
Curious which foundation models your company should use for a specific use case, and which to avoid? You won't want to miss this meetup!