Addressing Latency Challenges in Large Language Models

Saahil Jain discusses the challenges associated with latency in large language models. He highlights that these models generate tokens one at a time, which makes it difficult to reduce the time it takes to produce a response, and notes that there is a lot of research aimed at finding more efficient ways to generate tokens. Jain emphasizes the importance of reducing latency to improve the user experience, particularly in search applications, and distinguishes between actual latency and perceived latency.
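The one-token-at-a-time generation Jain describes can be illustrated with a minimal sketch. This is a toy stand-in, not any real model: `next_token` is a hypothetical placeholder for an LLM's forward pass, and the loop shows why total latency grows roughly linearly with the number of tokens generated.

```python
import time

def next_token(context):
    # Toy stand-in for a model's forward pass.
    # In a real LLM, this expensive step runs once per generated token.
    time.sleep(0.01)  # simulate per-token compute cost
    return len(context)  # hypothetical deterministic "token"

def generate(prompt_tokens, max_new_tokens):
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        # Each new token depends on all tokens before it,
        # so the steps cannot simply be run in parallel.
        tokens.append(next_token(tokens))
    return tokens

start = time.time()
out = generate([1, 2, 3], max_new_tokens=20)
elapsed = time.time() - start
print(f"generated {len(out) - 3} tokens in {elapsed:.2f}s")
```

This sequential dependency is what makes the research Jain mentions (more efficient decoding) valuable, and why perceived latency tricks, such as streaming tokens to the user as they are produced, matter so much in practice.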

MLOps Coffee Sessions #150 with Saahil Jain, The Future of Search in the Era of Large Language Models, co-hosted by David Aponte.

// Abstract

Saahil also discusses the intersection of traditional information retrieval and generative models and the trade-offs in the type of outputs they produce. He suggests occupying users' attention during long wait times and the importance of considering how users engage with websites beyond just performance.

// Bio

Previously, Saahil was a graduate researcher in the Stanford Machine Learning Group under Professor Andrew Ng, where he researched topics related to deep learning and natural language processing (NLP) in resource-constrained domains like healthcare. His research work has been published in machine learning conferences such as EMNLP, NeurIPS Datasets & Benchmarks, and ACM-CHIL among others. He has publicly released various machine learning models, methods, and datasets, which have been used by researchers in both academic institutions and hospitals across the world, as part of an open-source movement to democratize AI research in medicine. Prior to Stanford, Saahil worked as a product manager at Microsoft on Office 365.

He received his B.S. and M.S. in Computer Science at Columbia University and Stanford University respectively.

// MLOps Jobs board

// MLOps Swag/Merch

// Related Links
Retrieval augmented models papers:

--------------- ✌️Connect With Us ✌️ -------------
Follow us on Twitter: @mlopscommunity
