Addressing Latency Challenges in Large Language Models

Saahil Jain discusses the challenges associated with latency in large language models. He highlights that these models generate tokens one at a time, which makes it difficult to reduce the time it takes to produce a response, and notes that there is a lot of research aimed at finding more efficient ways to generate tokens. Jain emphasizes the importance of reducing latency to improve the user experience, particularly in search applications, and distinguishes between actual latency and perceived latency.
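The one-token-at-a-time generation Jain describes can be illustrated with a minimal sketch. This is a toy stand-in, not any real model: `next_token` is a hypothetical placeholder for an LLM's forward pass, and the loop shows why total latency grows roughly linearly with the number of tokens generated.

```python
import time

def next_token(context):
    # Toy stand-in for a model's forward pass.
    # In a real LLM, this expensive step runs once per generated token.
    time.sleep(0.01)  # simulate per-token compute cost
    return len(context)  # hypothetical deterministic "token"

def generate(prompt_tokens, max_new_tokens):
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        # Each new token depends on all tokens before it,
        # so the steps cannot simply be run in parallel.
        tokens.append(next_token(tokens))
    return tokens

start = time.time()
out = generate([1, 2, 3], max_new_tokens=20)
elapsed = time.time() - start
print(f"generated {len(out) - 3} tokens in {elapsed:.2f}s")
```

This sequential dependency is what makes the research Jain mentions (more efficient decoding) valuable, and why perceived latency tricks, such as streaming tokens to the user as they are produced, matter so much in practice.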

MLOps Coffee Sessions #150 with Saahil Jain, The Future of Search in the Era of Large Language Models, co-hosted by David Aponte.

// Abstract

Saahil also discusses the intersection of traditional information retrieval and generative models and the trade-offs in the type of outputs they produce. He suggests occupying users' attention during long wait times and the importance of considering how users engage with websites beyond just performance.

// Bio

Previously, Saahil was a graduate researcher in the Stanford Machine Learning Group under Professor Andrew Ng, where he researched topics related to deep learning and natural language processing (NLP) in resource-constrained domains like healthcare. His research work has been published in machine learning conferences such as EMNLP, NeurIPS Datasets & Benchmarks, and ACM-CHIL among others. He has publicly released various machine learning models, methods, and datasets, which have been used by researchers in both academic institutions and hospitals across the world, as part of an open-source movement to democratize AI research in medicine. Prior to Stanford, Saahil worked as a product manager at Microsoft on Office 365.

He received his B.S. and M.S. in Computer Science at Columbia University and Stanford University respectively.

// MLOps Jobs board

// MLOps Swag/Merch

// Related Links
Retrieval augmented models papers:

--------------- ✌️Connect With Us ✌️ -------------
Follow us on Twitter: @mlopscommunity
