Inference, Serving, PagedAttention and vLLM

GPT-4 Summary: Dive into the future of Large Language Model (LLM) serving with our live event on vLLM, the groundbreaking open-source inference engine designed to revolutionize how we serve and perform inference on LLMs. We'll start with a clear explanation of the basics of inference and serving, setting the stage for an in-depth look at vLLM and its innovative PagedAttention algorithm. This event promises to unveil how vLLM overcomes memory bottlenecks to deliver fast, efficient, and cost-effective LLM serving solutions. Expect a detailed walkthrough of vLLM's system components, a compelling live demo complete with code, and a forward-looking discussion on vLLM's place in the 2024 AI Engineering workflow. Whether you're battling with the load and fine-tuning challenges of current LLMs or looking for scalable serving solutions, this is a must-watch to stay ahead in the field of AI and machine learning.
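As a rough illustration of the PagedAttention idea mentioned above (a toy sketch, not vLLM's actual implementation): the KV cache is carved into fixed-size blocks, and each sequence keeps a "block table" mapping logical token positions to physical blocks, much like virtual-memory pages. The class and block size below are invented for illustration.

```python
# Toy sketch of the PagedAttention block-table idea (NOT vLLM's real code):
# memory for each sequence's KV cache is allocated block by block on demand,
# so no large contiguous reservation is needed up front.

BLOCK_SIZE = 4  # tokens per KV-cache block (kept tiny for illustration)

class BlockManager:
    def __init__(self, num_blocks):
        self.free_blocks = list(range(num_blocks))
        self.block_tables = {}  # seq_id -> list of physical block ids

    def append_token(self, seq_id, num_tokens_so_far):
        """Grab a new physical block when a sequence crosses a block boundary."""
        table = self.block_tables.setdefault(seq_id, [])
        if num_tokens_so_far % BLOCK_SIZE == 0:  # all current blocks are full
            table.append(self.free_blocks.pop())

    def physical_slot(self, seq_id, token_pos):
        """Translate a logical token position to (physical block, offset)."""
        block = self.block_tables[seq_id][token_pos // BLOCK_SIZE]
        return block, token_pos % BLOCK_SIZE

mgr = BlockManager(num_blocks=8)
for pos in range(6):            # write 6 tokens for sequence 0
    mgr.append_token(0, pos)
print(mgr.block_tables[0])      # -> [7, 6]: two blocks cover 6 tokens
print(mgr.physical_slot(0, 5))  # -> (6, 1): second block, offset 1
```

Because blocks are only claimed when needed and can be shared or freed per block, fragmentation stays bounded by one block per sequence, which is the memory win the talk attributes to vLLM.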

Have a question for a speaker? Drop them here:

Speakers:
Dr. Greg, Co-Founder & CEO

The Wiz, Co-Founder & CTO

Join our community to start building, shipping, and sharing with us today!

Apply for our next AI Engineering Bootcamp on Maven today!

How'd we do? Share your feedback and suggestions for future events.
Comments

Very nice lecture. It is totally super clear!

kged

Awesome work guys! I love the RAG analogy to KQV self-attention inner workings. Thanks for sharing this content for free to the eager ML community.❤

marloncajamarca

Trying your Colab but got an error: ValueError: Bfloat16 is only supported on GPUs with compute capability of at least 8.0. Your Tesla T4 GPU has compute capability 7.5. You can use float16 instead by explicitly setting the `dtype` flag in CLI, for example: --dtype=half.
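The error message itself points to the fix: the Tesla T4 (compute capability 7.5) lacks bfloat16 support, so the dtype must be forced to float16. A hedged sketch of the CLI invocation (the entrypoint may vary by vLLM version, and the model name is a placeholder; only the --dtype=half flag comes straight from the error message):

```shell
# Force float16 on pre-Ampere GPUs such as the T4.
# <your-model> is a placeholder; substitute the model from the notebook.
python -m vllm.entrypoints.openai.api_server \
    --model <your-model> \
    --dtype=half
```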

lilia

How different are serving frameworks like vLLM, Ray Serve, and OpenLLM from what LangChain offers as "LangServe"? If my model is hosted somewhere and I just access it with my API key and API URL, which way should I go?

NavjotMakkar

Holy shot, it’s been 10 minutes and it’s just about analogies and nomenclature. It’s being dumbed down too much.

pepaw