How to make LLMs fast: KV Caching, Speculative Decoding, and Multi-Query Attention | Cursor Team

See below for guest bio, links, and to give feedback, submit questions, contact Lex, etc.

*GUEST BIO:*
Aman Sanger, Arvid Lunnemark, Michael Truell, and Sualeh Asif are creators of Cursor, a popular code editor that specializes in AI-assisted programming.

*CONTACT LEX:*

*EPISODE LINKS:*

*SPONSORS:*
To support this podcast, check out our sponsors & get discounts:
*Encord:* AI tooling for annotation & data management.
*MasterClass:* Online classes from world-class experts.
*Shopify:* Sell stuff online.
*NetSuite:* Business management software.
*AG1:* All-in-one daily nutrition drinks.

*PODCAST LINKS:*

*SOCIAL LINKS:*

Related recommendations

Comments

Imagine human behaviour was relative to the time zones they lived in and you could guarantee 5 Ti gpus running in each household every morning to heat water for their showers like Santa flying around the world doing computations while heating on demand 😂

ValidatingUsername

I really like the channel but I had to unsubscribe because my feed is absolutely flooded with short clips from this channel making it annoying as fuck to find anything else

saintsplenetic

All You Need To Know About Running LLMs Locally

RUN LLMs on CPU x4 the speed (No GPU Needed)

FREE Local LLMs on Apple Silicon | FAST!

Ollama: Run LLMs Locally On Your Computer (Fast and Easy)

How to Fine-Tune and Train LLMs With Your Own Data EASILY and FAST- GPT-LLM-Trainer

Mamba Might Just Make LLMs 1000x Cheaper...

Using Clusters to Boost LLMs 🚀

Revolutionizing Transportation: Advanced LLM Routing with RouteLLM

Run Local LLMs on Hardware from $50 to $50,000 - We Test and Compare!

LLMs with 8GB / 16GB

Run LLMs locally with LMStudio

I Ran Advanced LLMs on the Raspberry Pi 5!

Run ALL Your AI Locally in Minutes (LLMs, RAG, and more)

Fast ReActions: Planning and Reasoning Quickly with LLMs

Speculative Decoding: When Two LLMs are Faster than One

The EASIEST way to RUN Llama2 like LLMs on CPU!!!

How ChatGPT Works Technically | ChatGPT Architecture

How to Fine-Tune and Train LLMs With Your Own Data EASILY and FAST With AutoTrain

Run LLMs On Your Phone Locally - Easy & Fast Install

How might LLMs store facts | DL7

Fine-tuning Large Language Models (LLMs) | w/ Example Code

What are Large Language Models (LLMs)?

PyTorch in 100 Seconds