RouteLLM Tutorial - GPT4o Quality but 80% CHEAPER (More Important Than Anyone Realizes)

Full tutorial for how to use RouteLLM.
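For orientation, a minimal sketch of the pattern the tutorial walks through, following the lm-sys/RouteLLM README (the model names below are placeholders, not necessarily the ones used in the video): you create a Controller with a strong and a weak model, then call it like the OpenAI client, with the router and its threshold encoded in the model string.

```python
from routellm.controller import Controller

# "mf" is the matrix-factorization router from the RouteLLM paper.
# Strong/weak model names follow LiteLLM conventions; these two are
# placeholders -- substitute whatever pair fits your budget.
client = Controller(
    routers=["mf"],
    strong_model="gpt-4o",
    weak_model="groq/llama3-8b-8192",
)

# The model string selects the router ("mf") and a threshold (0.11593 is
# the README's example): queries whose predicted win rate clears the
# threshold go to the strong model; the rest go to the weak one.
response = client.chat.completions.create(
    model="router-mf-0.11593",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
print(response.choices[0].message.content)
```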


Join My Newsletter for Regular AI Updates 👇🏼

My Links 🔗

Need AI Consulting? 📈

Media/Sponsorship Inquiries ✅

Links:
Comments

I'm putting the final touches on my business idea presentation that I'm going to give away, which is partially inspired by RouteLLM.

Can you guess what it is? 🤔

(North America only)

matthew_berman

WOW. This is crazy. I had a similar idea last week and mentioned it to my team this morning at 10 AM UK time. I've just seen this video, and it blows my mind that our thoughts were so aligned. And yes, I'm not sure people have really put it together yet, but an agentic on-device system plus routing-LLM technology is the future. I believe models will become smaller and smaller. Crazy how this part of tech is advancing so fast. It is as scary as it is exciting.
I learn a lot from your videos; they force me to read papers and try some of the tech for myself. I really appreciate your content.

negozumbi

Awesome video! With GPT-4o mini, though, idk how much I'll be routing lol

jaysonp

💥 Matthew Berman is so clear and concise. He has that natural talent for explaining things in a way that everybody understands, with emphasis on every phrase, clear diction, and an intonation that hooks the listener. People like David Shapiro, Matt Wolf, and Matthew Berman say only what's necessary and make every phrase count. This is great. 🎉❤❤❤

DihelsonMendonca

Thank you, I'll try this out soon! ❤

jeroenvandongenj-works

Is the context preserved across models for subsequent queries?

E.g.
Where is country X? - goes to the weak model
What is good to eat there? - goes to the strong model

Does the strong model have the context of country X?
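A sketch of one answer, assuming RouteLLM's OpenAI-compatible Controller (model names are placeholders): the router only picks a destination per call, so the strong model sees country X only if you resend the full message history yourself.

```python
from routellm.controller import Controller

client = Controller(
    routers=["mf"],
    strong_model="gpt-4o",
    weak_model="groq/llama3-8b-8192",  # placeholder weak model
)

# Keep one running history and pass it on every call; whichever model
# the router picks then sees all prior turns, even turns that a
# different model answered.
history = [{"role": "user", "content": "Where is country X?"}]
first = client.chat.completions.create(model="router-mf-0.11593", messages=history)
history.append({"role": "assistant", "content": first.choices[0].message.content})

history.append({"role": "user", "content": "What is good to eat there?"})
second = client.chat.completions.create(model="router-mf-0.11593", messages=history)
print(second.choices[0].message.content)
```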

sinukus

I can see this being useful in scenarios where you have local Phi-3 models and escalate to GPT-4o mini.
If there were some way of declaring what each SLM is best suited for, RouteLLM could kick the query over to the expert that knows best.
Great video, as always Matthew! Cheers from Australia!

JaredWoodruff

That's the way to go! Great, and thanks for sharing!

punk

Thank you, Matthew! I like your video very much.

zhengzhou

Another game-changing project; excited for the next video.

laiscott

I did not expect llamas to be so important in my life. A person I know has a llama as a pet, too.

netherportals

Sam Altman already mentioned that, in addition to trying to improve the reasoning ability of the models, another aspect they were working on was a mechanism capable of discerning the complexity of a task to allocate more or fewer resources accordingly. Currently, LLMs always use all resources regardless of the task, which is wasteful.

TheRubi

If Llama 3.1 is treating "9.11" and "9.9" as version numbers rather than as Float values, 9.11 comes out larger because 11 > 9; as plain Strings, lexicographic comparison would actually rank "9.11" below "9.9".
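A quick worked check of those three readings in Python:

```python
print(float("9.11") < float("9.9"))  # True -- as decimals, 9.9 is larger
print("9.11" < "9.9")                # True -- lexicographically, '1' < '9' at the third character
print((9, 11) > (9, 9))              # True -- as version-style tuples, 9.11 comes out on top
```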

LuckyWabbitDesign

I was actually thinking of using it the other way around: set up RouteLLM with a strong cheap model and a weak, even cheaper model, then add it to an MoA setup together with strong frontier models to aggregate the responses. You could potentially build something slightly better than GPT-4o while still reducing your cost a bit. Another step would be to chain it together with MemGPT for long-term memory, and then use the endpoint in Aider as the ultimate coding assistant 😅
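A rough sketch of that layering, assuming RouteLLM's OpenAI-compatible server is running locally (the port and all model names here are assumptions): the routed endpoint produces a cheap draft, a frontier model produces another, and an aggregator fuses them, MoA-style.

```python
from openai import OpenAI

frontier = OpenAI()  # regular OpenAI endpoint
routed = OpenAI(base_url="http://localhost:6060/v1", api_key="-")  # assumed local RouteLLM server

def ask(client, model, prompt):
    r = client.chat.completions.create(model=model, messages=[{"role": "user", "content": prompt}])
    return r.choices[0].message.content

prompt = "Explain why mixture-of-agents can beat a single model."
drafts = [
    ask(routed, "router-mf-0.11593", prompt),  # cheap routed draft
    ask(frontier, "gpt-4o", prompt),           # frontier draft
]

# MoA-style aggregation: a strong model synthesizes one answer from the drafts.
fused = ask(frontier, "gpt-4o",
            "Synthesize the best single answer from these drafts:\n\n" + "\n\n---\n\n".join(drafts))
print(fused)
```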

richardkuhne

Matt: you need to discuss how the router logic works. I am nervous that it will not be smart enough for my niche business use cases

magnusbrzenk

This could work really well with locally run models, especially when using specialised models for given tasks: multiple models, each good in its own respective area and much lighter on the hardware because it isn't a jack of all trades. It could potentially be a game changer for running much stronger AI on local hardware, especially if there is a way for it to do a good job of picking the right model for each task you give it. Honestly, storage space is a lot cheaper than VRAM, so if AI models can be switched in and out of memory on the fly, keeping a lot of models on your hard drive that are each good in their own area isn't a big deal, but it could massively boost the quality of AI at the local level without needing a crazy amount of VRAM.

Mind you, all of this only works if AI models can be switched in and out of memory quickly enough that, to the end user, it all seems like one AI model. It also only works if you have a master general AI model that is good at delegating to the specialised ones. After all, a few hundred GBs of AI models isn't a big deal with how cheap storage is, and it would be a lot cheaper and faster to run if they can be swapped on the fly.
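A toy sketch of that delegation loop, using Ollama's OpenAI-compatible endpoint (Ollama does load models from disk on demand, so only the active specialist sits in VRAM); the specialist names and the one-word dispatch prompt are illustrative assumptions:

```python
from openai import OpenAI

# Ollama exposes an OpenAI-compatible API at this address by default.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

# Illustrative specialists -- any locally pulled models would do.
SPECIALISTS = {"code": "codellama", "math": "qwen2-math", "general": "llama3"}

def ask(model, prompt):
    r = client.chat.completions.create(model=model, messages=[{"role": "user", "content": prompt}])
    return r.choices[0].message.content

def dispatch(prompt):
    # The "master" general model labels the task; Ollama then loads the
    # matching specialist on demand to answer it.
    label = ask("llama3", "Reply with exactly one word -- code, math, or general -- "
                          "naming which specialist should handle this:\n" + prompt).strip().lower()
    return ask(SPECIALISTS.get(label, "llama3"), prompt)

print(dispatch("Write a binary search in Python."))
```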

pauluk

That is a task, not a prompt 😮 It is useful for complex tasks: setting up a system prompt and classifying the task level.
Great video 👏

AIPlayGrounds

Really great stuff, Matthew - thank you! I noticed the prompt was to keep the cost of the call to the MF router API down to 11.5c - does this mean the router LLM has a per-token cost, or does it run locally?
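For what it's worth, the 0.11593 in the video's model string "router-mf-0.11593" appears to be the router's win-rate threshold, not a dollar cost; as best recalled from the RouteLLM README (worth double-checking the exact flags), it comes out of a calibration step along these lines:

```
python -m routellm.calibrate_threshold --routers mf --strong-model-pct 0.5 --config config.example.yaml
# Example README output: "For 50.0% strong model calls for mf, threshold = 0.11593"
```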

jamesyoungerdds

Matthew, can you do a deep dive on how to choose hardware for home use? Can we use Linux? And also, what hardware choice might we make if we wait? Should we wait? Should we buy now to avoid a signed GPU? Etc.

modolief

🙌 How nice it would be if GPT-4 had the speed of Groq. Thanks for the video, Matthew.

cristian