Run Any Local LLM Faster Than Ollama—Here's How

I'll demonstrate how you can run local models 30% to 500% faster than Ollama on CPU using Llamafile. Llamafile is an open-source project from Mozilla with a permissive license that turns your LLMs into executable files. It works with any GGUF model available from Hugging Face. I've provided a repository that simplifies the Llamafile setup to get you up and running quickly.
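Once a llamafile is running, it serves an OpenAI-compatible HTTP API (by default on localhost port 8080), so you can query it from any HTTP client. Below is a minimal Python sketch, assuming a server on the default port; the model name is a placeholder (llamafile serves whatever model it was launched with), and `ask` is an illustrative helper, not part of any library.

```python
import json
import urllib.request

# Assumes a llamafile server running locally on its default port.
BASE_URL = "http://localhost:8080/v1/chat/completions"

def build_payload(prompt, temperature=0.7):
    """Build an OpenAI-style chat-completion request body."""
    return {
        "model": "local",  # placeholder; the server uses its loaded model
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }

def ask(prompt):
    """POST the prompt to the local server and return the reply text."""
    req = urllib.request.Request(
        BASE_URL,
        data=json.dumps(build_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Because the endpoint mirrors the OpenAI chat-completions shape, swapping an app from a hosted API to a local llamafile is mostly a matter of changing the base URL.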

Comments

Please do more content. Loved the channel. Subscribed!

johnbox

I've not tried this yet, but really well put together video. Thanks!

rikmoran

Great video. Llamafile is really interesting; it adds a lot of flexibility for deployment options. Can't wait to start seeing the various ways people leverage it.

brinkoo

Are you planning on updating Jar3d any time soon? I'd like it to run locally on GPU with Ollama, and I don't want to spend time modifying it if you've already done that and more.

nedkelly

John, this is great!
Could you share a quick vid on integrating this into apps via its API as a replacement for Ollama? A vid on how we can use a GPU with this method would also be great. Thanks, and keep it up!

SejalDatta-lu

Amazing, I had no idea about this. Thank you! I'll check it out tonight. How's the Jar3d project going?

ashgtd

How is this different from using Hugging Face models on Ollama? I see nothing in this video showing what makes anything faster.

ChrisSteurer

I have an i5 @ 3.3 GHz (4 cores); I think I can reach 4.2 GHz overclocked. And an 8 GB AMD R9 200-series GPU.
Is it possible to run Ollama and train my own LLMs?
Everywhere seems to recommend a minimum of 16 GB, so I haven't spent the time.

IJH-Music

I cannot believe Ollama would be even slower than that.

vertigoz

@Data-Centric Hi, I need your suggestion on the following:
I want to build workflow automation using a multi-agent framework, for example an insurance claim workflow with agents (raise new claim, validate policy, validate customer, determine payout, approve, deny). We have to implement these individual agents in our own BPMN workflow, which will be exposed as APIs, and we need the best multi-agent framework to orchestrate them (calling the agents via API as tools). Which framework is the best fit (LangGraph, CrewAI, AutoGen)? We're looking for a hybrid approach: individual agents like 'Raise New Claim' implemented in our own APIs, with a supervisor agent on one of these frameworks orchestrating them. Please advise.

bvinodmca

Are people really using CPU for inference?

BenjaminK

I'll try that out on my AMD machine with 100 GB RAM; hopefully running the larger 20 GB+ models will give this a perf boost.

themaxgo