Using Ollama to Run Local LLMs on the Raspberry Pi 5

My favourite local LLM tool, Ollama, is simple to set up and works on a Raspberry Pi 5. I check it out and compare it to some benchmarks from more powerful machines.

00:00 Introduction
00:41 Installation
02:12 Model Runs
09:01 Conclusion
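
For reference, Ollama's official one-line installer for Linux also works on Raspberry Pi OS (64-bit), and is presumably what the Installation chapter walks through:

    curl -fsSL https://ollama.com/install.sh | sh

Once installed, a model can be pulled and started in one step, e.g. ollama run tinyllama.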

Support My Work:

Gear:

As an affiliate, I earn on qualifying purchases at no extra cost to you.
Comments

I just got an RPi 5 and ran the new Llama 3 (ollama run llama3).
I was not expecting it to be this fast for something that is on the level of GPT-3.5 (or above). On a Raspberry Pi. Wow.

metacob

I've tried running Ollama on my Raspberry Pi 5, as well as on an Intel Celeron-based computer and an old Intel i7-based computer, and it worked everywhere. It is really impressive. Thank you for this video showing me how to do it!

sweetbb

Thank you for sharing this. I am blown away.

KDG

Such a calm tutorial, but so informative 💙

nilutpolsrobolab

As I just said on the Discord server: you might be able to squeeze a (very) tiny bit of performance by not loading the window manager and interacting with Ollama via SSH instead. But it's great that it works with TinyLlama as well! Phi-based models might work well too; Dolphin-Phi is a 2.7B model.

SocialNetwooky
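
A minimal sketch of the headless approach suggested above, assuming a stock Raspberry Pi OS install with SSH already enabled (the pi@raspberrypi.local login is a placeholder for your own user and hostname):

    # Boot to the console instead of the desktop (persists across reboots)
    sudo systemctl set-default multi-user.target
    sudo reboot

    # Then, from another machine, talk to Ollama over SSH
    ssh pi@raspberrypi.local
    ollama run tinyllama

To restore the desktop later, run sudo systemctl set-default graphical.target and reboot.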

I've been testing llama.cpp on it and it works great as well. Although I've had to use my air purifier as a fan to keep it from overheating, even with the aftermarket cooling fan/heatsink on it.

isuckatthat

Really useful stuff in your videos. Subscribed 👍

markr

Thanks, Ian. Can confirm: it works and is plausible. I am getting about 8-10 minutes for multi-modal image processing with LLaVA. I find the tiny models to be too dodgy for good responses, and have currently settled on Llama2-uncensored as my go-to LLM for the moment. Response times are acceptable, but I'm looking for better performance. (BTW my Pi 5 is using an NVMe drive and a HAT from Pineberry.)

whitneydesignlabs
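
For anyone reproducing the LLaVA test above: Ollama's CLI lets you reference an image by path directly in the prompt for multimodal models (the file name below is a placeholder):

    ollama pull llava
    ollama run llava "Describe this image: ./photo.jpg"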

Thanks for this. So far I've tested TinyLlama, Llama2, and Gemma:2b with the question "Who's on first" (a baseball reference from a classic Abbott and Costello comedy skit). TinyLlama and Llama2 understood that it was a baseball reference, but had some bizarre ideas about how baseball works. Gemma:2b didn't understand the question, but when asked "What is a designated hitter?" came up with an equally incorrect answer.

BillYovino

Thanks for the video and testing. I was wondering if you have tried setting num_threads=3. I can't find the video where I saw this, but I think they set it before calling Ollama, like an environment variable. It's supposed to run faster. I'm just building an RPi 5 test station now.

donmitchinson
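
A hedged note on the thread-count idea above: current Ollama exposes this as the num_thread model parameter rather than an environment variable, and it can be set interactively or baked into a Modelfile; whether 3 threads actually helps on the Pi 5 would need measuring.

    # Inside an interactive `ollama run llama2` session:
    /set parameter num_thread 3

    # Or baked into a custom model via a Modelfile:
    FROM llama2
    PARAMETER num_thread 3

The Modelfile route is then built with something like ollama create llama2-3threads -f Modelfile (the model name here is just an example).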

Thanks for the video! What's your camera, please?

mek

Might be worth trying the quantised versions of llama2

MarkSze
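
For reference, Ollama publishes quantised builds as model tags, so trying one is a one-liner; the q4_0 tag below is one example of the quantisations listed in the model library:

    ollama run llama2:7b-chat-q4_0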

Been having fun running different LLMs. The small ones are fast, the 7B ones are slow. I have a Pi 5 8GB. The small LLMs should run on a Pi 4? TinyLlama has trouble adding 2+2. They also seem monotropic, spitting out random, vaguely related answers. I need more Pi 5s so I can network a bunch with a different LLM on each.

AlwaysCensored-xpbe

How do we run this in Python, for speech-to-text and text-to-speech in a voice assistant?

Augmented_AI
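
A minimal sketch of the Python side of such an assistant, using the official ollama Python package (pip install ollama); the speech-to-text and text-to-speech layers would sit on either side of this call:

    import ollama

    # Send one user message to the local Ollama server and print the reply
    response = ollama.chat(
        model="tinyllama",
        messages=[{"role": "user", "content": "Say hello in one sentence."}],
    )
    print(response["message"]["content"])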

Would love to know if the Google Coral board would provide a substantial improvement, if Ollama can even utilize it. Also, how would it compare to a Jetson Nano? Nonetheless, thank you very much for posting this. Chirps to the Birds ❤️

dinoscheidt

In the USA, Digilent also has many Raspberry Pi 5s available!

Vhbaske

The Pi 5 is pretty good when you consider the cost, and what you can do with it. I picked one up recently for Python coding, and it runs Jupyter Notebook beautifully on my 4k screen. I might give the GPIO a whirl at some point in the near future.

daveys

Will the performance improve by adding an AI accelerator like the Hailo-8?

vishwanathasardeshpande

Could the compute process be distributed, like grid compute, across 4 Raspberry Pis?

tube

I finally got my Pi 5 yesterday and already have Ollama working with a couple of models. But I'd like to provide text-to-speech for the output on the screen, and I'm having a hard time wrapping my brain around how it works, like turning the Ollama output in the terminal into audible speech. There are so many resources to pick from, and just getting the code/scripts working is a hurdle. I wish it were as easy as installing an external package and having the internal functions just "work" without having to move files and scripts around; it becomes confusing sometimes.

BenAulbrook
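
One simple route to the text-to-speech setup described above: ollama run accepts a one-shot prompt and writes the reply to stdout, so it can be piped straight into a speech synthesizer such as espeak-ng (installable via apt on Raspberry Pi OS):

    sudo apt install espeak-ng
    ollama run tinyllama "Tell me a short joke." | espeak-ng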