How to run the new Llama 3.1 on Raspberry Pi!!!

In this tutorial, you'll learn how to run Llama 3.1 on a Raspberry Pi 5.

We are going to use a method called Llamafile to run Llama 3.1 on the RPi 5.

A llamafile is a single executable file that bundles an LLM's weights together with the code needed to run it, which makes distributing LLMs easy.
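
Here's roughly what getting a llamafile running looks like. This is a minimal sketch; the exact Hugging Face URL and filename below are assumptions, so check the llamafile releases for the real ones:

# Download a quantized Llama 3.1 llamafile (URL and filename are placeholders)
wget https://huggingface.co/Mozilla/Meta-Llama-3.1-8B-Instruct-llamafile/resolve/main/Meta-Llama-3.1-8B-Instruct.Q2_K.llamafile
# Make it executable, then run it; it serves a local web GUI and HTTP API
chmod +x Meta-Llama-3.1-8B-Instruct.Q2_K.llamafile
./Meta-Llama-3.1-8B-Instruct.Q2_K.llamafile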

As part of this, we'll run Llama 3.1 two ways:
1. Through the built-in web GUI
2. Through its HTTP endpoint, called with a curl command (sketched below)!
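
The curl call looks roughly like this; it's a minimal sketch assuming the llamafile server is listening on its default port 8080 and exposing the OpenAI-compatible chat endpoint, and the model name is just a placeholder:

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "llama-3.1-8b",
        "messages": [{"role": "user", "content": "Why is the sky blue?"}]
      }'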

It's pretty insane how far we've come: from needing large GPUs to run LLMs to running them on a Raspberry Pi!

Timestamps

00:00 Intro
00:36 Inside Raspberry Pi
02:38 Running Llama 3.1 inside Raspberry Pi
04:40 Calling Llama 3.1 via curl (HTTP endpoint)
06:11 Using Llama 3.1 with the llama.cpp GUI

🔗 Links 🔗

❤️ If you want to support the channel ❤️
Support here:

🧭 Follow me on 🧭
Comments

I was using a Pi 4 in a robotics project until recently. Decided I needed an upgrade. Looked at the price of the Pi 5 "with all the accessories", and decided to spend the same amount of money on a used ThinkPad T580 with a broken LCD. It runs Llama 3.1 8B at 4 tokens/s. If you don't *need* a Pi, there are better options for the money!

I could see Qwen or Moondream being somewhat useful on a Pi 5...

thenoblerot

I can see even 2-bit quantisation is giving pretty good responses.
This can be used for smaller yet effective use cases:
1) Sentiment analysis: have the model provide one of the following classifications: [positive, negative, neutral] (see the sketch after this list)

2) Named entity recognition:
For user feedback on a product or service, extract what is being discussed in the feedback: product name, emotion type, feature name, etc.

3) Classifying issues into defined categories (billing issue, defect issue, quality issue, etc.)
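
A minimal sketch of case 1 against the same local endpoint (the port, the prompt wording, and the example feedback text are all assumptions, not what the video uses):

# Hypothetical sentiment-classification call to a local llamafile server
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [
          {"role": "system", "content": "Classify the sentiment of the user message as exactly one of: positive, negative, neutral. Reply with the label only."},
          {"role": "user", "content": "The battery dies within two hours."}
        ],
        "temperature": 0
      }'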

ravishmahajan

Actually, the Raspberry Pi is fine, but can you do a video on how to run this on mobile?

saiashwalkaligotla

Amazing! Can you do a video on fine-tuning Llama 3.1 8B on a 4090? My office just bought one.

solidkundi

Why was the used memory only 1 MB? Could you reserve more for the process? Or am I not understanding something about the execution?

ShadowElCDJ

Please forgive my ignorance, but can this setup run without an internet connection? I'm assuming so, but wanted to be sure.

mr_chon

Have you heard about the Raspberry Pi AI Kit with Hailo?
The Hailo chip offers 13 TOPS and costs around Rs 10k. It can be used for object recognition.

(Not official news) Their next version of the chip may support LLMs. That would be interesting.

nithinbhandari

Interesting to see that it can run on a Raspberry Pi.
With 2-bit quantization and that speed, it's unfortunately not really usable 😞

GermanCodeMonkey

What does a Raspberry Pi 5 cost with all the accessories?

__________________________

Llama 3.1 8B is currently bugged; it's actually worse for now. They have to fix it.

Stealthy_Sloth

It's okay to make fun of Elon Musk. Don't let the fanboys get to you.

Mephmt

These days, those Raspberry things are way too powerful to make them an example of portability.
Running a big LLM on an old Android phone, or on a PC with no GPU and no AVX, at a decent speed: that'd be a milestone.

ronilevarez

There is an Indian company, Axion or something (I don't remember exactly), with a product from Surat that I got recommended through the YouTube channel Gareeb Scientist. It costs just Rs 15k, and its features looked cool to me. I never knew a Raspberry Pi or boards like that could handle AI inference. The Gareeb Scientist video said that this Indian-made Raspberry-Pi-like board has an NPU and comes with Ubuntu or something; I don't remember exactly. You don't need to watch the whole video; if you're curious, you could just put the transcript into ChatGPT. I'm writing this because I found it cool, and its inference was better than my M1 Mac 8GB variant, which I bought in enthusiasm back in 2020 even though I knew I would need more RAM. I f*cked up, but it's fine. I hope AI keeps evolving and accelerating; I'd love to see local agents running things. (Also, the Gareeb Scientist channel isn't the maker or founder of the product.)

Edit: watched your video with the Raspberry Pi, but now I'm skeptical or confused. The tokens per second of the other Indian product I'm talking about is much higher, for a 7B Llama and larger. And if that product's RAM were upgraded, it would have even faster inference.

MichealScott