Run Llama 3 on CPU using Ollama

Discover how to effortlessly run the new LLaMA 3 language model on a CPU with Ollama, a no-code tool that ensures impressive speeds even on less powerful hardware.
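
Ollama's usual flow is ollama pull llama3 to download the model, then ollama run llama3 for an interactive prompt. For anyone who wants to script the same thing, here is a minimal Python sketch that talks to the local Ollama server's REST API at its default address; the prompt text is just an illustrative placeholder.

    import requests  # plain HTTP client; Ollama exposes a local REST API

    # Ask the locally running Ollama server to generate a completion with the
    # llama3 model on this machine (the model must already be pulled).
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "llama3",          # pulled beforehand with: ollama pull llama3
            "prompt": "Explain what Ollama does in one sentence.",
            "stream": False,            # return the full answer as one JSON object
        },
        timeout=300,                     # CPU-only generation can take a while
    )
    resp.raise_for_status()
    print(resp.json()["response"])       # the generated text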

Don't forget to like, comment, and subscribe for more tutorials like this!

Join this channel to get access to perks:

To further support the channel, you can contribute via the following methods:

Bitcoin Address: 32zhmo5T9jvu8gJDGW3LTuKBM1KPMHoCsW

#llama3 #llama #ai
Comments

I think you are one of the best channels to learn about AI. Thanks for keeping us up to speed with such fast moving tech!

johnbarros

Please do mention the specs of your machine, or add them to your video description. Thanks for posting your vids.

laalbujhakkar

To those who are getting very delayed responses: Llama 3 usually runs on a GPU or a well-specced CPU.
These LLMs generate text at a speed set by the hardware's compute power, which is why computation power matters for generation regardless of which LLM you use.
So it may be better to call the model through a hosted API instead of downloading it and running it locally.

pragyantiwari
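
For anyone following this suggestion to use a hosted API instead of a local download: most providers expose an OpenAI-compatible endpoint, so a minimal sketch can use the openai Python client. The base URL, environment variable, and model name below are placeholders for whichever provider you pick.

    import os
    from openai import OpenAI  # pip install openai (v1+ client)

    # Point the OpenAI-compatible client at a hosted provider instead of local Ollama.
    # The base_url, env var, and model name are placeholders, not real provider values.
    client = OpenAI(
        base_url="https://api.example-provider.com/v1",   # hypothetical endpoint
        api_key=os.environ["PROVIDER_API_KEY"],           # hypothetical env var
    )

    reply = client.chat.completions.create(
        model="llama3-70b",   # whatever name your provider uses for the model
        messages=[{"role": "user",
                   "content": "Summarize why GPUs speed up LLM inference."}],
    )
    print(reply.choices[0].message.content)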

I have tried the model but it does not respond as quickly as shown in the video. Nevertheless, it keeps responding. Thank you for sharing your knowledge and congratulations.

CesarVegaL

Is there any way to do RAG on a CPU really fast? Maybe not as fast as Groq, but a couple of seconds would be fine. I want to use at most a 3-billion-parameter model, since I don't think it will be fast enough with a 7-billion-parameter model.

Cingku
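
A common way to keep CPU-only RAG within a couple of seconds is to retrieve with a small embedding model and generate with a model under ~3B parameters, both served by a local Ollama instance. The sketch below assumes those models have already been pulled (nomic-embed-text and gemma:2b are used only as examples) and uses a toy in-memory corpus.

    import requests
    import numpy as np

    OLLAMA = "http://localhost:11434"
    EMBED_MODEL = "nomic-embed-text"   # example embedding model, pulled beforehand
    GEN_MODEL = "gemma:2b"             # example small (<3B) generator, pulled beforehand

    def embed(text: str) -> np.ndarray:
        # Ollama's embeddings endpoint returns a single vector for the prompt.
        r = requests.post(f"{OLLAMA}/api/embeddings",
                          json={"model": EMBED_MODEL, "prompt": text}, timeout=120)
        r.raise_for_status()
        return np.array(r.json()["embedding"])

    # Toy corpus; in practice these would be chunks of your documents.
    docs = [
        "Ollama runs large language models locally and exposes a REST API.",
        "Quantized small models are usually fast enough for CPU-only inference.",
        "Groq serves models on custom hardware with very low latency.",
    ]
    doc_vecs = [embed(d) for d in docs]

    def answer(question: str) -> str:
        q = embed(question)
        # Cosine similarity to pick the single most relevant chunk.
        sims = [float(q @ v / (np.linalg.norm(q) * np.linalg.norm(v))) for v in doc_vecs]
        context = docs[int(np.argmax(sims))]
        prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
        r = requests.post(f"{OLLAMA}/api/generate",
                          json={"model": GEN_MODEL, "prompt": prompt, "stream": False},
                          timeout=300)
        r.raise_for_status()
        return r.json()["response"]

    print(answer("How can I run a model locally?"))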

I run summarization in batches of text up to 4K with LangChain, and this model is quite slow on my machine. gemma:2b takes 1/6 of the time to summarize the same amount.
So, while I like Llama 3 for local inference, it is a bit too slow for actual work.
Perhaps if somebody produced a 2B-scale version of it, it would be a competitor to gemma:2b, but it should also be said that Gemma models are made to run on low-spec hardware, and were trained accordingly I think, while Llama 3 is a more general-purpose model.

aldotanca
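
Speed gaps like the one described above are easy to measure on your own machine; this small sketch times the same summarization prompt against two locally pulled Ollama models, with the sample text and model names as placeholders.

    import time
    import requests

    OLLAMA = "http://localhost:11434"
    TEXT = "Paste the passage you want summarized here."   # placeholder text

    def summarize_seconds(model: str, text: str) -> float:
        """Return wall-clock seconds for one summarization call to the local Ollama server."""
        start = time.perf_counter()
        r = requests.post(f"{OLLAMA}/api/generate",
                          json={"model": model,
                                "prompt": f"Summarize the following text:\n{text}",
                                "stream": False},
                          timeout=600)
        r.raise_for_status()
        return time.perf_counter() - start

    for model in ("llama3", "gemma:2b"):   # both must already be pulled
        print(model, f"{summarize_seconds(model, TEXT):.1f} s")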

Can you build a project where the user provides a sentence and a word, and the LLM returns a full dictionary-style lookup for the word in the context of that sentence?

atulanand
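
A project like this is mostly prompt design. Below is a rough sketch of how it could look against a local Ollama model; the model name and the exact prompt wording are assumptions, not anything shown in the video.

    import requests

    def contextual_lookup(sentence: str, word: str, model: str = "llama3") -> str:
        """Ask a local Ollama model for a dictionary-style entry for `word` as used in `sentence`."""
        prompt = (
            f"Sentence: {sentence}\n"
            f"Word: {word}\n\n"
            "Give a dictionary-style entry for the word as it is used in this sentence: "
            "part of speech, the sense that fits the sentence, a short definition, "
            "two example sentences, and common synonyms."
        )
        r = requests.post("http://localhost:11434/api/generate",
                          json={"model": model, "prompt": prompt, "stream": False},
                          timeout=300)
        r.raise_for_status()
        return r.json()["response"]

    print(contextual_lookup("The bank approved the loan yesterday.", "bank"))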

How do we integrate a private PDF into it? I would be very happy if you could make the simplest possible video about creating a chat-with-PDF app using Llama 3 with Ollama on CPU. I went through your previous video but was not able to make it work on Windows.

prestocranius
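
As a starting point for chat-with-PDF on CPU, here is a minimal sketch: it extracts text with the pypdf package and sends the text plus a question to a local Ollama model. The file name and model are placeholders, and a real app would chunk the document and retrieve only the relevant pieces rather than putting the whole text in one prompt.

    import requests
    from pypdf import PdfReader   # pip install pypdf

    # Extract raw text from a local PDF (placeholder file name).
    reader = PdfReader("my_private_document.pdf")
    pdf_text = "\n".join(page.extract_text() or "" for page in reader.pages)

    question = "What is the main conclusion of this document?"

    # Send the document plus the question to the local Ollama chat endpoint.
    r = requests.post(
        "http://localhost:11434/api/chat",
        json={
            "model": "llama3",
            "messages": [
                {"role": "system", "content": "Answer only from the provided document."},
                {"role": "user", "content": f"Document:\n{pdf_text}\n\nQuestion: {question}"},
            ],
            "stream": False,
        },
        timeout=600,
    )
    r.raise_for_status()
    print(r.json()["message"]["content"])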

Sir, we want a video on fine-tuning Gemini Pro 1.5 and on using RAG with it.

velugucharan

What are the full specs of your PC? How many cores does your CPU have?

elikyals

Tell me about the GPU side, bhai; it's really slow on CPU.

marufhoque

I think it's not running locally, bro. I would like to do it, but if I disconnect my laptop from the internet, it stops working.

CarlosGomez-fjbz

Do you think I can run the Llama 3 70B Q8 model if I have 128 GB of RAM and a 3060 with 12 GB of VRAM?

simerosaitora
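
A rough back-of-the-envelope estimate, assuming a bit over one byte per parameter for 8-bit quantization plus a few gigabytes of KV cache and runtime overhead: the 70B weights alone are roughly 70-75 GB, which fits in 128 GB of system RAM but not in 12 GB of VRAM, so most layers would be offloaded to the CPU and generation would be slow.

    # Rough memory estimate for a 70B model at ~8-bit quantization (assumptions, not measurements).
    params = 70e9               # parameter count
    bytes_per_param = 1.06      # Q8_0-style quantization is a bit over 1 byte per weight
    overhead_gb = 4             # KV cache and runtime buffers (rough guess)

    weights_gb = params * bytes_per_param / 1e9
    total_gb = weights_gb + overhead_gb
    print(f"~{weights_gb:.0f} GB weights, ~{total_gb:.0f} GB total")
    print("Fits in 128 GB RAM:", total_gb < 128)   # yes, but CPU-bound
    print("Fits in 12 GB VRAM:", total_gb < 12)    # no, most layers stay on the CPU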

To all CPU users: don't fall for it. It will work, but slowly, and it isn't worth doing. Just use online APIs, Groq, or Colab.

shivpawar

Please, it's not "LAInux" and it's not "LYnux"; it's "LInux".

joseaugustodossantossilva