How to run LLM locally with ollama | Python example

My colleagues at IBM advised me to try ollama, a tool for running LLMs locally. Download a quantized model in GGUF format with ollama, prepare a Python file for inference, customize it to your needs, and use it!

At the time of recording I had no prior experience with ollama. So I demonstrate the very basics: using the ollama website and extending a Python example from the ollama repo to run inference directly against an LLM served locally.

In this example I used the llama3 and llama3-chatqa (RAG-compatible) models. This requires the ollama CLI (command-line interface) to be installed on your PC. All steps are demonstrated in the tutorial.
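
For readers who want to try the Python side before watching, here is a minimal sketch of a single request to a locally running ollama server. It is not the exact file from the video. It assumes the server listens on ollama's default address (http://localhost:11434) and that the llama3 model has already been pulled. The helper names `build_payload` and `generate` are made up for this illustration.

```python
import json
import urllib.request

# Assumption: ollama's default local address; change it if your server listens elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(model: str, prompt: str) -> dict:
    """Build the JSON body for one non-streamed completion request."""
    return {"model": model, "prompt": prompt, "stream": False}


def generate(model: str, prompt: str, url: str = OLLAMA_URL, timeout: float = 120.0) -> str:
    """Send the prompt to the local server and return the generated text."""
    data = json.dumps(build_payload(model, prompt)).encode("utf-8")
    request = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    # Passing `data` makes urllib issue a POST request.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    # With "stream": False the server replies with one JSON object
    # whose "response" field holds the full answer.
    return body["response"]


if __name__ == "__main__":
    # Requires a running ollama server with the llama3 model available.
    print(generate("llama3", "Why is the sky blue?"))
```

Switching to the second model from the video should only require passing "llama3-chatqa" as the model name, provided it has been pulled first.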

About the GGUF format:
GGUF is a binary format designed for fast loading and saving of models, and for ease of reading. Models are traditionally developed using PyTorch or another framework, and then converted to GGUF for use in GGML. GGUF itself is a container, not a compression scheme: the smaller size on disk comes from storing quantized weights in it.
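
To make "binary format" a bit more concrete, here is a small sketch that reads the fixed-size header at the start of a GGUF file. It assumes the little-endian layout used by GGUF version 2 and later: the 4-byte magic `GGUF`, a 32-bit version, then 64-bit tensor and metadata key-value counts. The names `GGUFHeader` and `read_gguf_header` and the file path are hypothetical.

```python
import struct
from typing import BinaryIO, NamedTuple


class GGUFHeader(NamedTuple):
    version: int
    tensor_count: int
    metadata_kv_count: int


def read_gguf_header(stream: BinaryIO) -> GGUFHeader:
    """Read the fixed-size header from the start of a GGUF stream."""
    magic = stream.read(4)
    if magic != b"GGUF":
        raise ValueError(f"not a GGUF file (magic was {magic!r})")
    fixed = stream.read(20)
    if len(fixed) != 20:
        raise ValueError("file is too short to contain a GGUF header")
    # Assumption: little-endian, version >= 2 layout
    # (uint32 version, uint64 tensor count, uint64 metadata key-value count).
    version, tensor_count, kv_count = struct.unpack("<IQQ", fixed)
    return GGUFHeader(version, tensor_count, kv_count)


if __name__ == "__main__":
    # Hypothetical path: point it at any GGUF file you have locally.
    with open("model.gguf", "rb") as f:
        print(read_gguf_header(f))
```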

I will definitely use ollama for a series of LLM-based tasks and experiments in the future. Do you have your own experience with it? Let me know in the comments! Thanks!

- - - - - - - -
References mentioned and shown in the video:

#ollama #huggingface #llm

- - - - - - - -
Content of the video:
0:00 - Intro
0:33 - ollama website
1:27 - Download ollama CLI for Mac
2:18 - Test ollama CLI for the first time with llama3 (pull the model)
3:24 - Take llama3-chatqa model for the 2nd experiment
4:03 - Download and run llama3-chatqa model locally
5:27 - Using ollama GitHub repo
5:55 - Simple Generate with Python example
7:04 - Check ollama ports for running locally
7:38 - Test llama3-chatqa LLM model with CLI
8:11 - Final word
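
The port check at 7:04 can also be done from Python. The sketch below is one possible approach, not necessarily the method shown in the video: it simply tries to open a TCP connection. It assumes ollama's default port 11434 on localhost, and `is_port_open` is a helper name invented for this example.

```python
import socket


def is_port_open(host: str = "127.0.0.1", port: int = 11434, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        # Assumption: 11434 is ollama's default port; pass another port if you changed it.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    if is_port_open():
        print("Something is listening on port 11434 (probably the ollama server).")
    else:
        print("Nothing is listening on port 11434. Is ollama running?")
```

An open port only shows that some process is listening there, so a follow-up request to the server is still the real test.
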
Comments
Author

Thank you for watching this video! If you liked it, I hope the related content from @DataScienceGarage will be useful for you as well:

Thank you!

DataScienceGarage