filmov
tv
Install and Run Locally in Python Llama 3.2 1B and 3B LLM Models on Windows From Scratch!
Показать описание
#llama3.2 #llama3 #llama3.1 #machinelearning #computervision
In this tutorial, we explain how to download, install and use Llama 3.2 1B and 3B Large Language Models (LLMs) in Python on a local Windows computer. Llama 3.2 1B and 3B are lightweight models that can be efficiently executed on desktop computers with modest hardware as well as on edge devices. As such, they are very attractive for building local AI applications, internet of things AI applications, and local RAG applications.
There are at least two approaches for running Llama 3.2 locally in Python:
1.) Download and use Llama 3.2 models by using the Ollama framework and the Ollama Python library. We covered this approach in our previous tutorial given here:
2.) Download and use Llama 3.2 models by directly downloading the models from the Huggingface website and by running a local Python script. This approach is covered in this tutorial.
Prerequisites:
- You need to have around 10 GB of space on your disk to download both models and to install all the Python libraries necessary to run these models.
- We have tested the models on a computer with NVIDIA 3090 GPU which has 24GB VRAM. It is a two-year old computer with 48GB of RAM. The model should work on computers with less powerful GPUs and with less RAM memory.
- Make sure that you have Microsoft Visual Studio C++ Compilers installed on your system. Otherwise, you might not be able to install PyTorch properly. To install C++ compilers, download and install Microsoft Visual Studio with C++ Community Edition from this link:
- As a demonstration, in this tutorial, we use instruct models. However, everything explained in this tutorial can be used for base or chat models. Instruct models are trained and used by using instructions. We give instructions on how to respond or what style of responses we want to receive. While generating the response, the model follows the instructions.
- Copyright notice: This tutorial should not be reposted, re-uploaded, or downloaded without the permission of Aleksandar Haber. Then, this tutorial should not be used as official or unofficial lecture material in university courses or on online learning platforms. Finally, this manual should not be reposted on public or private websites.
In this tutorial, we explain how to download, install and use Llama 3.2 1B and 3B Large Language Models (LLMs) in Python on a local Windows computer. Llama 3.2 1B and 3B are lightweight models that can be efficiently executed on desktop computers with modest hardware as well as on edge devices. As such, they are very attractive for building local AI applications, internet of things AI applications, and local RAG applications.
There are at least two approaches for running Llama 3.2 locally in Python:
1.) Download and use Llama 3.2 models by using the Ollama framework and the Ollama Python library. We covered this approach in our previous tutorial given here:
2.) Download and use Llama 3.2 models by directly downloading the models from the Huggingface website and by running a local Python script. This approach is covered in this tutorial.
Prerequisites:
- You need to have around 10 GB of space on your disk to download both models and to install all the Python libraries necessary to run these models.
- We have tested the models on a computer with NVIDIA 3090 GPU which has 24GB VRAM. It is a two-year old computer with 48GB of RAM. The model should work on computers with less powerful GPUs and with less RAM memory.
- Make sure that you have Microsoft Visual Studio C++ Compilers installed on your system. Otherwise, you might not be able to install PyTorch properly. To install C++ compilers, download and install Microsoft Visual Studio with C++ Community Edition from this link:
- As a demonstration, in this tutorial, we use instruct models. However, everything explained in this tutorial can be used for base or chat models. Instruct models are trained and used by using instructions. We give instructions on how to respond or what style of responses we want to receive. While generating the response, the model follows the instructions.
- Copyright notice: This tutorial should not be reposted, re-uploaded, or downloaded without the permission of Aleksandar Haber. Then, this tutorial should not be used as official or unofficial lecture material in university courses or on online learning platforms. Finally, this manual should not be reposted on public or private websites.
Комментарии