🔥🚀 Inferencing on Mistral 7B LLM with 4-bit quantization 🚀 - In FREE Google Colab

🔥🚀 Inferencing on Mistral 7B with 4-bit quantization 🚀 | Large Language Models
I explain the BitsAndBytesConfig in detail
📌 Peak system RAM usage is only 4.5 GB, and
📌 Peak GPU VRAM usage is only 5.9 GB
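If you want to verify these numbers in your own session, here is a minimal sketch (assuming only `psutil` and `torch`, both preinstalled on Colab):

```python
# Minimal sketch: check memory usage in a Colab session.
# Assumes psutil and torch, both preinstalled on Colab.
import psutil
import torch

# System RAM currently in use, in GB
print(f"System RAM used: {psutil.virtual_memory().used / 1e9:.1f} GB")

# Peak GPU VRAM allocated by PyTorch in this process, in GB
if torch.cuda.is_available():
    print(f"Peak GPU VRAM: {torch.cuda.max_memory_allocated() / 1e9:.1f} GB")
```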
👉 The **`load_in_4bit`** parameter loads the model in 4-bit precision.
This means the model's weights are stored using 4 bits instead of the usual 32, which dramatically reduces the memory footprint: 4-bit weights take roughly 8x less memory than 32-bit full-precision weights. (Only the weights are quantized; the matrix multiplications still run in a 16-bit compute dtype.) This is what makes it possible to fit a 7B-parameter model on a free-tier Colab GPU.
However, if you need the highest possible accuracy, a full-precision model is still the better choice.
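To make the walkthrough concrete, here is a minimal sketch of the 4-bit loading path (the model id, prompt, and generation settings are illustrative assumptions; the `BitsAndBytesConfig` options are the standard `transformers`/`bitsandbytes` parameters):

```python
# Minimal sketch: load Mistral 7B in 4-bit and run one generation.
# Assumes transformers, accelerate, and bitsandbytes are installed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "mistralai/Mistral-7B-Instruct-v0.1"  # assumed checkpoint

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # store weights in 4-bit precision
    bnb_4bit_quant_type="nf4",             # NormalFloat4 quantization
    bnb_4bit_use_double_quant=True,        # also quantize the quantization constants
    bnb_4bit_compute_dtype=torch.float16,  # matmuls run in fp16 (T4-friendly)
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # let accelerate place layers on the available GPU
)

prompt = "Explain 4-bit quantization in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The `nf4` quant type plus double quantization is what keeps the weight footprint small enough for the free-tier GPU while preserving most of the model's quality.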
-------------------
🔥🐍 Check out my new Python Book, where I cover 350+ core Python concepts across 1300+ pages, the fundamentals a Python engineer needs for daily real-life problems.
For each concept, I discuss the 'under-the-hood' view of how the Python interpreter handles it.
----------------
You can find me here:
Other playlists you might like 👇
----------------------
#LLM #Largelanguagemodels #Llama2 #opensource #NLP #ArtificialIntelligence #datascience #langchain #llamaindex #vectorstore #textprocessing #deeplearning #deeplearningai #100daysofmlcode #neuralnetworks #generativeai #generativemodels #OpenAI #GPT #GPT3 #GPT4 #chatgpt