LLaMa GPTQ 4-Bit Quantization. Billions of Parameters Made Smaller and Smarter. How Does it Work?
We dive deep into the world of GPTQ 4-bit quantization for large language models like LLaMa. We'll explore the mathematics behind quantization, emergent features, and the differential geometry that drives this powerful technique. We'll also demonstrate how to apply GPTQ 4-bit quantization with the GPTQ-for-LLaMa library. This video is a must-watch if you're curious about optimizing large language models while preserving emergent features. Join us as we unravel the mysteries of quantization and improve our understanding of how large language models work! Don't forget to like, subscribe, and tell us what you'd like to learn about next in the comments.
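To make the core idea concrete, here is a minimal sketch of round-to-nearest 4-bit quantization with a per-group scale and offset. This illustrates only the basic quantize/dequantize step; GPTQ itself goes further, quantizing weights column by column and using second-order (Hessian) information to compensate for the error introduced at each step. All function names here are illustrative, not part of any library.

```python
import numpy as np

def quantize_4bit(w, group_size=64):
    """Illustrative round-to-nearest 4-bit quantization.

    Each group of `group_size` weights is mapped to integers 0..15
    using a per-group scale and minimum (zero-point). This is NOT the
    GPTQ algorithm, which additionally uses Hessian information to
    redistribute quantization error across remaining weights.
    """
    w = w.reshape(-1, group_size)
    w_min = w.min(axis=1, keepdims=True)
    w_max = w.max(axis=1, keepdims=True)
    scale = (w_max - w_min) / 15.0  # 4 bits -> 16 levels
    q = np.clip(np.round((w - w_min) / scale), 0, 15)
    return q.astype(np.uint8), scale, w_min

def dequantize_4bit(q, scale, w_min):
    """Map 4-bit integers back to approximate float weights."""
    return q * scale + w_min

rng = np.random.default_rng(0)
w = rng.standard_normal(512).astype(np.float32)
q, scale, zp = quantize_4bit(w)
w_hat = dequantize_4bit(q, scale, zp).reshape(-1)
# Per-weight error is at most half a quantization step (scale / 2)
print("max abs error:", np.abs(w - w_hat).max())
```

Storing 4-bit codes plus one scale and offset per 64-weight group cuts memory to roughly a quarter of FP16, which is why techniques like this make billions of parameters fit on consumer GPUs.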
#GPTQ4Bit #Quantization #LargeLanguageModels #NeuralNetworks #Optimization #EmergentFeatures #LlamaLibrary #DeepLearning #AI
0:00 Intro
0:33 What is quantization?
2:17 Derivatives and the Hessian
4:03 Emergent features
5:17 GPTQ 4-Bit quantization process
8:40 Using GPTQ-for-LLaMa
10:50 Outro