Tim Dettmers—k-bit Inference Scaling Laws

Tim is a PhD student at the University of Washington working on representation learning and hardware-optimized deep learning. In this presentation, he presents his ICML poster on "k-bit Inference Scaling Laws": why 4-bit precision is optimal for the zero-shot performance of Large Language Models at a fixed number of total model bits.
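
The quantization schemes compared in this line of work are variants of blockwise rounding. Below is a minimal sketch of one such scheme, blockwise absmax quantization to k bits; the function names, block size, and data types here are illustrative assumptions, not the paper's exact setup (the paper sweeps many data types and block sizes):

```python
import torch

def quantize_blockwise(w: torch.Tensor, k: int = 4, block: int = 64):
    """Round w to signed k-bit integers with one absmax scale per block."""
    flat = w.reshape(-1, block)                         # split into blocks
    scale = flat.abs().max(dim=1, keepdim=True).values  # per-block absmax
    qmax = 2 ** (k - 1) - 1                             # e.g. 7 for k = 4
    q = torch.clamp(torch.round(flat / scale * qmax), -qmax, qmax)
    return q.to(torch.int8), scale

def dequantize_blockwise(q: torch.Tensor, scale: torch.Tensor, k: int = 4):
    qmax = 2 ** (k - 1) - 1
    return (q.float() / qmax * scale).reshape(-1)

w = torch.randn(1024 * 64)
q, scale = quantize_blockwise(w, k=4)
err = (dequantize_blockwise(q, scale) - w).abs().mean()
print(f"mean absolute quantization error: {err:.4f}")
```

Smaller blocks cost extra bits for the scales but reduce rounding error, which is exactly the kind of trade-off the scaling-law comparison measures.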

Comments
yacinegaci:

Great video, and the research question is fascinating. I am wondering what happens during the actual computation. The weights are quantized to k bits, but when we compute attention scores or activations, do the calculations stay in k bits, or is everything dequantized beforehand and then quantized again?
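
In the weight-only setting this work studies, the usual pattern is to keep activations in 16-bit and dequantize the k-bit weights just before each matmul, so attention scores and activations are never computed in k bits. A minimal sketch of that pattern, assuming blockwise absmax storage as above (this QuantizedLinear class and the float32 compute are illustrative, not the paper's actual kernels):

```python
import torch

class QuantizedLinear(torch.nn.Module):
    """Weights stored as k-bit integers; the matmul runs in higher precision."""
    def __init__(self, weight: torch.Tensor, k: int = 4, block: int = 64):
        super().__init__()
        self.k, self.shape = k, weight.shape
        self.qmax = 2 ** (k - 1) - 1
        flat = weight.reshape(-1, block)
        self.scale = flat.abs().max(dim=1, keepdim=True).values
        self.q = torch.clamp(torch.round(flat / self.scale * self.qmax),
                             -self.qmax, self.qmax).to(torch.int8)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Dequantize just in time; real kernels do this in fp16 on the GPU.
        w = (self.q.float() / self.qmax * self.scale).reshape(self.shape)
        return x @ w.t()  # activations never touch the k-bit representation

layer = QuantizedLinear(torch.randn(256, 512), k=4)
y = layer(torch.randn(8, 512))
print(y.shape)  # torch.Size([8, 256])
```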
nishcologne:

I've known him since childhood; he was kind of weird sometimes :D BR