Mathematics and Science of Large Language Models (Ernest Ryu, UCLA Applied Math Colloquium)
UCLA Applied Math Colloquium, Ernest Ryu, Oct 31, 2024.
Title:
Mathematics and Science of Large Language Models
Abstract:
Large language models (LLMs) represent an engineering marvel, but their inner workings are notoriously challenging to understand. In this talk, we present two analyses of LLMs. The first result is a mathematical guarantee on LoRA fine-tuning for LLMs, showing that the training dynamics almost surely experience no spurious local minima if a LoRA rank $r\gtrsim\sqrt{N}$ is used, where $N$ is the number of fine-tuning data points. The second result is a scientific analysis of the training dynamics of in-context learning (ICL), showing that training on multiple diverse ICL tasks simultaneously \emph{shortens} the loss plateaus, making each task easier to learn.
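The LoRA fine-tuning referenced in the abstract freezes a pretrained weight matrix $W_0$ and trains only a low-rank update $BA$ of rank at most $r$. A minimal numpy sketch of this parameterization, under illustrative dimensions not taken from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)

d_out, d_in, r = 64, 32, 4  # illustrative dimensions; r is the LoRA rank

W0 = rng.standard_normal((d_out, d_in))    # frozen pretrained weight
B = np.zeros((d_out, r))                   # trainable; zero at initialization
A = rng.standard_normal((r, d_in)) * 0.01  # trainable

def lora_forward(x):
    # Effective weight is W0 + B @ A; the update B @ A has rank <= r,
    # so only (d_out + d_in) * r parameters are trained instead of d_out * d_in.
    return (W0 + B @ A) @ x

x = rng.standard_normal(d_in)
# With B = 0 at initialization, the LoRA model matches the frozen model exactly.
assert np.allclose(lora_forward(x), W0 @ x)
```

The abstract's guarantee concerns training dynamics in this parameterization: with rank $r \gtrsim \sqrt{N}$ for $N$ fine-tuning data points, the loss landscape almost surely has no spurious local minima.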