Stanford CS25: V4 I Behind the Scenes of LLM Pre-training: StarCoder Use Case
May 23, 2024
Speaker: Loubna Ben Allal, Hugging Face
As large language models (LLMs) become essential to many AI products, learning to pretrain and fine-tune them is now crucial. In this talk, we will explore the intricacies of training LLMs from scratch, including lessons on scaling laws and data curation. We will then study StarCoder as an example of an LLM tailored for code, highlighting how its development differs from that of standard LLMs. Finally, we will discuss data governance and evaluation, elements that are central to today's conversations about LLMs and AI yet frequently overshadowed by pre-training discussions.
About the speaker: Loubna Ben Allal is a Machine Learning Engineer on the Science team at Hugging Face, working on large language models for code and synthetic data generation. She is part of the core team behind the BigCode project and has co-authored The Stack dataset and the StarCoder models for code generation. Loubna holds master's degrees in Mathematics and Deep Learning from Ecole des Mines de Nancy and ENS Paris Saclay.
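For readers who want to experiment with the released artifacts, below is a minimal, hypothetical sketch (not shown in the talk itself) of generating code with a StarCoder checkpoint through the Hugging Face transformers library. The model ID, prompt, and decoding settings are illustrative assumptions, and the gated checkpoint requires accepting its license on the Hub.

from transformers import AutoTokenizer, AutoModelForCausalLM

# Illustrative checkpoint; other StarCoder-family models on the Hub load the same way.
model_id = "bigcode/starcoder"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# A short code prompt that the model continues as a left-to-right completion.
prompt = "def fibonacci(n):"
inputs = tokenizer(prompt, return_tensors="pt")

# Greedy decoding of up to 64 new tokens; sampling options are omitted for brevity.
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))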
Stanford CS25: V4 I Overview of Transformers
Stanford CS25: V4 I Hyung Won Chung of OpenAI
Stanford CS25: V4 I Jason Wei & Hyung Won Chung of OpenAI
Stanford CS25: V4 I Aligning Open Language Models
Stanford CS25: V4 I Demystifying Mixtral of Experts
Stanford CS25: V4 I From Large Language Models to Large Multimodal Models
Stanford CS25: V4 I Transformers that Transform Well Enough to Support Near-Shallow Architectures
Stanford CS25: V3 I Retrieval Augmented Language Models
Stanford CS25: V3 I How I Learned to Stop Worrying and Love the Transformer
Stanford CS25: V3 I Beyond LLMs: Agents, Emergent Abilities, Intermediate-Guided Reasoning, BabyLM
Which jobs will AI replace first? #openai #samaltman #ai
[VIET] Stanford CS25: V4 I Overview of Transformers - Part 1 (Vietnamese dubbed version)
Stanford CS25: V1 I Transformers United: DL Models that have revolutionized NLP, CV, RL
Stanford CS25: V1 I Mixture of Experts (MoE) paradigm and the Switch Transformer
Stanford CS25: V1 I Self Attention and Non-parametric transformers (NPTs)
The Possibilities of AI [Entire Talk] - Sam Altman (OpenAI)
Secrets of AI: Insights from OpenAI at Stanford CS25 #IA #STANFORD #XMACNA #shorts
How I'd learn ML in 2025 (if I could start over)
Stanford CS25: V1 I DeepMind's Perceiver and Perceiver IO: new data family architecture
CS25 Transformers United 2023: Introduction to Transformers w/ Andrej Karpathy (clear voice)
Stanford CS25: V1 I Transformers in Vision: Tackling problems in Computer Vision
Andrew Ng: Advice on Getting Started in Deep Learning | AI Podcast Clips