Understanding the Llama 3 Tokenizer | Llama for Developers

Aston Zhang, a research scientist working on Llama at Meta, discusses the new tokenizer in Meta Llama 3 and the improvements it brings. The new tokenizer uses Tiktoken instead of SentencePiece and has a larger vocabulary of 128k tokens, resulting in better performance on coding, reasoning, and more. The larger vocabulary allows for more specific and nuanced encoding of inputs, while the higher compression ratio reduces the number of tokens required to represent an input. Additionally, the use of Grouped Query Attention helps offset the increased memory and compute needs, resulting in a model that can process larger batches without increasing latency.
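
The memory trade-off mentioned in the description can be made concrete with a little arithmetic. The sketch below is only illustrative: the layer count, head counts, head dimension, and hidden size are assumed values chosen for the example, not figures quoted in the video, and both helper functions are hypothetical names. It compares the key/value cache needed per token with full multi-head attention against grouped-query attention, and estimates how many extra embedding parameters a 128k vocabulary needs compared with a 32k one.

```python
def kv_cache_bytes_per_token(n_layers, n_kv_heads, head_dim, bytes_per_value=2):
    """Bytes of key/value cache that one token occupies across all layers."""
    # Factor 2: one key vector and one value vector per KV head per layer.
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value


def embedding_params(vocab_size, hidden_dim):
    """Number of parameters in a single token-embedding table."""
    return vocab_size * hidden_dim


# Assumed, illustrative model shape (not taken from the video).
N_LAYERS, N_HEADS, N_KV_HEADS, HEAD_DIM, HIDDEN = 32, 32, 8, 128, 4096

# Multi-head attention keeps one KV head per query head;
# grouped-query attention shares each KV head across several query heads.
mha = kv_cache_bytes_per_token(N_LAYERS, N_HEADS, HEAD_DIM)
gqa = kv_cache_bytes_per_token(N_LAYERS, N_KV_HEADS, HEAD_DIM)
print(f"KV cache per token, multi-head:    {mha // 1024} KiB")
print(f"KV cache per token, grouped-query: {gqa // 1024} KiB")

# Cost of the larger vocabulary: more rows in the embedding table.
extra = embedding_params(128_000, HIDDEN) - embedding_params(32_000, HIDDEN)
print(f"Extra embedding parameters for 128k vs 32k vocab: {extra:,}")
```

Under these assumptions the smaller per-token cache is what leaves room for larger batches, while the larger embedding table is the price paid for the bigger vocabulary.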

# Timestamps
00:00 Introduction
00:25 What's new in the Llama 3 tokenizer?
01:58 Vocabulary size and compression ratio
13:01 Performance, efficiency and improving costs
17:46 Recap and resources

#llama3 #llm #opensource

Comments

TLDR

- from llama 2 to llama3 they switched from sentencepiece to tiktoken
- vocab size 32k -> 128k
- ~15% fewer tokens for english, ~50% fewer for "some other languages"

loabrasumente
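
The percentages in the summary above translate directly into token counts. The following is a minimal sketch of that arithmetic, using the commenter's approximate figures (about 15% and about 50% fewer tokens); the function names and the 1,000-token starting point are hypothetical and chosen only for illustration.

```python
def tokens_after_reduction(old_tokens, fraction_fewer):
    """Estimated token count once a tokenizer needs `fraction_fewer` fewer tokens."""
    return round(old_tokens * (1 - fraction_fewer))


def text_per_token_gain(fraction_fewer):
    """How much more text each token covers, as a multiplier."""
    return 1 / (1 - fraction_fewer)


old_count = 1000  # hypothetical prompt length under the old tokenizer

for label, fraction in [("English (~15% fewer)", 0.15),
                        ("some other languages (~50% fewer)", 0.50)]:
    new_count = tokens_after_reduction(old_count, fraction)
    gain = text_per_token_gain(fraction)
    print(f"{label}: {old_count} -> {new_count} tokens, "
          f"{gain:.2f}x more text per token")
```

Because context limits and inference cost are usually counted in tokens, the same reduction applies to how much text fits in a context window and to how much a given prompt costs to process.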

Aston, please explain the attention mechanism. I am actually stuck on the chapter "Attention and transformer" of your book d2l.

parvesh-rana

Hey guys, thanks for this great video. In your opinion, using Python, what is the best approach to counting the tokens of my prompt to Llama 3.2?

eziola
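
One possible way to approach the question above is to encode the prompt with the model's own tokenizer and count the resulting IDs. The sketch below is an assumption-laden illustration, not an official recommendation: `count_tokens` and `WhitespaceTokenizer` are hypothetical names, and the stand-in tokenizer exists only so the example runs without downloading anything. The commented-out lines show how a real tokenizer could be swapped in through Hugging Face `transformers`; the model ID there is an assumption, and access to the repository may need to be requested first.

```python
class WhitespaceTokenizer:
    """Hypothetical stand-in tokenizer so the sketch runs without downloads."""

    def encode(self, text):
        # Splits on whitespace; a real tokenizer returns subword token IDs.
        return text.split()


def count_tokens(text, tokenizer):
    """Count tokens using any object that exposes an `encode(text)` method."""
    return len(tokenizer.encode(text))


prompt = "How many tokens is this prompt?"
print(count_tokens(prompt, WhitespaceTokenizer()))

# With `transformers` installed and access to the model repository
# (the model ID below is an assumption), the same helper could be reused:
#
#   from transformers import AutoTokenizer
#   tok = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")
#   print(count_tokens(prompt, tok))
```

A real tokenizer may also add special tokens such as a beginning-of-text marker, so the count can be slightly higher than the number of visible text pieces.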

My instinct is that tokenization is underestimated in importance. Usually the hard and boring fundamentals are where the magic happens. Across all fields.

therobotocracy

Could someone from the Meta Llama 3 team please explain how to train my very own tiktoken tokenizer like you guys did for Llama 3? There are no open-source steps to recreate this.

stephennfernandes

Is the Llama 3 paper out yet? He mentions it @ 24:02

kaushilkundalia

You develop Tamil language for Tamil users

Sashvinth

so this guy is paid to use open-sourced tiktoken

HamedSoheili-qr

i don't think this format works unless the intent is to discuss at a high level.

wryltxw

Classic example of a provably smart guy not being able to express his thoughts... 5 minutes of pain is all I managed to force myself to watch. A shame.

inteist
