Why is Llama-3-8B 8 billion parameters instead of 7?

Llama-3 has ditched its old tokenizer and instead opted to use the same tokenizer as GPT-4 (tiktoken, created by OpenAI); it even uses the same first 100K tokens of the vocabulary.
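A minimal sketch (not from the video) for checking that claim yourself, assuming `tiktoken` and `transformers` are installed and you have access to the gated `meta-llama/Meta-Llama-3-8B` repo on Hugging Face: compare the vocabulary sizes and spot-check that low token IDs decode to the same strings in both tokenizers.

```python
# A sketch for checking the shared-vocabulary claim (assumes `tiktoken`,
# `transformers`, and access to the gated meta-llama/Meta-Llama-3-8B repo).
import tiktoken
from transformers import AutoTokenizer

gpt4_enc = tiktoken.get_encoding("cl100k_base")                          # GPT-4's tokenizer
llama3_tok = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")

print("cl100k_base vocab size:", gpt4_enc.n_vocab)   # ~100K
print("Llama-3 vocab size:   ", len(llama3_tok))     # ~128K (extra tokens added on top)

# Spot-check a few low token IDs: if the first ~100K entries really are shared,
# both tokenizers should decode them to the same strings.
for tok_id in [100, 1_000, 10_000, 50_000]:
    print(tok_id, repr(gpt4_enc.decode([tok_id])), repr(llama3_tok.decode([tok_id])))
```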

In this video Chris walks through why Meta switched tokenizers and the implications for model size, the embedding layer and multilingual tokenization.
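To see where the extra billion parameters comes from, here is a rough back-of-the-envelope count for the LLaMA architecture. The config values used below (hidden size 4096, 32 layers, untied embeddings, GQA with 8 KV heads in Llama-3, MLP width 11008 vs 14336, and the 32K vs 128K vocabularies) are quoted from the published model configs, so treat them as assumptions to double-check rather than gospel.

```python
# Back-of-the-envelope parameter count for the LLaMA architecture:
# untied input/output embeddings + attention + SwiGLU MLP + RMSNorms.
def llama_params(vocab, hidden, layers, heads, kv_heads, intermediate):
    head_dim = hidden // heads
    embed = vocab * hidden                        # input embedding matrix
    lm_head = vocab * hidden                      # output projection (not tied)
    attn = 2 * hidden * hidden                    # Q and O projections
    attn += 2 * hidden * (kv_heads * head_dim)    # K and V projections (GQA-aware)
    mlp = 3 * hidden * intermediate               # gate, up and down projections
    norms = 2 * hidden                            # two RMSNorms per layer
    return embed + lm_head + layers * (attn + mlp + norms) + hidden  # + final norm

llama2_7b = llama_params(32_000, 4096, 32, 32, 32, 11_008)    # MHA, 32K vocab
llama3_8b = llama_params(128_256, 4096, 32, 32, 8, 14_336)    # GQA, 128K vocab
print(f"Llama-2-7B ≈ {llama2_7b / 1e9:.2f}B parameters")      # ≈ 6.74B
print(f"Llama-3-8B ≈ {llama3_8b / 1e9:.2f}B parameters")      # ≈ 8.03B
```

With the 128K vocabulary the embedding and LM-head matrices alone grow from roughly 0.26B to roughly 1.05B parameters, about 0.8B of the jump; the rest comes from the wider MLP, partly offset by GQA shrinking the K/V projections.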

He also runs his tokenizer benchmark and shows how the new tokenizer is more efficient in languages such as Japanese.
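As an illustrative sketch (not the benchmark Chris runs in the video), you can compare how many tokens each tokenizer needs for the same Japanese sentence; the sample sentence below is my own, and access to the gated meta-llama repos on Hugging Face is assumed.

```python
# Illustrative sketch: token counts for the same Japanese sentence under the
# old and new tokenizers. Fewer tokens means cheaper inference and a longer
# effective context for the same text.
from transformers import AutoTokenizer

old_tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")    # 32K SentencePiece vocab
new_tok = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")  # 128K tiktoken-style vocab

text = "東京は日本の首都で、世界有数の大都市です。"  # "Tokyo is the capital of Japan ..."
print("Llama-2 tokens:", len(old_tok.encode(text, add_special_tokens=False)))
print("Llama-3 tokens:", len(new_tok.encode(text, add_special_tokens=False)))
```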

repos
------
Comments

Excellent demonstration, Chris, thanks for sharing!

charbakcg

Great stuff.. no-nonsense presentation style, clear and technical, as it should be 😅.. Question: is there a reason why it’s not better to have common English syllables in the vocabulary? I understand “lov” being there, but I can’t imagine that “el” is a very useful token as part of “Lovelace”.. intuitively, I would think that it should simply be tokenized as “love” and “lace”

goodtothinkwith
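On the syllable question above, here is a quick, hedged way to check the actual splits for yourself; the BPE merges are learned from corpus frequency rather than English syllable rules, which is why the pieces can look unintuitive.

```python
# A quick check (assumes `tiktoken` is installed) of how the cl100k_base
# vocabulary, which Llama-3 builds on, actually splits a few words.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
for word in ["Lovelace", " Lovelace", "love", "lace"]:
    ids = enc.encode(word)
    print(repr(word), "->", [enc.decode([i]) for i in ids])
```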

I'm super excited to see the `llama.cpp`, `llama2.c`, etc. category get implemented for llama3!

aaravsethi

OK, that is all very concrete! Awesome, thanks for this. It seems like a lot of quick wins that are easy to discover, or does it only look that way in hindsight because you explain it so clearly? Anyway, it's all a bit new to me. Perhaps a country like Norway would be wise to run this with their own tokeniser? Or is that too simplistic thinking?

rluijk

What are your thoughts on including spaces in the tokenizer? I tried it once and the LLM was optimising to predict spaces, since those are easy wins for the LLM, but I like the way tiktoken keeps the space without making the space a token on its own....

leeme
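On the whitespace question above, a small illustrative sketch of tiktoken's behaviour: the space is kept, but attached to the word that follows rather than emitted as its own token.

```python
# Small sketch (assumes `tiktoken` is installed): how cl100k_base handles
# spaces. A leading space is usually folded into the token for the word that
# follows it rather than becoming a standalone space token; runs of spaces
# (e.g. code indentation) get their own tokens.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
for text in ["hello world", " hello", "hello", "    indented code"]:
    ids = enc.encode(text)
    print(repr(text), "->", [repr(enc.decode([i])) for i in ids])
```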

Why is there some PyTorch? Do fine-tuned or merged versions need it?

rogerc