Llama 3 - 8B & 70B Deep Dive

Meta AI has released Llama 3 in two sizes: 8B and 70B. In this video I go through the various stats, benchmarks, and info, and show you how you can get the model running. As always, the Colab is in the description.
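If you just want to try it right away, here is a minimal sketch of loading the 8B Instruct model with Hugging Face transformers (this assumes you've accepted the license for the gated meta-llama/Meta-Llama-3-8B-Instruct repo, logged in with huggingface-cli, and have a recent transformers version that accepts chat-style pipeline input):

```python
# Minimal sketch: run Llama 3 8B Instruct via the transformers pipeline.
# Assumes the Meta license was accepted on Hugging Face and
# `huggingface-cli login` (or HF_TOKEN) is set up beforehand.
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    torch_dtype=torch.bfloat16,  # ~16 GB of GPU memory for the weights
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize Llama 3 in one sentence."},
]
out = pipe(messages, max_new_tokens=128)
print(out[0]["generated_text"][-1]["content"])
```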

🕵️ Interested in building LLM Agents? Fill out the form below

👨‍💻Github:

⏱️Time Stamps:
00:00 Intro
00:35 Meta AI Blog: Llama 3
01:47 Llama 3 Model Card: 8B and 70B
04:25 Intended Use Cases
05:06 Cloud Providers available for Llama 3
05:32 Llama 3 Benchmarks
08:59 Scaling up Pre-training
09:58 Downloading Llama 3 on Hugging Face
10:21 License Conditions
12:44 Llama 3 405B Model: Sneak Peek
14:30 Code Time: Ollama
15:44 Llama 3 on Hugging Chat
16:00 Different Options on Deploying Llama 3
16:30 Llama 3 on Together AI
16:56 Llama 3 on Colab
Comments

I appreciate the factual, no-hype tone. I liked seeing your prompts as a sort of proof of research. Subscribed to bring up the quality of my feed around AI.

seespacelabs

Thanks for the excellent introduction. Can't wait to take it for a drive...

walterpark

I noticed that when I asked the model to create a story, it wrote a chapter and then, after each message, asked "Would you like me to continue with the story?" With a simple confirmation I could keep going. It worked brilliantly, and only after hitting the token limit did the story lose quality (forgetting characters, etc.). I didn't use any special prompt, so this seems to be trained behavior, and it worked awesome!

Normally, when you want to keep a story going, many other models need to be reminded, or you have to copy-paste the previous story for them to figure out that you want to continue.

venim

@samwitteveenai, I noticed you're using a custom runtime. Do you have a video tutorial on setting up a GPU capable of training Llama without using the quantized version? I configured a custom T4 on GCP to use in Colab, but it seems to be limited to 15 GB of GPU RAM.
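For context, a T4 does top out around 15-16 GB, which is tight even for holding the 8B weights in half precision. A rough sketch for checking what you actually have and falling back to 4-bit loading on small cards (the 20 GB cutoff here is illustrative, not a hard rule):

```python
# Sketch: check GPU memory and fall back to 4-bit loading on small cards.
# An 8B model needs ~16 GB just for fp16 weights, so a 15 GB T4 is tight.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

total_gb = torch.cuda.get_device_properties(0).total_memory / 1e9
print(f"GPU memory: {total_gb:.1f} GB")

quant = None
if total_gb < 20:  # illustrative cutoff: no room for fp16 weights + activations
    quant = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.float16,  # T4 has no bf16 support
    )

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct",
    quantization_config=quant,
    device_map="auto",
)
```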

melchhepta

Do we have any idea which non-English languages are supported by Llama 3?

Recluse

Hi Sam, thanks for this one. Can you share what specs a computer would need to run Llama 3 70B locally with decent performance for multiple (~5) concurrent users?

nqaiser

It would be nice to see how this behaves with local data on local machines, for the things we need to do with our own data.

morespinach

I asked the model whether it could work completely offline, and it responded that although it can, it would lose touch with its training data and shut down. Did anyone else see this?

clvnegu

Thank you Sam, as always you were amazingly informative and interesting.
I already tried 8b-instruct-q5_K_M directly from Ollama; the chat session is terrible and the model spits out training data like a train of words.
I'll try the default one (latest) to see if anything good comes out.
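For anyone wanting to compare the two builds, a quick sketch with the ollama Python client (the tags are the ones listed on the Ollama library page; both must be pulled and the server running):

```python
# Sketch: compare the q5_K_M build against the default tag via the ollama client.
# pip install ollama; assumes `ollama serve` is running and both tags are pulled.
import ollama

for tag in ["llama3:8b-instruct-q5_K_M", "llama3:latest"]:
    resp = ollama.chat(
        model=tag,
        messages=[{"role": "user", "content": "In one sentence, what is Llama 3?"}],
    )
    print(tag, "->", resp["message"]["content"])
```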

stawils

The 70B variant fits in an RTX A6000 with bitsandbytes quantization. Yet to try the HF Chat UI, but it works well with TGI.
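For reference, a minimal sketch of that setup with the transformers bitsandbytes integration (4-bit NF4 brings the 70B weights down to very roughly 35-40 GB, which is why it fits in the A6000's 48 GB):

```python
# Sketch: load Llama 3 70B Instruct in 4-bit NF4 so the weights fit
# on a single 48 GB card like an RTX A6000.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

quant = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model_id = "meta-llama/Meta-Llama-3-70B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant,
    device_map="auto",
)
```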

iainattwater

This is not really a deep dive, sadly, just more info. I was hoping to see some actual code and performance in terms of accuracy of outcomes.

morespinach

The context window is really small compared to other models.

It should be fine for a lot of tasks, but I'm still surprised there was no improvement in that regard.

theworddoner

So far, when using the Groq API with Llama 3, it seems to use JSON tool functions more easily and to understand their assignments and roles better, which then produces better-quality code/responses/tool usage.
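A minimal sketch of that pattern with the Groq Python client (OpenAI-compatible; the get_weather tool is a made-up example and the model name may have changed since):

```python
# Sketch: JSON tool/function calling against Llama 3 via the Groq API.
# The get_weather tool is illustrative, not a real Groq feature.
import json
from groq import Groq

client = Groq()  # reads GROQ_API_KEY from the environment

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="llama3-70b-8192",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
    tool_choice="auto",
)

call = resp.choices[0].message.tool_calls[0]
print(call.function.name, json.loads(call.function.arguments))
```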

drlordbasil

I have no idea what is going on; I fell off in the AI race.
I can't understand the benchmarks. What does 5-shot mean?
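For anyone else wondering: "5-shot" means the benchmark prompt contains five worked examples before the real question, so the model can pick up the task format without any fine-tuning. A toy sketch of how such a prompt is assembled:

```python
# Sketch: building a 5-shot prompt, the format behind "5-shot" benchmark scores.
examples = [
    ("2 + 2 = ?", "4"),
    ("3 + 5 = ?", "8"),
    ("7 - 4 = ?", "3"),
    ("6 * 2 = ?", "12"),
    ("9 - 1 = ?", "8"),
]
question = "5 + 9 = ?"

prompt = "".join(f"Q: {q}\nA: {a}\n\n" for q, a in examples)
prompt += f"Q: {question}\nA:"  # the model is scored on completing this line
print(prompt)
```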

NormTurtle

Does the 15 trillion tokens figure take MULTIPLE EPOCHS into account? There is confusion about this. The old Pile, for example, is only 750 billion tokens.

pensiveintrovert

Compared to Gemini 1.5's one million tokens they are very far behind, and I imagine those models must use a lot of memory, but the fact that it is open source is a great gift.

adriintoborf

A bit let down that you immediately go to Meta's instruct fine-tune and never compare base-model capabilities. This 8B rivals Mixtral 8x7B!! But moreover, developers are cheating themselves by only knowing how to use chatbots, and if nobody learns the value, then we may seriously see companies release only chatbot models in the future!! :(

DaeOh

Well, ask a question in Estonian slang and you'll see how "large" those language models are.
Indo-European vs. Uralic is the first thing that throws an LLM out of kilter. Other divergent language structures probably do too, but I'm not familiar enough with other language families to judge.

matikaevur

Chinchilla-optimal means that for a given number of tokens there is an optimal number of parameters. It does NOT mean that, vice versa, there is an optimal token count for a given parameter count; in fact there is no limit, and no amount of tokens is the maximum. This is a very deep misunderstanding, present even in the research community, and kind of annoying if you ask me.
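To put rough numbers on it: the Chinchilla rule of thumb is about 20 training tokens per parameter, and Llama 3 goes far past that, which is exactly the point that the loss can keep improving. A quick back-of-the-envelope check:

```python
# Back-of-the-envelope: Chinchilla-"optimal" tokens vs what Llama 3 actually used.
params = 8e9                     # Llama 3 8B
chinchilla_tokens = 20 * params  # ~20 tokens/param rule of thumb
actual_tokens = 15e12            # Meta reports ~15T pre-training tokens

print(f"Chinchilla-optimal: {chinchilla_tokens / 1e9:.0f}B tokens")
print(f"Actually trained:   {actual_tokens / 1e12:.0f}T tokens "
      f"({actual_tokens / chinchilla_tokens:.0f}x the 'optimal' amount)")
```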

JazevoAudiosurf

When is the not too distant future? Next Sunday A.D.?

erikjohnson