Pre-train with patches for huge compute savings

PATCH-LEVEL TRAINING FOR LARGE LANGUAGE MODELS

My Twitter, LinkedIn, Discord, Patreon, consultation booking page, etc:

Timestamps:
00:00 intro
00:55 background/motivation
06:29 experiments
11:31 scaling trends
13:45 comparing patch sizes
15:21 why it works
16:55 outro
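
For readers who want the gist in code, below is a minimal sketch of how patch-level training might look, written from my reading of the paper rather than from any released code. It assumes patches are formed by mean-pooling the embeddings of K consecutive tokens and that each patch position is trained to predict every token of the next patch; `model.embed`, `model.backbone`, and `model.lm_head` are placeholder names, not the paper's API.

```python
# Minimal sketch of patch-level pre-training (not the authors' code).
# Assumptions: patches are mean-pooled embeddings of K consecutive tokens,
# and each patch position predicts all K tokens of the *next* patch.
import torch
import torch.nn.functional as F

K = 4  # patch size (the video's k=4)

def patch_level_loss(model, token_ids):
    """token_ids: (B, T) with T divisible by K."""
    emb = model.embed(token_ids)                      # (B, T, D) token embeddings
    B, T, D = emb.shape
    patches = emb.view(B, T // K, K, D).mean(dim=2)   # (B, T/K, D) mean-pool into patches
    hidden = model.backbone(patches)                  # same transformer, patch-level inputs
    logits = model.lm_head(hidden)                    # (B, T/K, vocab)

    targets = token_ids.view(B, T // K, K)[:, 1:, :]  # tokens of each *next* patch
    log_probs = F.log_softmax(logits[:, :-1, :], dim=-1)
    # every patch position predicts the K tokens of the next patch;
    # the loss is the average cross-entropy over those K tokens
    return -log_probs.gather(-1, targets).mean()
```

As I understand it, the compute savings come from the patch-level phase processing roughly K times fewer sequence positions; after that phase, training continues at the ordinary token level from the resulting weights. Treat the details above as assumptions until checked against the paper.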
Comments

This made me think about how to build a transformer that learns tokenization automatically (instead of using a fixed k=4).

Take a sequence of bytes / characters and output a split / don't-split decision for each character. Thresholding these decisions gives us a sequence of tokens. This is done by the first transformer.

Where we actually split, we then feed forward / backpropagate through a second transformer that does the actual language modeling task. The loss backpropagates all the way back to each token, and then out to the original characters. Basically, the token-level loss is distributed to the chosen characters in proportion to the magnitude of their split predictions. E.g. if the characters are 'T', 'H', 'E', and the magnitudes are 0.1 for T, 0.1 for H, and 0.9 for E, then the loss backpropagated is 0.1/1.1 for T, 0.1/1.1 for H, and 0.9/1.1 for E, since the word taken / processed by the second transformer was 'THE'. (A rough sketch of this appears below.)

marinepower
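
To make the weighting in the comment above concrete, here is a rough sketch of that learned-tokenization idea. Everything in it is hypothetical: `splitter`, `lm`, and `char_embed_layer` are made-up names, not from the paper or the video, and the hard split decision itself is not differentiable here, so gradients reach the characters only through the score-weighted pooling, mirroring the 0.1/1.1, 0.1/1.1, 0.9/1.1 weights in the 'THE' example.

```python
# Hypothetical sketch of the commenter's two-transformer tokenization idea.
import torch

def segment_and_pool(char_emb, scores, threshold=0.5):
    """char_emb: (T, D) character embeddings; scores: (T,) split scores in [0, 1].
    Each token is a weighted sum of the characters it covers, with weights
    proportional to those characters' split scores (e.g. 0.1/1.1, 0.1/1.1, 0.9/1.1)."""
    token_vecs, start = [], 0
    for t in range(char_emb.size(0)):
        last = t == char_emb.size(0) - 1
        if scores[t] > threshold or last:          # close the current token here
            w = scores[start:t + 1]
            w = w / w.sum()                        # normalize weights within the token
            token_vecs.append((w.unsqueeze(1) * char_emb[start:t + 1]).sum(dim=0))
            start = t + 1
    return torch.stack(token_vecs)

def char_lm_loss(char_ids, char_embed_layer, splitter, lm):
    """char_ids: (T,) character ids. `splitter` is the first (small) transformer
    scoring split / don't-split per character; `lm` is the second transformer,
    assumed to return its language-modeling loss for a (1, num_tokens, D) input."""
    emb = char_embed_layer(char_ids)                   # (T, D)
    scores = torch.sigmoid(splitter(emb)).squeeze(-1)  # (T,) split probabilities
    tokens = segment_and_pool(emb, scores)             # (num_tokens, D)
    return lm(tokens.unsqueeze(0))
```

A real implementation would still need something like a straight-through estimator for the split decisions themselves; the sketch only shows how the loss weighting in the example could be realized.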

Super cool. This makes so much sense. I wonder if it would help with character-level transformers, since during patch training the characters would be grouped. You could maybe use a tiny NLP model to split into patches, e.g. by word, but then ultimately train on raw characters for the final model.

tornyu

How do they get the embedding function E without pre-training at the token level?

winwin-gwrn

Wew, there are a lot of crypto scam bots here in the comments.

jondoe