Test Time Compute, Part 1: Sampling and Chain of Thought


OTHER TRELIS LINKS:

VIDEO LINKS:

TIMESTAMPS:
0:00 OpenAI o1 type techniques for scaling test time compute
1:52 Video Overview (temperature, chain of thought)
2:17 Training compute versus test time compute
6:28 Why spend more compute on test time / inference?
10:50 Using verifiers to select the best answers
12:00 Exploring and critiquing/verifying answers during inference
15:02 Understanding Temperature for sampling
19:41 Should you set temperature to zero?
22:08 Beam search
23:30 Problems with setting a non-zero temperature
24:31 Using top p, top k, min p, and best of
27:36 Recap on choosing temperature for sampling
28:20 How to implement chain of thought
29:40 Setup for notebook run-through on GSM8K and HotpotQA
31:20 Using sampling and chain of thought on HotpotQA and GSM8K
31:47 Running vLLM in a Jupyter notebook (allows for batching)
36:15 Scoring / Grading with OpenAI gpt-4o-mini using regex enforcement
39:39 Multi-threading the scoring / grading for speed
40:30 Running the dataset multiple times to get the mean and mean absolute deviation of correct answers
41:29 Controlling sampling parameters (min p, top p, top k, beam search, temperature)
43:46 Running temperature / sampling ablations WITHOUT chain of thought
46:48 Chain of Thought Setup
49:02 Running ablations WITH chain of thought
50:44 GSM8K Results Charts
52:09 HotpotQA Results Charts
53:09 Recommendations on sampling, temperature and chain of thought
55:17 Video resources
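
The notebook workflow outlined in the timestamps above centres on batched generation with vLLM and a handful of sampling parameters. As a rough sketch (the model name, parameter values, and example questions here are assumptions, not the exact settings from the notebook):

```python
# Minimal sketch: batched generation with vLLM inside a Jupyter notebook.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # assumed model checkpoint

sampling_params = SamplingParams(
    temperature=0.7,  # 0 gives greedy decoding; >0 enables sampling
    top_p=0.9,        # nucleus sampling: keep the smallest token set with cumulative prob >= 0.9
    top_k=-1,         # -1 disables top-k filtering
    min_p=0.05,       # drop tokens below 5% of the top token's probability
    max_tokens=512,
)

# GSM8K-style questions; vLLM batches the prompts internally,
# which is what makes notebook-based evaluation over a dataset practical.
prompts = [
    "Question: Natalia sold clips to 48 friends in April, and half as many in May. "
    "How many clips did she sell in total?\nAnswer:",
    "Question: A train travels 60 km in 1.5 hours. What is its average speed in km/h?\nAnswer:",
]

outputs = llm.generate(prompts, sampling_params)
for out in outputs:
    print(out.outputs[0].text.strip())
```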
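
The chain-of-thought ablations compare a direct-answer prompt against a prompt that asks the model to reason first. A sketch of the two conditions (the exact wording and the "Answer:" extraction convention are assumptions):

```python
# Sketch of the two prompting conditions used in the ablations.
DIRECT_TEMPLATE = (
    "Question: {question}\n"
    "Give only the final answer.\n"
    "Answer:"
)

COT_TEMPLATE = (
    "Answer the question below. Think step by step, then give the final answer "
    "on a new line starting with 'Answer:'.\n\n"
    "Question: {question}\n"
)

def make_prompt(question: str, chain_of_thought: bool) -> str:
    # Switch between the two conditions when running the ablations.
    template = COT_TEMPLATE if chain_of_thought else DIRECT_TEMPLATE
    return template.format(question=question)
```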
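
For the grading step, each model answer is judged against the reference with gpt-4o-mini, the judge's reply is constrained to a single label checked by regex, the calls are multi-threaded for speed, and the evaluation is repeated to report a mean and mean absolute deviation. A sketch under those assumptions (the judge prompt wording and thread count are illustrative):

```python
# Sketch of LLM-as-judge grading with gpt-4o-mini, parallelised with threads.
import re
import statistics
from concurrent.futures import ThreadPoolExecutor
from openai import OpenAI

client = OpenAI()  # requires OPENAI_API_KEY in the environment

def grade(question: str, model_answer: str, reference: str) -> bool:
    prompt = (
        f"Question: {question}\n"
        f"Reference answer: {reference}\n"
        f"Model answer: {model_answer}\n"
        "Reply with exactly one word: CORRECT or INCORRECT."
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    text = resp.choices[0].message.content.strip()
    # Regex check keeps the judge's output to the allowed label.
    return re.fullmatch(r"CORRECT", text, flags=re.IGNORECASE) is not None

def accuracy(rows) -> float:
    # rows: iterable of (question, model_answer, reference) tuples
    with ThreadPoolExecutor(max_workers=8) as pool:
        results = list(pool.map(lambda r: grade(*r), rows))
    return sum(results) / len(results)

def mean_and_mad(scores):
    # Repeat the whole evaluation several times and report the spread,
    # since sampling at temperature > 0 makes single-run accuracy noisy.
    mean = statistics.mean(scores)
    mad = statistics.mean(abs(s - mean) for s in scores)
    return mean, mad
```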
Comments

I think this is the best channel for people who want to use LLMs.

Most of the other creators on YT are just reading Jupyter notebooks live (which is something I can perfectly well do on my own), but your channel is the only one that goes into enough detail to actually understand and learn.

Please never stop making these videos 🙏 I know it's a niche, but this is the type of content that is useful for businesses and goes beyond the hype.

eugeniosegala

Excellent content as usual, thanks mate!

JaredWoodruff

Would be good to see Monte Carlo tree search, scoring each reasoning step… I'm toying with the idea of a genetic-algorithm variation on Monte Carlo tree search… but some way to do all this locally using Ollama, and also a way to fine-tune a local model to produce reasoning steps based on the discovered best-scored reasoning steps.

nathank

Super helpful recap, in more detail than you would think you need on a daily basis!

Would you consider doing a video about fine-tuning an LLM for proper text classification? For example, Llama 3.2 8B to classify documents and return proper probabilities (so the model can be analysed and improved over time).
It can be done by changing the last network layer using AutoModelForSequenceClassification; I just can't find any examples for Llama 3.2 yet.

No worries if it ain't your cup of tea ;)

alchemication
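
A minimal sketch of the approach alchemication describes above: loading a Llama checkpoint with a classification head via AutoModelForSequenceClassification so it returns class probabilities. The model ID, number of labels, and pad-token handling are assumptions, and fine-tuning (e.g. with the Hugging Face Trainer) would build on the same setup:

```python
# Sketch: Llama with a sequence-classification head instead of the LM head.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "meta-llama/Llama-3.2-3B"  # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token  # Llama tokenizers ship without a pad token

model = AutoModelForSequenceClassification.from_pretrained(
    model_id,
    num_labels=4,  # e.g. four document categories (assumed)
)
model.config.pad_token_id = tokenizer.pad_token_id

inputs = tokenizer("Quarterly revenue rose 12% year over year.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
probs = torch.softmax(logits, dim=-1)  # class probabilities to track over time
print(probs)
```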

Hi, I have a code completion tool. What do you think is the best configuration for the model to do code completion, to be faster and more accurate? Very nice video! Thank you so much!

btaranto

Hey @TrelisResearch, I have a slightly off-topic request. Can you share the original Dockerfile for your RunPod one-click templates? I'm new to this field and I want to learn how to build a Docker image for an inference engine like TensorRT-LLM. This sort of Docker image building could also be used to deploy training scripts in my case. If it's not too much trouble, could you please make a short video on building a Docker image for an inference engine like TensorRT-LLM and saving it as a one-click template that can be readily deployed and exposes an API endpoint?

savanthtadepalli