Test Time Compute, Part 1: Sampling and Chain of Thought


OTHER TRELIS LINKS:

VIDEO LINKS:

TIMESTAMPS:
0:00 OpenAI o1 type techniques for scaling test time compute
1:52 Video Overview (temperature, chain of thought)
2:17 Training compute versus test time compute
6:28 Why spend more compute on test time / inference?
10:50 Using verifiers to select the best answers
12:00 Exploring and critiquing/verifying answers during inference
15:02 Understanding Temperature for sampling
19:41 Should you set temperature to zero?
22:08 Beam search
23:30 Problems with setting a non-zero temperature
24:31 Using top p, top k, min p, and best of
27:36 Recap on choosing temperature for sampling
28:20 How to implement chain of thought
29:40 Setup for notebook run-through on GSM8K and HotpotQA
31:20 Using sampling and chain of thought on HotpotQA and GSM8K
31:47 Running vLLM in a Jupyter notebook (allows for batching)
36:15 Scoring / Grading with OpenAI gpt-4o-mini using regex enforcement
39:39 Multi-threading the scoring / grading for speed
40:30 Running the dataset multiple times to get the mean and mean absolute deviation of correct answers
41:29 Controlling sampling parameters (min p, top p, top k, beam search, temperature)
43:46 Running temperature / sampling ablations WITHOUT chain of thought
46:48 Chain of Thought Setup
49:02 Running ablations WITH chain of thought
50:44 GSM8K Results Charts
52:09 HotpotQA Results Charts
53:09 Recommendations on sampling, temperature and chain of thought
55:17 Video resources
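
The notebook workflow outlined in the timestamps above centres on batched generation with vLLM and a handful of sampling parameters. As a rough sketch (the model name, parameter values, and example questions here are assumptions, not the exact settings from the notebook):

```python
# Minimal sketch: batched generation with vLLM inside a Jupyter notebook.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # assumed model checkpoint

sampling_params = SamplingParams(
    temperature=0.7,  # 0 gives greedy decoding; >0 enables sampling
    top_p=0.9,        # nucleus sampling: keep the smallest token set with cumulative prob >= 0.9
    top_k=-1,         # -1 disables top-k filtering
    min_p=0.05,       # drop tokens below 5% of the top token's probability
    max_tokens=512,
)

# GSM8K-style questions; vLLM batches the prompts internally,
# which is what makes notebook-based evaluation over a dataset practical.
prompts = [
    "Question: Natalia sold clips to 48 friends in April, and half as many in May. "
    "How many clips did she sell in total?\nAnswer:",
    "Question: A train travels 60 km in 1.5 hours. What is its average speed in km/h?\nAnswer:",
]

outputs = llm.generate(prompts, sampling_params)
for out in outputs:
    print(out.outputs[0].text.strip())
```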
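
The chain-of-thought ablations compare a direct-answer prompt against a prompt that asks the model to reason first. A sketch of the two conditions (the exact wording and the "Answer:" extraction convention are assumptions):

```python
# Sketch of the two prompting conditions used in the ablations.
DIRECT_TEMPLATE = (
    "Question: {question}\n"
    "Give only the final answer.\n"
    "Answer:"
)

COT_TEMPLATE = (
    "Answer the question below. Think step by step, then give the final answer "
    "on a new line starting with 'Answer:'.\n\n"
    "Question: {question}\n"
)

def make_prompt(question: str, chain_of_thought: bool) -> str:
    # Switch between the two conditions when running the ablations.
    template = COT_TEMPLATE if chain_of_thought else DIRECT_TEMPLATE
    return template.format(question=question)
```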
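
For the grading step, each model answer is judged against the reference with gpt-4o-mini, the judge's reply is constrained to a single label checked by regex, the calls are multi-threaded for speed, and the evaluation is repeated to report a mean and mean absolute deviation. A sketch under those assumptions (the judge prompt wording and thread count are illustrative):

```python
# Sketch of LLM-as-judge grading with gpt-4o-mini, parallelised with threads.
import re
import statistics
from concurrent.futures import ThreadPoolExecutor
from openai import OpenAI

client = OpenAI()  # requires OPENAI_API_KEY in the environment

def grade(question: str, model_answer: str, reference: str) -> bool:
    prompt = (
        f"Question: {question}\n"
        f"Reference answer: {reference}\n"
        f"Model answer: {model_answer}\n"
        "Reply with exactly one word: CORRECT or INCORRECT."
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    text = resp.choices[0].message.content.strip()
    # Regex check keeps the judge's output to the allowed label.
    return re.fullmatch(r"CORRECT", text, flags=re.IGNORECASE) is not None

def accuracy(rows) -> float:
    # rows: iterable of (question, model_answer, reference) tuples
    with ThreadPoolExecutor(max_workers=8) as pool:
        results = list(pool.map(lambda r: grade(*r), rows))
    return sum(results) / len(results)

def mean_and_mad(scores):
    # Repeat the whole evaluation several times and report the spread,
    # since sampling at temperature > 0 makes single-run accuracy noisy.
    mean = statistics.mean(scores)
    mad = statistics.mean(abs(s - mean) for s in scores)
    return mean, mad
```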
Comments

I think this is the best channel for people who want to use LLMs.

Most of the other creators on YT are just reading Jupyter notebooks live (which is something I can perfectly well do on my own), but your channel is the only one that goes into enough detail to actually understand and learn.

Please never stop making these videos 🙏 I know it's a niche, but this is the type of content that is useful for businesses and goes beyond the hype.

eugeniosegala

Excellent content as usual, thanks mate!

JaredWoodruff

Would be good to see Monte Carlo tree search, scoring each reasoning step… I'm toying with the idea of a genetic-algorithm variation on Monte Carlo tree search… but some way to do all this locally using Ollama, and also a way to fine-tune a local model to produce reasoning steps based on the discovered best-scored reasoning steps.

nathank

Super helpful recap, in more detail than you would think you need on a daily basis!

Would you consider doing a video about fine-tuning an LLM for proper text classification? For example, Llama 3.2 8B to classify documents and return proper probabilities (so the model can be analysed and improved over time).
It can be done by changing the last network layer using AutoModelForSequenceClassification; I just can't find any examples for Llama 3.2 yet.

No worries if it ain't your cup of tea ;)

alchemication
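
A minimal sketch of the approach alchemication describes above: loading a Llama checkpoint with a classification head via AutoModelForSequenceClassification so it returns class probabilities. The model ID, number of labels, and pad-token handling are assumptions, and fine-tuning (e.g. with the Hugging Face Trainer) would build on the same setup:

```python
# Sketch: Llama with a sequence-classification head instead of the LM head.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "meta-llama/Llama-3.2-3B"  # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token  # Llama tokenizers ship without a pad token

model = AutoModelForSequenceClassification.from_pretrained(
    model_id,
    num_labels=4,  # e.g. four document categories (assumed)
)
model.config.pad_token_id = tokenizer.pad_token_id

inputs = tokenizer("Quarterly revenue rose 12% year over year.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
probs = torch.softmax(logits, dim=-1)  # class probabilities to track over time
print(probs)
```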

Hi, I have a code completion tool. What do you think is the best configuration for the model to do code completion, to be faster and more accurate? Very nice video! Thank you so much!

btaranto

Hey @TrelisResearch, I have a slightly off-topic request. Can you share the original Dockerfile for your RunPod one-click templates? I'm new to this field and I want to learn how to build a Docker image for an inference engine like TensorRT-LLM. This sort of Docker image building could also be used to deploy training scripts in my case. If it's not too much trouble, could you please make a short video on building a Docker image for an inference engine like TensorRT-LLM and saving it as a one-click template that can be readily deployed and exposes an API endpoint?

savanthtadepalli