Test Time Compute, Part 1: Sampling and Chain of Thought
TIMESTAMPS:
0:00 OpenAI o1-type techniques for scaling test time compute
1:52 Video Overview (temperature, chain of thought)
2:17 Training compute versus test time compute
6:28 Why spend more compute on test time / inference?
10:50 Using verifiers to select the best answers
12:00 Exploring and critiquing/verifying answers during inference
15:02 Understanding Temperature for sampling
19:41 Should you set temperature to zero?
22:08 Beam search
23:30 Problems with setting a non-zero temperature
24:31 Using top p, top k, min p, and best of
27:36 Recap on choosing temperature for sampling
28:20 How to implement chain of thought
29:40 Setup for notebook run-through on GSM8K and HotpotQA
31:20 Using sampling and chain of thought on HotpotQA and GSM8K
31:47 Running vLLM in a Jupyter notebook (allows for batching)
36:15 Scoring / Grading with OpenAI gpt-4o-mini using regex enforcement
39:39 Multi-threading the scoring / grading for speed
40:30 Running the dataset multiple times to get the mean and mean absolute deviation of correct answers
41:29 Controlling sampling parameters (min p, top p, top k, beam search, temperature)
43:46 Running temperature / sampling ablations WITHOUT chain of thought
46:48 Chain of Thought Setup
49:02 Running ablations WITH chain of thought
50:44 GSM8K Results Charts
52:09 HotpotQA Results Charts
53:09 Recommendations on sampling, temperature and chain of thought
55:17 Video resources