Improving LLM accuracy with Monte Carlo Tree Search


TIMESTAMPS:
0:00 Large Language Models Make Things Up!
0:42 Boosting Llama 3 8B performance to GPT-4 (only on certain benchmarks!)
3:13 How prompting affects accuracy
4:58 How Monte Carlo tree search works
7:49 Balancing exploitation with exploration
10:18 Jupyter Notebook Code
26:59 Testing Monte Carlo Tree Search on a simple example
29:16 Boosting Performance on Maths problems
31:48 Limitations on Monte Carlo Performance Boosts
32:58 Resources
Comments

Beautiful. Just like us. The more we fail, the better. Explore vs. Exploit. I love humanity. ❤

KopikoArepo

The Monte Carlo method certainly steers the probabilistic model toward better results, but the costs are really high. Either way, good job on the clear explanation 👍

tongagi

Fascinating! This morning, I posted on X about MCTS and this paper, and later, YouTube showed me your video. Such a great coincidence. I found the coefficient C in the UCT formula for balancing exploration and exploitation really interesting. I experimented with different settings and even made it random like temperature. The results are intriguing—might share the repo and a video soon.

I wonder what would happen if we built a neural network like MoE but with this MCTS structure and trained it. Would it train while searching and reasoning? Could it generate a model far better at reasoning? What do you think? Anyway, kudos to you—you're right on track and well updated as usual.

unclecode
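For readers who want to experiment like the commenter above, here is a minimal sketch of the UCT score with a tunable exploration coefficient (named `c` here, playing the role of the C in the formula; the function name and signature are illustrative, not from the video's notebook):

```python
import math

def uct_score(child_value, child_visits, parent_visits, c=1.41):
    """UCT = mean reward (exploitation) + C-weighted exploration bonus."""
    if child_visits == 0:
        return float("inf")  # unvisited children are always tried first
    exploit = child_value / child_visits
    explore = c * math.sqrt(math.log(parent_visits) / child_visits)
    return exploit + explore
```

Raising `c` inflates the exploration bonus relative to the mean reward, so the search spreads over more children; lowering it concentrates visits on the current best child. Randomizing `c` per step, as the commenter suggests, would make that trade-off stochastic, loosely analogous to sampling temperature.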

Me with Tarot Cards till I get the answer I like. But seriously, MCTS seems like a formal way to structure an extended interaction with a user. MCTS feels a lot like what I do when I use Google AI Search, barraging it with a cloud of different prompts when searching for a particular piece of knowledge for which I may not know the conventional terminology. In other words, the intermediate answers provide information for prompt refinement. For example, I once started with "nitrogen in soil" and ended up with "soil nitrification", which was the prompt that gave the knowledge I sought. Thanks for the vid!

KarlLew

Fascinating paper and excellent demonstration. Using this, Llama3-8B can answer some difficult math and coding problems that the top open-source models fail to answer directly. The first thing I noticed was that it games the rating response by pretending to run unit tests that pass. Adding to the critique prompt that it was a written test and the answerer had no access to a computer to run tests fixed that, and it has started solving some easy ARC-AGI tasks I couldn't get proprietary models to solve.

_paixi

You're the exact person I was hoping would make a video on this after I read that paper. Could this technique be enhanced even further with retrieval?

nashvillebrandon

Great job, it's literally manual reinforcement learning! 🤣🤣🤣

waneyvin

Thanks man. Great intro to MCTS. What I'm curious about is why we make a random selection among the first generation instead of rating them and selecting the best answer from the root.

miladmirmoghtadaei

Question: given enough time, can MCTS (Monte Carlo Tree Search) find the best solution?

The problem with MCTS is that it chooses the child node with the highest probability of having a solution.
As long as those probabilities don't change, MCTS will choose the same node, no matter how many iterations you perform. That means some leaves (terminal nodes) are unreachable.
If the best solution happens to be in an unreachable leaf, MCTS will never find it.

hcm
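One point worth noting against the concern above: under standard UCT selection the scores are not fixed probabilities, because the exploration bonus of a rarely visited child grows as the parent accumulates visits, so every child is eventually selected. A small self-contained simulation (names and numbers are illustrative) shows a seemingly dominated child still being picked:

```python
import math

def select_child(children, parent_visits, c=1.41):
    """Pick the child index with the highest UCT score.
    children: list of (value_sum, visit_count) pairs."""
    def score(stats):
        value_sum, visits = stats
        if visits == 0:
            return float("inf")
        return value_sum / visits + c * math.sqrt(math.log(parent_visits) / visits)
    return max(range(len(children)), key=lambda i: score(children[i]))

# Child 0 looks much better on average (0.9 vs 0.0), yet child 1 is
# still selected once its exploration bonus overtakes child 0's score.
children = [(9.0, 10), (0.0, 1)]
picks = []
for n in range(11, 200):  # n stands in for the parent's visit count
    i = select_child(children, parent_visits=n)
    picks.append(i)
    value_sum, visits = children[i]
    children[i] = (value_sum, visits + 1)  # visit; reward of 0 for illustration
```

After the loop, `picks` contains both indices: the "unreachable leaf" problem the commenter describes applies to pure greedy selection, not to UCT with a nonzero exploration coefficient.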

Holy moly!! I was just reading today about how MCTS can be used to improve LLMs. Are you reading minds now?

kunalsuri

Thanks a lot, man 😎👏❤️ We would like you to devote a future video to the CLIN paper on building self-improving language agents.

free_thinker

A potential improvement is to have a dynamic number of child nodes based on the rating. The weight defining exploration vs. exploitation could also be set dynamically, maybe even by the LLM filling in more than just a score.

Also, the backprop of the ratings is cool, but there could be some decay so that nodes way up the tree don't get super locked in if you're doing a tree that is 8 layers deep.

MasamuneX
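The decayed-backprop idea in the comment above can be sketched in a few lines. This is a hypothetical variant, not something from the paper or the video's notebook: a `decay` factor attenuates the leaf's rating at each level on the way up, so ancestors far from the leaf absorb less of it.

```python
def backpropagate(path, reward, decay=0.9):
    """Propagate a leaf rating up the tree, attenuated per level.
    path: list of node stat dicts ordered from leaf to root."""
    for depth, node in enumerate(path):
        node["visits"] += 1
        node["value"] += reward * (decay ** depth)  # depth 0 = the leaf itself
```

With `decay=0.9` and an 8-layer path, the root receives only about 0.9**7 ≈ 0.48 of the leaf's rating, which keeps high-up nodes from getting locked in by a few early evaluations.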

Subscribed, very interesting. Good work on explaining it :)

tonyppe

Hey, are you still doing things on patent-me? No content on the page (?)

tullyfisher

Isn't this the Q-Star algorithm we've been dreaming of?

ПетрФомин-щж

Did you explain the PromptAgent paper in just half an hour?

saurabhkram

Thanks! How could we improve this with a compiler, search or some form of symbolic reasoning?

r.s.e.

Thank you! Can you make an explanation of GGUF quantization and how to convert a custom multimodal model to GGUF?

aissabakhil

Yeahhhh, perfect explanation, thank you bro

andrew_moffat

How does it differ from Tree of Thoughts prompting?

Saurabh