Improving LLM accuracy with Monte Carlo Tree Search


TIMESTAMPS:
0:00 Large Language Models Make Things Up!
0:42 Boosting Llama 3 8B performance to GPT-4 (only on certain benchmarks!)
3:13 How prompting affects accuracy
4:58 How Monte Carlo tree search works
7:49 Balancing exploitation with exploration
10:18 Jupyter Notebook Code
26:59 Testing Monte Carlo Tree Search on a simple example
29:16 Boosting Performance on Maths problems
31:48 Limitations on Monte Carlo Performance Boosts
32:58 Resources
Comments

Beautiful. Just like us. The more we fail, the better. Explore vs. Exploit. I love humanity. ❤

KopikoArepo

The Monte Carlo method certainly steers the probabilistic model toward better results, but the costs are really high. Either way, good job on the clear explanation 👍

tongagi

Fascinating! This morning, I posted on X about MCTS and this paper, and later, YouTube showed me your video. Such a great coincidence. I found the coefficient C in the UCT formula for balancing exploration and exploitation really interesting. I experimented with different settings and even made it random like temperature. The results are intriguing—might share the repo and a video soon.

I wonder what would happen if we built a neural network like MoE but with this MCTS structure and trained it. Would it train while searching and reasoning? Could it generate a model far better at reasoning? What do you think? Anyway, kudos to you—you're right on track and well updated as usual.

unclecode
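For readers who want to experiment like the commenter above, here is a minimal sketch of the UCT score with a tunable exploration coefficient (named `c` here, playing the role of the C in the formula; the function name and signature are illustrative, not from the video's notebook):

```python
import math

def uct_score(child_value, child_visits, parent_visits, c=1.41):
    """UCT = mean reward (exploitation) + C-weighted exploration bonus."""
    if child_visits == 0:
        return float("inf")  # unvisited children are always tried first
    exploit = child_value / child_visits
    explore = c * math.sqrt(math.log(parent_visits) / child_visits)
    return exploit + explore
```

Raising `c` inflates the exploration bonus relative to the mean reward, so the search spreads over more children; lowering it concentrates visits on the current best child. Randomizing `c` per step, as the commenter suggests, would make that trade-off stochastic, loosely analogous to sampling temperature.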

Me with Tarot Cards till I get the answer I like. But seriously, MCTS seems like a formal way to structure an extended interaction with a user. MCTS feels a lot like what I do when I use Google AI Search, barraging it with a cloud of different prompts when searching for a particular piece of knowledge for which I may not know the conventional terminology. In other words, the intermediate answers provide information for prompt refinement. For example, I once started with "nitrogen in soil" and ended up with "soil nitrification", which was the prompt that gave the knowledge I sought. Thanks for the vid!

KarlLew

Fascinating paper and excellent demonstration. Using this, Llama3-8B can answer some difficult math and coding problems that the top open-source models fail to answer directly. The first thing I noticed was that it games the rating response by pretending to run unit tests that pass. Adding to the critique prompt that it was a written test and the answerer had no access to a computer to run tests fixed that, and it has started solving some easy ARC-AGI tasks I couldn't get proprietary models to solve.

_paixi

You're the exact person I was hoping would make a video on this after I read that paper. Could this technique be enhanced even further with retrieval?

nashvillebrandon

Great job, it's literally manual reinforcement learning! 🤣🤣🤣

waneyvin

Thanks man. Great intro to MCTS. What I'm curious about is why we make a random selection among the first generation instead of rating them and selecting the best answer from the root.

miladmirmoghtadaei

Question: given enough time, can MCTS (Monte Carlo Tree Search) find the best solution?

The problem with MCTS is that it chooses the child node with the highest probability of having a solution.
As long as those probabilities don't change, MCTS will choose the same node, no matter how many iterations you perform. That means some leaves (terminal nodes) are unreachable.
If the best solution happens to be in an unreachable leaf, MCTS will never find it.

hcm
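One point worth noting against the concern above: under standard UCT selection the scores are not fixed probabilities, because the exploration bonus of a rarely visited child grows as the parent accumulates visits, so every child is eventually selected. A small self-contained simulation (names and numbers are illustrative) shows a seemingly dominated child still being picked:

```python
import math

def select_child(children, parent_visits, c=1.41):
    """Pick the child index with the highest UCT score.
    children: list of (value_sum, visit_count) pairs."""
    def score(stats):
        value_sum, visits = stats
        if visits == 0:
            return float("inf")
        return value_sum / visits + c * math.sqrt(math.log(parent_visits) / visits)
    return max(range(len(children)), key=lambda i: score(children[i]))

# Child 0 looks much better on average (0.9 vs 0.0), yet child 1 is
# still selected once its exploration bonus overtakes child 0's score.
children = [(9.0, 10), (0.0, 1)]
picks = []
for n in range(11, 200):  # n stands in for the parent's visit count
    i = select_child(children, parent_visits=n)
    picks.append(i)
    value_sum, visits = children[i]
    children[i] = (value_sum, visits + 1)  # visit; reward of 0 for illustration
```

After the loop, `picks` contains both indices: the "unreachable leaf" problem the commenter describes applies to pure greedy selection, not to UCT with a nonzero exploration coefficient.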

Holy moly!! I was just reading today about how MCTS can be used to improve LLMs. Are you reading minds now?

kunalsuri

Thanks a lot, man 😎👏❤️ We would like you to devote a future video to the CLIN paper on building self-improving language agents.

free_thinker

A potential improvement is to have a dynamic number of child nodes based on the rating. The weight defining exploration vs. exploitation could also be set dynamically, maybe even by the LLM filling in more than just a score.

Also, the backprop of the ratings is cool, but there could be some decay so that nodes way up the tree don't get super locked in if you're doing a tree that is 8 layers deep.

MasamuneX
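The decayed-backprop idea in the comment above can be sketched in a few lines. This is a hypothetical variant, not something from the paper or the video's notebook: a `decay` factor attenuates the leaf's rating at each level on the way up, so ancestors far from the leaf absorb less of it.

```python
def backpropagate(path, reward, decay=0.9):
    """Propagate a leaf rating up the tree, attenuated per level.
    path: list of node stat dicts ordered from leaf to root."""
    for depth, node in enumerate(path):
        node["visits"] += 1
        node["value"] += reward * (decay ** depth)  # depth 0 = the leaf itself
```

With `decay=0.9` and an 8-layer path, the root receives only about 0.9**7 ≈ 0.48 of the leaf's rating, which keeps high-up nodes from getting locked in by a few early evaluations.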

Subscribed, very interesting. Good work on explaining it :)

tonyppe

Hey, are you still doing things on patent-me? No content on the page (?)

tullyfisher

Isn't this the Q-Star algorithm we've been dreaming of?

ПетрФомин-щж

Did you explain the PromptAgent paper in just half an hour?

saurabhkram

Thanks! How could we improve this with a compiler, search or some form of symbolic reasoning?

r.s.e.

Thank you! Can you make an explanation of GGUF quantization and how to convert a custom multimodal model to GGUF?

aissabakhil

Yeahhhh, perfect explanation, thank you bro

andrew_moffat

How does it differ from Tree of Thoughts prompting?

Saurabh