Tree of Thoughts: Deliberate Problem Solving with Large Language Models (Full Paper Review)

#gpt4 #ai #prompt

Tree of Thoughts improves prompting of large language models (LLMs) by generalizing Chain-of-Thought prompting: it introduces a tree search across language model thoughts, including state evaluation and backtracking. Experiments on toy tasks show large improvements over both classic and Chain-of-Thought prompting.

OUTLINE:
0:00 - Introduction
1:20 - From Chain-of-Thought to Tree-of-Thought
11:10 - Formalizing the algorithm
16:00 - Game of 24 & Creative writing
18:30 - Crosswords
23:30 - Is this a general problem solver?
26:50 - Ablation studies
28:55 - Conclusion

Abstract:
Language models are increasingly being deployed for general problem solving across a wide range of tasks, but are still confined to token-level, left-to-right decision-making processes during inference. This means they can fall short in tasks that require exploration, strategic lookahead, or where initial decisions play a pivotal role. To surmount these challenges, we introduce a new framework for language model inference, Tree of Thoughts (ToT), which generalizes over the popular Chain of Thought approach to prompting language models, and enables exploration over coherent units of text (thoughts) that serve as intermediate steps toward problem solving. ToT allows LMs to perform deliberate decision making by considering multiple different reasoning paths and self-evaluating choices to decide the next course of action, as well as looking ahead or backtracking when necessary to make global choices. Our experiments show that ToT significantly enhances language models' problem-solving abilities on three novel tasks requiring non-trivial planning or search: Game of 24, Creative Writing, and Mini Crosswords. For instance, in Game of 24, while GPT-4 with chain-of-thought prompting only solved 4% of tasks, our method achieved a success rate of 74%. Code repo with all prompts: this https URL.
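The search procedure from the abstract can be sketched with a toy, LM-free stand-in (my own illustration, not the authors' code; `propose`, `value`, and `solve24` are names I made up): a breadth-limited search over Game of 24 states, where an exhaustive proposal step replaces the LM's thought generator and a numeric heuristic replaces its self-evaluation votes.

```python
from fractions import Fraction
from itertools import combinations

TARGET = Fraction(24)

def propose(state):
    """Thought generation (stand-in for the LM's 'propose' prompt):
    merge any two numbers with an arithmetic operation."""
    successors = []
    for i, j in combinations(range(len(state)), 2):
        a, b = state[i], state[j]
        rest = tuple(state[k] for k in range(len(state)) if k not in (i, j))
        results = {a + b, a - b, b - a, a * b}
        if b != 0:
            results.add(a / b)
        if a != 0:
            results.add(b / a)
        for r in results:
            successors.append(tuple(sorted(rest + (r,))))
    return successors

def value(state):
    """State evaluation (stand-in for the LM's sure/likely/impossible votes):
    distance of the closest remaining number to 24."""
    return min(abs(x - TARGET) for x in state)

def solve24(nums, beam=10):
    """Breadth-limited search: keep only the `beam` most promising states per level."""
    frontier = [tuple(Fraction(n) for n in nums)]
    for _ in range(len(nums) - 1):  # each level merges two numbers into one
        candidates = {s for st in frontier for s in propose(st)}
        frontier = sorted(candidates, key=value)[:beam]
    return any(st == (TARGET,) for st in frontier)
```

For example, `solve24([6, 4, 1, 1])` finds 6 * 4 * 1 * 1 and returns True, while `solve24([1, 1, 1, 1])` returns False. The real method swaps both stand-ins for GPT-4 calls and adds backtracking for DFS-style tasks.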

Authors: Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Shafran, Thomas L. Griffiths, Yuan Cao, Karthik Narasimhan


If you want to support me, the best thing to do is to share out the content :)

If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n
Comments

For multi-step agents, it is exponentially important that each "step" has as high a success rate as possible, since the compound success rate drops off quickly with step count: going from e.g. 90% to 95% per step is actually a lot, as it roughly doubles the chain length (from ~7 steps to ~14 steps) that still retains a ~50% compound success rate, and so enables vastly more complicated problems to be solved. Hence, it will often be very valuable to review and iterate on each sub-step to maximize the chance that it doesn't block the entire chain.
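The arithmetic in this comment checks out; a quick sketch (mine, not from the video; the function names are my own):

```python
import math

def chain_success(p_step, n_steps):
    """Compound success probability of a chain of n independent steps."""
    return p_step ** n_steps

def max_steps(p_step, target=0.5):
    """Longest chain whose compound success rate stays at or above `target`."""
    return math.floor(math.log(target) / math.log(p_step))
```

Here `chain_success(0.90, 7)` is about 0.48 and `chain_success(0.95, 14)` is about 0.49, so a 5-point gain in per-step reliability roughly doubles the feasible chain length at a ~50% overall success rate.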

JurekOK

Yannic is the one man who's actually giving intelligent critique of new papers, instead of just throwing the paper into chatpdf and making a video.

niggawatt

A very nice addition to the new field of computational philosophy.

ixionkx

One application of this "AI-guided tree search" is automated theorem proving. There was a research project termed GPT-f, where they took the Lean proof assistant, which can precisely check whether a proof up to a certain point is correct, and designed a plugin that constructs a proof step by step with backtracking, using a language model (GPT-f itself) as the decision maker; it was able to prove about 60% of common geometry/algebra theorems with zero user intervention. As a type theory nerd myself, I am excited to see what this branch of research brings next 🎉
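The backtracking loop this comment describes can be caricatured without any LM or proof assistant (a toy stand-in I wrote, not GPT-f or Lean): a fixed "policy" proposes candidate steps, a cheap "checker" prunes states that can no longer reach the goal, and the search backtracks on failure.

```python
def prove(state, goal, depth, path):
    """Depth-first proof search with backtracking over a toy domain:
    reach `goal` from `state` using the ops +3 and *2.
    The op list stands in for an LM policy; the overshoot test stands in
    for a proof checker rejecting dead-end states."""
    if state == goal:
        return path
    if depth == 0 or state > goal:  # checker: both ops only increase state
        return None
    for name, op in [("+3", lambda s: s + 3), ("*2", lambda s: s * 2)]:
        result = prove(op(state), goal, depth - 1, path + [name])
        if result is not None:
            return result  # this branch succeeded
    return None  # all branches failed: backtrack
```

For instance, `prove(1, 11, 5, [])` backtracks out of the +3-only branch and returns the "proof" ["+3", "*2", "+3"] (1 → 4 → 8 → 11), while `prove(1, 10, 2, [])` returns None since 10 is unreachable in two steps.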

jit_rs

I don’t mind at all that you didn’t cut out the “um”s. It probably saves you a heap of time that is better spent on reading papers, and it makes your videos feel more personable.

ixionkx

Yannic, thank you for this excellent video on the 'Tree of Thoughts' research paper. Your explanation was very clear and concise, making it easy for even a layman like me to understand. I appreciate your efforts in breaking down the decoding technique used in large language models and highlighting its usefulness in investigative problem-solving patterns. Keep up the great work!

dribrahimel-nahhal

When I saw this paper, I was hoping someone like you would cover it. Thanks a lot!

ilianos

Awesome I saw this and wondered if it was profound. Thanks for explaining it.

marshallmcluhan

Thank you for reviewing this! Yannic is always on top of things :)

amalzubidat

I'm pretty sure in the picture at 10:46 the authors meant to descend into the left branch first and backtrack to later descend through the solid green branch, not like Yannic explained.

clray

I like your content so much that I felt it necessary to express my gratitude in the comment section, simply pressing the like button does not cut it for me in this case.

titastotas

Awesome, I was reading it last night! Very glad you posted it right on time :)

lucastononrodrigues

Yannic, your sunglasses are strikingly stunning. Much thanks for keeping me informed on AI goings-on. Also thanks for being anti-boring, funny, and/or hilarious. Cheers!!

killermike

Watching Yannic try to come up with a crossword clue for "ape" was hilarious.

cutebabyseal

LLMs are N-gram Markov models, in that they output a single token based on the last N tokens of chat history. So outputting intermediate steps helps the follow-up calls to the model organize its reasoning, just like a human being has a better chance of solving an equation with a piece of paper instead of relying solely on their brain. In other words, some problems inherently require N tokens of memory to be solved by a given model. I guess in the end scientists will extend big-O space and time complexity analysis to LLMs. Obviously you can also ask the model to take on different personalities, like engineers from the relevant fields or simply different psychological models, which will explicitly reference the associated knowledge while solving the problem, and you will get several totally different answers, all of which could be worth considering.
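The scratch-paper analogy can be made concrete with a toy sketch (my own illustration, not from the comment or the paper): a "model" that is only allowed to read the last N characters of its transcript can still finish a long computation, as long as each call writes its intermediate result back into the transcript.

```python
N = 32  # the "context window", in characters

def step(transcript, x):
    """One 'model call': it sees only the last N characters of the transcript,
    parses the most recent partial sum from them, and appends an updated one."""
    tail = transcript[-N:]
    partial = int(tail.rsplit("=", 1)[-1])  # last "= <number>" line fits in the tail
    return transcript + f"\n= {partial + x}"

transcript = "= 0"
for x in range(1, 101):
    transcript = step(transcript, x)
# the last line now reads "= 5050", though no single call saw the full history
```

The externalized intermediate results play the same role as chain-of-thought tokens: they turn a bounded-context process into one that can carry state across arbitrarily long computations.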

nangld

Very informative and good voice for radio. Cheers Yannic!

mono_onamoto

This is cool. It's sort of the first video I've watched about prompt engineering. The idea of creating sort of virtual neurons comes to mind. And yeah, right as this was coming out, I was thinking the exact same thing, like they would replace parts of algorithms or "functions".

Rockyzach

Thank you for making it much easier to consume these papers!

florianbehrens

Sounds like a Stack-RNN may be the next step for DeepMind, given the prominent mention in the recent Princeton/DeepMind paper "Neural Networks and the Chomsky Hierarchy". However, since there are no authors in common between the two papers, it may require overcoming some of the big-org problems that have plagued Alphabet's ability to execute on its in-house talent.

jabowery