OpenAI O1 models probably trained gpt-4o and turbo in chain of thought

did openai train gpt-4o with strawberry using reinforcement learning on chain of thought? openai's new orion o1-preview models have made a step change in logic and reasoning over older models. however, many are claiming it's easily replicated just by using chain of thought, but for this to work the models have to be good at chain of thought in the first place. in this video, chris looks under the hood at the generated chain of thought for the orion o1 models and compares it with the cot of gpt-4o, claude 3.5 sonnet, and llama 3. he does this using games such as sudoku and tic-tac-toe. by the end of this video you'll have a better idea of how this works.
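The video does not publish its evaluation code, but the kind of comparison it describes can be sketched in a few lines. The snippet below is a hypothetical scoring harness (all names are my own, not from the video): given a tic-tac-toe board and the move a model's chain of thought arrives at, it scores the answer as illegal, merely legal, or immediately winning. Running the same rubric over each model's moves is one simple way to compare cot quality.

```python
# Hypothetical rubric for scoring a model's tic-tac-toe move, as a way to
# compare chain-of-thought quality across models. Boards are 9-char strings,
# indices 0-8 left-to-right, top-to-bottom; '.' marks an empty square.

WIN_LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
    (0, 4, 8), (2, 4, 6),             # diagonals
]

def legal_moves(board: str) -> list[int]:
    """Indices of empty squares."""
    return [i for i, c in enumerate(board) if c == "."]

def is_winning_move(board: str, move: int, player: str) -> bool:
    """True if placing `player` at `move` completes a line."""
    b = board[:move] + player + board[move + 1:]
    return any(all(b[i] == player for i in line) for line in WIN_LINES)

def score_answer(board: str, move: int, player: str) -> int:
    """Crude rubric: 0 = illegal move, 1 = legal, 2 = winning."""
    if move not in legal_moves(board):
        return 0
    return 2 if is_winning_move(board, move, player) else 1

# Example: X has two in the top row, so square 2 wins on the spot.
board = "XX.OO...."[:9]
print(score_answer(board, 2, "X"))  # winning move -> 2
print(score_answer(board, 0, "X"))  # square occupied -> 0
```

In practice one would parse the move out of each model's generated chain of thought and feed it to `score_answer`; the parsing step is model-specific and omitted here.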
Comments

This is actually a really interesting vid, subscribed!

spoony

Nicely done. Very clearly described. Thank you.

calebweintraub

Bro.. gpt-4 was already trained on CoT.. here with o1 you're looking at a more complex prompt strategy with multiple recalls or something

TheTruthOfAI

why would we expect a consistent answer about the time a human endeavour takes, when it involves a variety of human actions, each with its own range of durations? there isn't a "correct" answer. it'd be more like a normal distribution, with a dice toss deciding where any given guess lands in that distribution. I would be suspicious if the answer was always the same to a question with that much imprecision.
Ask me three times how long it'd take me to go into town and shop for half a dozen items .. you'll get three different answers.

mijmijrm