Open Reasoning vs OpenAI

In this video, I look at the new open-source reasoning models that have come out from a number of different companies and how they compare against the OpenAI alternatives. This gives a sense of just how far behind open-source and open-weight models are compared to OpenAI's proprietary models.

For more tutorials on using LLMs and building agents, check out my Patreon

🕵️ Interested in building LLM Agents? Fill out the form below

👨‍💻 GitHub:

⏱️Time Stamps:
00:00 Intro: OpenAI o1 December Release
00:23 New Reasoning Models
00:52 Learning to Reason with LLMs Blog
01:58 Standard LLM
02:43 Reasoning LLM
03:36 Let's Verify Step by Step Paper
03:58 Chain-of-Thought Prompting Elicits Reasoning in LLMs Paper
06:26 DeepSeek-R1-Lite-Preview
06:36 DeepSeek Benchmarks
09:07 DeepSeek Chat Interface Demo
16:04 Qwen-QwQ
16:42 Qwen-QwQ Benchmarks
17:47 Qwen-QwQ Chat Interface Demo
22:10 Marco-o1
22:15 Marco-o1 Paper
Comments

The devs were very clear that they shared QwQ with us so we can see the progress, but it's just the basic mechanics of what's to come.

UCsktlulEBEebvBBOuDQ

It is very disappointing that OpenAI doesn't even say in a paper exactly what they are doing in o1. I am sure it is a variety of techniques, some of which are being deployed in these open models. Their argument that the secrecy is for safety no longer carries any credence. In fact, all the effort they have put into preventing "jailbreaking" is really about stopping you from seeing the raw tokens (that you pay for, btw), because those would give an idea of what they are actually doing. I'm sure there are some interesting ideas in there, but this siloing of science for competitive reasons is so far from where they (at least said they) came from that it is pretty repugnant.

toadlguy

Great video, the broken loop reminded me of the "There are 4 lights" Picard meme, which then made me realize the episode it's from is called "Chain of Command" 😂

DarrenReidAu

Interesting that the QwQ and R1 models use similar expressions in their thought processes, like "Wait a minute, letters can be tricky, especially if there are repeating letters". I wonder why?
See 11:57 and 18:35

tornyu

Thanks for the video. Very timely. One thing I find very interesting about the exposed chains of thought is that they let us see where the reasoning might have gone wrong. Over on AI Explained, Philip has developed a set of common-sense reasoning problems that humans do much better on than any current AI models. When I tried a few of his publicly available prompts with DeepSeek, the model did not get the "right" answer, but I could see that it had decent reasons for coming up with a "wrong" answer. The exposed reasoning thus helped to reveal ambiguities and other flaws in the reasoning problems themselves.

I imagine that careful examination of those chains of thought, both by humans and by AI, will also be a very useful way to improve the reasoning ability of these models.

TomGally

What local front end are you using with Qwen Coder at 16:25?

derekw

Question for you: when you're doing your strawberry test, are you using plain strings or string literals? A "strrawberry" might be interpreted as something that the AI should spell-check against a dictionary, whereas a string literal might be something it takes verbatim and approaches differently?
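For illustration, the two phrasings might look like this (a minimal sketch; the prompt wordings are made up, not from the video):

```python
# Hypothetical prompt variants for the letter-counting test.
# A bare word may invite spell-correction; framing it as an
# exact string literal asks the model to take it verbatim.
bare_prompt = "How many r's are in strrawberry?"
literal_prompt = (
    'How many r\'s are in the exact string "strrawberry"? '
    "Treat it as a literal; do not correct the spelling."
)
```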

rascalwind

Sam, I can't help myself, but it seems to work better if I dynamically prompt for exactly what I need. The first iteration determines the "reply mode/format" and the second iteration produces the reply. The agentic "flow" is way cheaper as well.
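A minimal sketch of that two-iteration flow, assuming an OpenAI-compatible Python client (the model name and prompts are illustrative, not from the comment):

```python
from openai import OpenAI

client = OpenAI()  # assumes any OpenAI-compatible endpoint

def ask(messages, model="gpt-4o-mini"):
    resp = client.chat.completions.create(model=model, messages=messages)
    return resp.choices[0].message.content

question = "Summarise the trade-offs of open-weight reasoning models."

# Iteration 1: cheaply decide the reply mode/format for this query.
mode = ask([
    {"role": "system", "content": "In one line, state the best output format for answering the user's question."},
    {"role": "user", "content": question},
])

# Iteration 2: produce the reply using the chosen format.
answer = ask([
    {"role": "system", "content": f"Answer the user. Use this output format: {mode}"},
    {"role": "user", "content": question},
])
print(answer)
```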

MeinDeutschkurs

Has anyone else suspected the whole 'strawberry' thing comes from the shape of the Monte Carlo tree search graph? It would be painfully on the nose if this were true... Maybe red for the initial nodes filtered out by logical-nots/falsey values, then green for the truthy final leaves. A lot of our own reasoning is first stating what x definitely is not, then picking from the likely candidates that remain, if we really don't know something.

Charles-Darwin

11:54 How many r's in "strrawberry" (typo intentional): "thought for 9 seconds"... This is progress in AI :) I so wish for tokenization to go away; then at least this would be a non-issue.
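Outside the tokenizer the check is trivial, which is the point; a character-level count in Python:

```python
# Character-level count — no tokenizer involved, so no ambiguity.
print("strrawberry".count("r"))  # 4
```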

pallharaldsson

What are your thoughts on why Anthropic has not released a similar chain-of-thought model or architecture?

grabani

We just want to know if Qwen is still looping.
Reminds me of Asimov's story about SPD-13.

RaitisPetrovs-nbkz

My question is: I have downloaded the open-source models, but without their secret prompt the accuracy is no comparison to the OpenAI models; it's not even as good as a regular Llama 3.1 70B Instruct model. Can anyone tell me where the secret prompt is? Using the DeepSeek or QwQ website is not open source at all; no one knows what model is running in the backend.

menglilingsha

The speed of open-source development is promising, but traditional accuracy benchmarks hide the importance of speed, which is more critical for these inference-bound models.

Nice video highlighting the current ecosystem.

lucasjans

Need an agent to assign inference time per query.
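One way that could look, as a rough sketch (the difficulty heuristic and token budgets here are invented for illustration):

```python
def pick_reasoning_budget(query: str) -> int:
    """Route harder-looking queries to a larger reasoning-token
    budget; easy ones get a cheap, fast reply. Values are made up."""
    hard_markers = ("prove", "count", "step by step", "puzzle", "debug")
    if any(marker in query.lower() for marker in hard_markers):
        return 4096  # let the reasoning model think longer
    return 256       # near-instant answer

print(pick_reasoning_budget("Count the r's in strrawberry"))  # 4096
```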

jtjames

Cool comparison! However, there's no way I'm paying for overseas reasoning models when homegrown OpenAI's superb o1 exists. Which I do pay for! And it's impressive even in its pre-release versions (mini & preview).

bokuboke

Do we now get impressed when the model counts the R's in "stawberrrry"?

hqcart

The progress on these models is insane, people have no clue what's coming.

NowayJose

You decide to break up with your AI waifu.

AI: QwQ what's this???

undefined