Open Reasoning vs OpenAI

In this video, I look at the new open-source reasoning models that have come out from a number of different companies and how they compare against the OpenAI alternatives. This gives a sense of just how far behind open-source and open-weight models are compared to OpenAI's proprietary models.

For more tutorials on using LLMs and building agents, check out my Patreon

🕵️ Interested in building LLM Agents? Fill out the form below

👨‍💻 GitHub:

⏱️Time Stamps:
00:00 Intro: OpenAI o1 December Release
00:23 New Reasoning Models
00:52 Learning to Reason with LLMs Blog
01:58 Standard LLM
02:43 Reasoning LLM
03:36 Let's Verify Step by Step Paper
03:58 Chain-of-Thought Prompting Elicits Reasoning in LLMs Paper
06:26 DeepSeek-R1-Lite-Preview
06:36 DeepSeek Benchmarks
09:07 DeepSeek Chat Interface Demo
16:04 Qwen-QwQ
16:42 Qwen-QwQ Benchmarks
17:47 Qwen-QwQ Chat Interface Demo
22:10 Marco-o1
22:15 Marco-o1 Paper
Comments

The devs were very clear that they shared QwQ with us so we can see the progress, but it's just the basic mechanics of what's to come.

UCsktlulEBEebvBBOuDQ

It is very disappointing that OpenAI doesn't even say in a paper exactly what they are doing in o1. I am sure it is a variety of techniques, some of which are being deployed in these open models. Their argument that the secrecy is for safety no longer carries any credence. In fact, all the effort they have put into preventing "jailbreaking" is really about stopping you from seeing the raw tokens (that you pay for, btw), because those would give an idea of what they are actually doing. I'm sure there are some interesting ideas in there, but this siloing of science for competitive reasons is so far from where they (at least said they) came from that it is pretty repugnant.

toadlguy

Great video, the broken loop reminded me of the "There are 4 lights" Picard meme, which then made me realize the episode it's from is called "Chain of Command" 😂

DarrenReidAu

Interesting that the QwQ and R1 models use similar expressions in their thought processes, like "Wait a minute, letters can be tricky, especially if there are repeating letters". I wonder why?
See 11:57 and 18:35

tornyu

Thanks for the video. Very timely. One thing I find very interesting about the exposed chains of thought is that they let us see where the reasoning might have gone wrong. Over on AI Explained, Philip has developed a set of common-sense reasoning problems that humans do much better on than any current AI models. When I tried a few of his publicly available prompts with DeepSeek, the model did not get the "right" answer, but I could see that it had decent reasons for coming up with a "wrong" answer. The exposed reasoning thus helped to reveal ambiguities and other flaws in the reasoning problems themselves.

I imagine that careful examination of those chains of thought, both by humans and by AI, will also be a very useful way to improve the reasoning ability of these models.

TomGally

What local front end are you using with Qwen Coder at 16:25?

derekw

Question for you: when you're doing your strawberry test, are you using plain strings or string literals? A "strrawberry" might be interpreted as something that the AI should spell-check against a dictionary, whereas a string literal might be something it takes verbatim and approaches differently?
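For illustration, the two phrasings might look like this (a minimal sketch; the prompt wordings are made up, not from the video):

```python
# Hypothetical prompt variants for the letter-counting test.
# A bare word may invite spell-correction; framing it as an
# exact string literal asks the model to take it verbatim.
bare_prompt = "How many r's are in strrawberry?"
literal_prompt = (
    'How many r\'s are in the exact string "strrawberry"? '
    "Treat it as a literal; do not correct the spelling."
)
```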

rascalwind

Sam, I can't help myself, but it seems to work better if I dynamically prompt for exactly what I need. The first iteration determines the "reply mode/format" and the second iteration produces the reply. The agentic "flow" is way cheaper as well.
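A minimal sketch of that two-iteration flow, assuming an OpenAI-compatible Python client (the model name and prompts are illustrative, not from the comment):

```python
from openai import OpenAI

client = OpenAI()  # assumes any OpenAI-compatible endpoint

def ask(messages, model="gpt-4o-mini"):
    resp = client.chat.completions.create(model=model, messages=messages)
    return resp.choices[0].message.content

question = "Summarise the trade-offs of open-weight reasoning models."

# Iteration 1: cheaply decide the reply mode/format for this query.
mode = ask([
    {"role": "system", "content": "In one line, state the best output format for answering the user's question."},
    {"role": "user", "content": question},
])

# Iteration 2: produce the reply using the chosen format.
answer = ask([
    {"role": "system", "content": f"Answer the user. Use this output format: {mode}"},
    {"role": "user", "content": question},
])
print(answer)
```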

MeinDeutschkurs

Has anyone else suspected the whole 'strawberry' thing comes from the shape of the Monte Carlo tree search graph? It would be painfully on the nose if this were true... Maybe red for the initial nodes filtered out by logical-nots/falsey values, then green for the truthy final leaves. A lot of our own reasoning is first stating what x definitely is not, then picking from the likely candidates that remain, if we really don't know something.

Charles-Darwin

11:54 How many r's in "strrawberry" (typo intentional): "thought for 9 seconds"... This is progress in AI :) I so wish for tokenization to go away; then at least this would be a non-issue.
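Outside the tokenizer the check is trivial, which is the point; a character-level count in Python:

```python
# Character-level count — no tokenizer involved, so no ambiguity.
print("strrawberry".count("r"))  # 4
```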

pallharaldsson

What are your thoughts on why Anthropic has not released a similar chain-of-thought model or architecture?

grabani

We just want to know if Qwen is still looping.
Reminds me of Asimov's story about SPD-13.

RaitisPetrovs-nbkz

My question is: I have downloaded the open-source models, but without their secret prompt the accuracy is no comparison to the OpenAI models; it's not even as good as a regular Llama 3.1 70B Instruct model. Can anyone tell me where the secret prompt is? Using the DeepSeek or QwQ website is not open source at all; no one knows what model is running in the backend.

menglilingsha

The speed of open-source development is promising, but traditional accuracy benchmarks hide the importance of speed, which is more critical for these inference-bound models.

Nice video highlighting the current ecosystem.

lucasjans

Need an agent to assign inference time per query.
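One way that could look, as a rough sketch (the difficulty heuristic and token budgets here are invented for illustration):

```python
def pick_reasoning_budget(query: str) -> int:
    """Route harder-looking queries to a larger reasoning-token
    budget; easy ones get a cheap, fast reply. Values are made up."""
    hard_markers = ("prove", "count", "step by step", "puzzle", "debug")
    if any(marker in query.lower() for marker in hard_markers):
        return 4096  # let the reasoning model think longer
    return 256       # near-instant answer

print(pick_reasoning_budget("Count the r's in strrawberry"))  # 4096
```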

jtjames

Cool comparison! However, there's no way I'm paying for overseas reasoning models when homegrown OpenAI's superb o1 exists. Which I do pay for! And it's impressive even in its pre-release versions (mini & preview).

bokuboke

Do we now get impressed when the model counts the R's in "stawberrrry"?

hqcart

The progress on these models is insane, people have no clue what's coming.

NowayJose

You decide to break up with your AI waifu.

AI: QwQ what's this???

undefined