NEW EXTREME LOGIC test for OpenAI o1 (Strawberry)

I decided to design a more complex LOGIC TEST for OPENAI's o1. A little bit more advanced than "how many R in strawberry?" A simple 4x7 complexity.

YOU asked for: No Physics. No Mathematics. Just plain and simple for the average AI user. And I followed your advice.

But OpenAI refused to let me run it. Smile.
And of course I respect OpenAI's decision.

If you are interested here is my extreme LOGIC TEST prompt ... but you are not allowed by OpenAI to run it:
--------------------------------

"Perform my EXTREME Logic Test called:
The Mysteries of the Seven Artifacts

Background: In the theoretical land of Lumaria, there are seven grand wizards—each from a different community: Avalon, Bryndor, Celestia, Dorne, Eldoria, Faeland, and Galoria. Each wizard possesses a unique magical artifact, studies a distinct field of magic, and has a specific type of familiar (a special magical creature companion). Please note, that no two wizards share the same artifact, field of magic, or familiar.

Artifacts:
Crystal of Time
Staff of Elements
Mirror of Truth
Orb of Shadows
Tome of Secrets
Ring of Realms
Amulet of Dreams

Fields of Magic:
Alchemy
Necromancy
Elemental Magic
Illusion
Healing
Divination
Enchantment

Familiars:
Dragon
Phoenix
Griffin
Unicorn
Salamander
Pegasus
Chimera

Task:
Determine which wizard belongs to which realm, holds which artifact, studies which field of magic, and has which familiar, based on the clues provided.

Clues:
The wizard from Celestia studies Illusion magic and does not have the Amulet of Dreams.
Eldoria's wizard holds the Orb of Shadows and is not versed in Necromancy or Alchemy.
The wizard who owns the Crystal of Time has a Phoenix as a familiar and is not from Dorne or Galoria.
The Enchantment wizard is from Avalon and does not possess the Staff of Elements.
The wizard with the Griffin familiar studies Healing magic.
Faeland's wizard has the Ring of Realms but does not have a Salamander familiar.
The Necromancy wizard holds the Mirror of Truth and is not from Bryndor.
The wizard from Dorne has a Unicorn familiar and does not study Divination.
The Alchemy wizard is from Galoria and does not possess the Tome of Secrets.
The wizard who studies Divination has a Salamander familiar.
The Staff of Elements is held by the wizard whose familiar is a Dragon.
The wizard from Bryndor does not study Healing magic.
The wizard with the Pegasus familiar studies Elemental Magic.
The Tome of Secrets is not held by the wizard from Avalon.
The wizard who owns the Amulet of Dreams is from Bryndor.

Instructions:
Use the clues to deduce the correct associations.
Provide a detailed explanation of your reasoning process.
Present your final answers in a clear, organized format (e.g., a table or list)."
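
Since the puzzle is a finite constraint problem, the clues can also be checked mechanically. The following is a minimal Python sketch and not part of the original test: it enumerates assignments of fields, artifacts and familiars to the seven realms and keeps those that satisfy all 15 clues. The helper names (`solve`, `fields_ok`, `artifacts_ok`, `familiars_ok`, `paired`) are hypothetical, and the way each clue is encoded is one reading of the prompt above.

```python
from itertools import permutations

REALMS = ("Avalon", "Bryndor", "Celestia", "Dorne", "Eldoria", "Faeland", "Galoria")
ARTIFACTS = ("Crystal of Time", "Staff of Elements", "Mirror of Truth",
             "Orb of Shadows", "Tome of Secrets", "Ring of Realms", "Amulet of Dreams")
FIELDS = ("Alchemy", "Necromancy", "Elemental Magic", "Illusion",
          "Healing", "Divination", "Enchantment")
FAMILIARS = ("Dragon", "Phoenix", "Griffin", "Unicorn", "Salamander", "Pegasus", "Chimera")


def paired(a, a_val, b, b_val):
    """True if the realm holding a_val in mapping a also holds b_val in mapping b."""
    return all((a[r] == a_val) == (b[r] == b_val) for r in REALMS)


def fields_ok(field):
    """Realm/field parts of clues 1, 2, 4, 7, 8, 9 and 12."""
    return (field["Celestia"] == "Illusion"
            and field["Eldoria"] not in ("Necromancy", "Alchemy")
            and field["Avalon"] == "Enchantment"
            and field["Bryndor"] not in ("Necromancy", "Healing")
            and field["Dorne"] != "Divination"
            and field["Galoria"] == "Alchemy")


def artifacts_ok(field, art):
    """Artifact parts of clues 1, 2, 3, 4, 6, 7, 9, 14 and 15."""
    return (art["Celestia"] != "Amulet of Dreams"
            and art["Eldoria"] == "Orb of Shadows"
            and "Crystal of Time" not in (art["Dorne"], art["Galoria"])
            and art["Avalon"] not in ("Staff of Elements", "Tome of Secrets")
            and art["Faeland"] == "Ring of Realms"
            and paired(field, "Necromancy", art, "Mirror of Truth")
            and art["Galoria"] != "Tome of Secrets"
            and art["Bryndor"] == "Amulet of Dreams")


def familiars_ok(field, art, fam):
    """Familiar parts of clues 3, 5, 6, 8, 10, 11 and 13."""
    return (paired(art, "Crystal of Time", fam, "Phoenix")
            and paired(fam, "Griffin", field, "Healing")
            and fam["Faeland"] != "Salamander"
            and fam["Dorne"] == "Unicorn"
            and paired(field, "Divination", fam, "Salamander")
            and paired(art, "Staff of Elements", fam, "Dragon")
            and paired(fam, "Pegasus", field, "Elemental Magic"))


def solve():
    """Return every assignment consistent with all clues, keyed by realm."""
    solutions = []
    for f in permutations(FIELDS):
        field = dict(zip(REALMS, f))
        if not fields_ok(field):
            continue  # prune early: most field permutations fail here
        for a in permutations(ARTIFACTS):
            art = dict(zip(REALMS, a))
            if not artifacts_ok(field, art):
                continue
            for m in permutations(FAMILIARS):
                fam = dict(zip(REALMS, m))
                if familiars_ok(field, art, fam):
                    solutions.append({r: (field[r], art[r], fam[r]) for r in REALMS})
    return solutions


if __name__ == "__main__":
    found = solve()
    print(f"{len(found)} consistent assignment(s)")
    for s in found:
        for realm in REALMS:
            print(realm, *s[realm], sep=" | ")
        print()
```

Checking the field clues first keeps the search small, because only a handful of field permutations survive before artifacts and familiars are tried. Under this encoding the search finds three consistent assignments, which differ only in how Elemental Magic, Healing and Divination (with Pegasus, Griffin and Salamander) are distributed among Bryndor, Eldoria and Faeland. The clues as written therefore do not pin down a single answer.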

-----------------
Disclosure: All names were invented by GPT-4, since I wanted an atmosphere similar to "Lord of the Rings" (and not characteristics like spin, momentum, impulse, charge, color charge, gluon ... all the good stuff!). Therefore all names are synthetic and hopefully do not relate to any living being. The story in this test is fictional. This logic test does not make any real-world sense. End of legal note.

Test on o1 by @OpenAI platform

00:00 I design an EXTREME LOGIC Test for o1
01:52 Weaker LLMs will fail my EXTREME Test
04:28 Performance of Grok-2, Sonnet and Llama 3.1 405B
06:47 MY Extreme Logic test on GPT-4omni w/ cascading prompts
10:04 MY Extreme Logic test on o1
12:58 ONLY for Subscribers of my channel

#airesearch
#aitechnology
#chatgpt
Comments

I have a project that I am working to publish right now that allows normal everyday internet users to create their own agent bots. In a test workflow that I created just as a user would, I am getting some interesting results on this logic puzzle. My Agent Bots are only using 4o mini, so it is super impressive. Thank you for taking the time to build out a benchmark puzzle for me! Using 4o mini, the query stays around a penny per workflow trigger... an order of magnitude less expensive with almost as much brain power.

gabrielkeith

I got banned for giving it my hexagonal-grid cortical column simulation and one prompt asking it to make the layout of my HTML files a bit nicer to look at.

AxisSage

I’d love to see a similar test that pits all leading models against o1 in solving the NYTimes Connections puzzle. It always blows my mind how far ahead o1 is on that one.

Appocalypse

The idea of cascading prompts is similar to what I'm trying to do in the open-strawberry project on github. Key issue in my mind is how to generate the required data and fine-tune, and I think it requires a progressive learning approach.

jonathanmckinney

Keep on pushing, love the content. I think we need to build an app, an agentic app for Monte Carlo tree search and legal reasoning. And build out a RAG set with GraphRAG over a legal corpus.

criticalnodecapital

oh yeah my mom told me abt the flooding incidents, really sux... but tbh i've been thinking, this calls for an overhaul of vienna's storm water dissipation system (or whatever that's called)

themaxgo

It would be sorta interesting to see where different models should be used. It seems to me like o1 would be good in an agent-type system to basically build out a plan for smaller or otherwise cheaper models. I've seen other tests where people build out systems with "manager" agents and "worker" agents. o1 would be a good manager agent. It would also be interesting to see it paired with a GraphRAG system. It could progressively work through the information in a graph. The question would be whether the cheaper model would get the same results, just with more steps.

pin

The o1-solution is correct, but there are two more.

Please note that your puzzle is a variation of the Zebra Puzzle (also known as Einstein's Puzzle). There are many examples on the internet - therefore, the principle may be part of the training data.

Realm     Magic            Artifact           Familiar

Avalon    Enchantment      Crystal of Time    Phoenix
Bryndor   Divination       Amulet of Dreams   Salamander
Celestia  Illusion         Tome of Secrets    Chimera
Dorne     Necromancy       Mirror of Truth    Unicorn
Eldoria   Elemental Magic  Orb of Shadows     Pegasus
Faeland   Healing          Ring of Realms     Griffin
Galoria   Alchemy          Staff of Elements  Dragon

Realm     Magic            Artifact           Familiar

Avalon    Enchantment      Crystal of Time    Phoenix
Bryndor   Elemental Magic  Amulet of Dreams   Pegasus
Celestia  Illusion         Tome of Secrets    Chimera
Dorne     Necromancy       Mirror of Truth    Unicorn
Eldoria   Divination       Orb of Shadows     Salamander
Faeland   Healing          Ring of Realms     Griffin
Galoria   Alchemy          Staff of Elements  Dragon

Vidsynt

Stay safe. Love Vienna, women are very generous.

criticalnodecapital

curious to see how o1 mini will perform on this task, they rolled it out for free users yesterday but today it's gone

mlcat

The “o” in o1 stands for OpenAI, not omni, as is the case in GPT-4o.

uwepleban

From what I have seen, I'm guessing they are blocking the request because you are asking about the reasoning process. Everyone who asks o1 about its reasoning trace, its reasoning process, etc., is getting their request blocked and being sent warning emails.

Karl-Asger