Can OpenAI's o1 solve complex medical problems?

Показать описание

First thoughts and preliminary insights into OpenAI's GPT o1 Strawberry in the medical domain, with some expected and unexpected findings. We have a "bake off" between o1 and Doc to demonstrate how o1 fares with tricky medical scenarios

Disclaimer - obviously don't use AI to diagnose or treat your medical problems, if you are unwell please seek a medical professional (AI isn't good enough just yet :)).

👋 Hey! If you are enjoying our conversations, reach out, share your thoughts and journey with us. Don't forget to subscribe whilst you're here :)

00:00 start + highlights
1:28 intro, what is GPTo1
5:18 what is "reasoning" in o1
12:38 Benchmarks- o1's successes and failures
24:07 O1 and doctor bake off!
24:21 The pregnancy acid test for LLMs
26:23 clinical coding
30:06 Tricky patient scenarios
32:25 opioid dose conversions

Рекомендации по теме

Комментарии

A tip: start a new chat for each of the questions. It will likely respond better then, as it uses all of the previous questions as context, and quite heavily so.

satioOeinas

I created a simple website, with anthropic API keys, took me a couple of hours. You enter a patients information, their history, and symptoms, it returns possible diagnosis(s) and a patient specific treatment plan.

My cousin who is in med school stress tested it and she was like omg how did you make this, its amazing, and I was like its just a wrapper haha

arnavprakash

Great talk. Interesting to see how AI is helping the wider medtech industry.

Just a small tip. Always try to use fresh sessions when asking unrelated questions. Us humans have a remarkable ability to ignore the past and move on to the next problem in the set, but LLMs will analyse the entire history prior to marking them as irrelevant (even with the initial message indicating that it's a quiz). As a result, accuracy and precision drops the deeper you go into the conversation.

bombala

Great video! I’m also excited for o1. I gave it 350 records to sort and analyze, and it did the same work in 20 seconds what would have taken me 3 hours in Excel. Very impressive

jd_real

The full model should be arriving next month, would be interesting to give it even harder tests.

ShpanMan

The problem with ARC puzzle Is that substantially It Is a visually reasoning task. When you translate It into a matrix you are not testing the same thing as for humans. I think LLM will only get better at this task improving the vision capabilities, not only the reasoning ones. And with this I won 1 million dollars :-)

LucaCrisciOfficial

Also, many people were accomplishing this "reasoning" by using RAG processes and making multiple api calls to both hold the model's hand through the reasoning process and also as a way to confirm results. Supposedly, much of this won't be necessary, if it delivers on its promises.

I'd like to see the model come back with requests for clarification or additional information.

sevilnatas

It is nice to see real experts tests this model and not relay on OpenAI internal testing or random tech youtuber.

chickendinner

Still waiting for full version of gpt o1

AlfarrisiMuammar

I don't get what the apparent error was in the last case (?). It stated it as "approximate" after all 🤔

djayjp

Minor pedantic correction: it isn't OpenAI GPT o1, it's just OpenAI o1. Sam doesn't like the name GPT. The o1 series is a fresh start without the GPT.

human_shaped

Slight correction, it's not called GPT o1 just Open AI o1. But great content and very scary.

EamonnMooney

wait till next year, this is just the start.

michaelhartjen

audio wouldnt work for this models, unless people want to wait for an answer.

andreaskrbyravn

Proprietary and Open Source is not supposed to exist at the same time. The origin of OpenAI was as a non-profit and supposed to be both open source, safe and benefiting society. Seems that is all out the window now.

sevilnatas

Well, I don't know what to say, other than I am glad I have no children. I feel sorry for those who do. Looks like a cold and dark "future" for them

xXstevilleXx

it will aid the depop movement greatly. get your heads out of the sand

standingbear

Can OpenAI's o1 solve complex medical problems?

OpenAI’s new “deep-thinking” o1 model crushes coding benchmarks

Why and When You Should Use OpenAI o1

Is OpenAI's latest o1 model that good?

GPT-o1: The Best Model I've Ever Tested 🍓 I Need New Tests!

OpenAI o1 EXPLAINED: Everything You MUST Know, WHEN to Use It & Its Limits!

Explaining OpenAI's o1 Reasoning Models

OpenAI's Noam Brown, Ilge Akkaya and Hunter Lightman on o1 and Teaching LLMs to Reason Better

Claude vs GPT vs o1: Which AI is best at programming? | Cursor Team and Lex Fridman

'Training' an AI Agent for ONE Specific TASK with OpenAI-o1 API

So Google's Research Just Exposed OpenAI's Secrets (OpenAI o1-Exposed)

Building OpenAI o1 (Extended Cut)

Can AI solve complex riddles? (GPT-3)

OpenAI Releases GPT Strawberry 🍓 Intelligence Explosion!

'OpenAI o1: Revolutionary AI Model Outperforms PhD Scholars in Science Tests!'

Cursor AI: Best AI Code Editor + OpenAI o1: Create Apps in Minutes

OpenAI o1 Model in Zapier for Beginners - [Trailer]

AI says why it will kill us all. Experts agree.

AI can't cross this line and we don't know why.

Build Anything with AI Agents, Here's How

Master the Perfect ChatGPT Prompt Formula (in just 8 minutes)!

The Possibilities of AI [Entire Talk] - Sam Altman (OpenAI)

Do androids believe in God? Watch our interview with Ameca, a humanoid #robot at #CES2022 #Shorts

Can ChatGPT O1 Make Me Money?

Major ChatGPT Upgrade! | 'Canvas' AI Features HANDS ON