The New 'Claude 3.5 Sonnet' Actually SHOCKED The Industry! - Beats GPT-4o

Claude 3.5 Sonnet Revealed!

Links From Today's Video:

Welcome to my channel, where I bring you the latest breakthroughs in AI. From deep learning to robotics, I cover it all. My videos offer valuable insights and perspectives that will expand your knowledge and understanding of this rapidly evolving field. Be sure to subscribe and stay updated on my latest videos.

Was there anything I missed?

#LLM #Largelanguagemodel #chatgpt
#AI
#ArtificialIntelligence
#MachineLearning
#DeepLearning
#NeuralNetworks
#Robotics
#DataScience
Comments
Author

It's really remarkable. It's huge. It shows that when it comes to AI, literally everything can change in the blink of an eye.

daniely
Author

This thing is so neutered it's not even funny. Can't ask anything even remotely controversial.
Asked for places that would be safe in case of nuclear war, and it told me that I should talk to a therapist and to practice relaxation techniques...

eugenes
Author

This video is so shocking that I was shocked by it. Truly some great YouTube shocking material

darkhollow
Author

Claude Opus is already good enough for my use cases. If Sonnet can increase the message limit, then that's a very real quality-of-life improvement.

carlkim
Author

13-min vid 🥶 I'm stunned, shocked, pretty pretty pretty shocked

riptechnoblademinecraftkin
Author

Thanks for a great description. I am amazed at the very human responses I get from 3.5. It even interpreted my multiple-sentence poetic analogy in perfect detail, understanding how each phrase was analogous to our topic. Truly amazing

tunahelpa
Author

Sonnet 3.5 with 62% is at the level of a good amateur programmer

sephirothcloud
Author

Wow, the car door notification chiming in the background is really actually shocking

charleslpayne
Author

This thing passed the test: "write 10 sentences that end with the word 'orange'"

anta-zjbw
Author

Anthropic needs to work on their API pricing and the message limits in chat. It's annoying: GPT-4o is pumping out work 24/7 for relatively decent pricing, and much cheaper than Claude.

OscarTheStrategist
Author

I've found from my tests that GPT-4o is still definitely the best for math questions. It gets them right more often, shows more of its work, and shows it better, and the web UI for Claude doesn't seem to support LaTeX as well. For creative writing I was expecting Claude 3.5 to be better, since Claude 3 Opus is very human, but I've noticed that when it comes to sounding human and creativity, Claude 3 Opus is still to this day better than GPT-4o and Claude 3.5 Sonnet. So this release is of course great because it's free, but if you're expecting a major super-duper improvement, it's not there. ChatGPT is still probably better for most situations simply because it has more features. However, the magnetic capabilities shown in many of their demo videos could change this and make Claude 3.5 better, but I don't have access to it yet, only the text model :(

pigeon_official
Author

I'm absolutely shocked the video ended mid sentence.

fractal
Author

Most relevant for RL are the HumanEval and GPQA benchmarks. Actually dope asf. Looks like the D riding is finally ending and labs are trying new ideas. You'd be surprised what kind of performance gains you can get by exploiting LOTS of test-time compute per prompt (emulates larger model output), filtering, and coupling with something like LiPO. Still a lot of easy hobblings out there, as the kids say lol.

Cheers to them. Almost 60% on GPQA zero-shot is extremely impressive. I do hope companies include more revealing benchmarks. Considering error. HumanEval and a few other benchmarks used to promote model releases are almost completely saturated and damn near meaningless.

alexanderbrown-dgsy
Author

Just tried it with a specialized prompt of mine and follow-up questions that no model could properly solve yet, not even GPT-4o. Every time I did this, the result was that my worries about AI taking over humanity were eased. Let me tell you, I am worried now. This is no joke anymore. This is getting creepy. It begins now. And it's only June. And this is only their Sonnet model. Boy oh boy, are we in for a ride.

peterkonrad
Author

Subscribed to Pro instantly. Artifacts is so useful for the makers among us. Upgrade of the year, I find.

koen.mortier_fitchen
Author

This entire experience is blowing my mind.

HexylvaniaFilms
Author

I am critical of benchmarks these days, as benchmark data can accidentally be leaked into the training data of the models. One might better wait for the chatbot arena leaderboard to get a first hint of how good the model might be.

The model might be very good; however, one should also be careful when interpreting graphs with no tick marks, especially when an undefined quantity like 'intelligence' is presented on the y-axis. 😁

OmicronChannel
Author

Really wasn't expecting this news. Indeed, it is shocking.

elon--musk
Author

I'm so shocked that I didn't even watch the video.

ZipZapTesla
Author

Very very, pretty pretty and really really cool

robertonery