OpenAI GPT-4o API Explained | Tests and Predictions


👊 Become a member and get access to GitHub and Code:

🤖 Great AI Engineer Course:

🔥 Open GitHub Repos:

📧 Join the newsletter:

🌐 My website:

Explaining the OpenAI GPT-4o API: my predictions and some tests of what I think we can expect from the GPT-4o API and the multimodal model.

00:00 OpenAI GPT-4o API Intro
03:23 OpenAI GPT-4o Explained
06:47 OpenAI GPT-4o Exploration
Comments

It's going to be a game changer if OpenAI can actually deliver all the functions they demonstrated.

elliotanderson

I just realized that the new OpenAI model killed your voice assistant projects, just as they did last time with GPTs.

Ginto_O

What we need ASAP is an open-source alternative to GPT-4o realtime speech-to-speech (as in the demos). I'm pro open-source and I want full control of the application flow, preferably offline. Has anyone tried to use XTTS streaming capabilities successfully, for example by extending AllAboutAI-examples?

mwkoti
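The pipeline mwkoti describes can be sketched end to end. In the sketch below the three stages are stdlib stubs standing in for real engines (faster-whisper for STT, a local LLM, XTTS for streaming TTS would be plausible swaps; none of these choices come from the video). The point is the wiring: the LLM stage yields tokens so the TTS stage can start speaking before the full reply exists.

```python
from typing import Iterator

def transcribe(audio_chunk: bytes) -> str:
    """Stub STT stage: swap in e.g. faster-whisper for real audio."""
    return audio_chunk.decode("utf-8", errors="ignore")

def generate_reply(text: str) -> Iterator[str]:
    """Stub LLM stage: yields tokens one at a time so the TTS stage
    can begin before the whole reply has been generated."""
    for token in f"You said: {text}".split():
        yield token + " "

def speak_stream(tokens: Iterator[str]) -> list[str]:
    """Stub streaming TTS: buffers tokens into clause-sized chunks and
    'synthesizes' each chunk as soon as it is complete (the XTTS
    streaming idea). Returns the chunks instead of playing audio."""
    spoken, buffer = [], ""
    for token in tokens:
        buffer += token
        if len(buffer) > 20:          # flush a clause-sized chunk
            spoken.append(buffer.strip())
            buffer = ""
    if buffer:
        spoken.append(buffer.strip())
    return spoken

def voice_turn(audio_chunk: bytes) -> list[str]:
    """One full turn: audio in -> text -> reply tokens -> spoken chunks."""
    return speak_stream(generate_reply(transcribe(audio_chunk)))

print(voice_turn(b"hello fully offline assistant"))
```

Replacing the stubs with real engines keeps the same shape; the chunked hand-off between LLM and TTS is what makes the loop feel responsive.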

4:59 It's not horribly wrong, but I'd combine the <USER VOICE> and Voice IN in one graphic in the GPT-4o Voice API Now section and Voice OUT under the LLM RESPONSE. GPT-4o is going to be a game changer for education.

Ouyk

Hi Kris, can you do a vision version with the camera as well? Some use cases or examples would help.

YunusDogan-yclx

I wonder how they manage to handle interruptions during the voice output, like in their demo, for the API.

elliotnyberg
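On the interruption question: one common barge-in pattern is to play audio in small chunks and check a shared flag between chunks. This is a stdlib-only sketch of that pattern, not how OpenAI actually does it; a real assistant would set the flag from a voice-activity-detection callback on the microphone stream, which is simulated here.

```python
import threading

def play_interruptible(chunks, interrupted: threading.Event) -> list[str]:
    """Play audio chunks until the stream ends or the user barges in."""
    played = []
    for chunk in chunks:
        if interrupted.is_set():      # user started talking: cut off output
            break
        played.append(chunk)          # stand-in for writing to the speaker
    return played

interrupted = threading.Event()
reply_chunks = ["Sure, ", "here is ", "a long ", "answer..."]

def chunks_with_barge_in():
    """Simulate the VAD firing while the third chunk is being fetched."""
    for i, chunk in enumerate(reply_chunks):
        if i == 2:
            interrupted.set()         # mic heard the user speak
        yield chunk

print(play_interruptible(chunks_with_barge_in(), interrupted))
```

Because the check happens per chunk, smaller chunks mean a faster cut-off when the user starts speaking, at the cost of more synthesis calls.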

This was very educational. Your instructions were clear and concise. ❤🎉

Ms.Robot.

That "Performance Scores" table - nice! If that is all correct then that's pretty impressive.

Though I did a screenshot test myself, and it mistook a 3 for an 8, so it might not be flawless.

OliNorwell

I think you're right. Maybe minutes after the OpenAI presentation was done, I posted on their developer forum asking if voice in / voice out will be available to developers soon. They said only to a small group of "trusted" partners. So yeah, I'm not sure when we're going to get access to this. You gotta be in that special circle. 😅

alirezasheikh

Does the streaming audio function help with the latency?

ionutownprint
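Streaming does help, because the latency users notice is time-to-first-audio rather than total generation time. A back-of-envelope sketch with made-up round figures (these are illustrative assumptions, not OpenAI benchmarks):

```python
# Made-up round figures, not OpenAI benchmarks.
tokens_in_reply = 120
tokens_per_second = 40    # assumed generation speed
clause_size = 12          # tokens synthesized per streaming TTS chunk

# Non-streaming: wait for the whole reply before any audio can play.
non_streaming_first_audio = tokens_in_reply / tokens_per_second

# Streaming: audio starts once the first clause-sized chunk exists.
streaming_first_audio = clause_size / tokens_per_second

print(f"non-streaming: {non_streaming_first_audio:.1f}s to first audio")
print(f"streaming:     {streaming_first_audio:.1f}s to first audio")
```

Under these numbers the first audio arrives ten times sooner, even though the total time to finish speaking is unchanged.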

Multimodal or multimodel? Does anyone really believe it's a single model?

squiddymute

If you could give it IQ tests visually, that would be great, since this is difficult for it: how many triangles, or what is the next number in a series?

ziadnahdi

Hi, great video! I am curious how much those APIs cost. On their website I found text pricing in tokens, and it is pretty cheap and understandable. However, the image or "vision" function seems very expensive. I calculated it, and with full HD on low settings it is going to cost about $5 for a minute at 15 fps. That's crazy, not even mentioning their TTS at $15 per 1M tokens, which is pretty hilarious.

athemis
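That estimate is worth a sanity check. The figures below are assumptions based on OpenAI's published GPT-4o pricing at the time (about $5 per 1M input tokens, with low-detail images costing a flat ~85 tokens each); under the low-detail setting the per-minute cost comes out far below $5, so the ~$5/minute figure more likely corresponds to high-detail full-HD frames, which cost roughly 1,105 tokens each. Check the current pricing page before relying on any of these numbers.

```python
# Assumed rates -- check OpenAI's current pricing page before trusting these.
PRICE_PER_TOKEN = 5.00 / 1_000_000   # assumed GPT-4o input price, $/token
LOW_DETAIL_TOKENS = 85               # assumed flat token cost per low-detail image

fps = 15
seconds = 60
frames = fps * seconds               # 900 frames in one minute of video

tokens = frames * LOW_DETAIL_TOKENS
cost = tokens * PRICE_PER_TOKEN
print(f"{frames} frames -> {tokens} tokens -> ${cost:.2f} per minute")
```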

I made a script just like yours, and now with GPT-4o it kinda defeats the purpose... 😅 At least we can still use it with local models.

RaysAiPixelClips

My thought is that even so-so devs now have a good chance to write perfect code 🎉😅

learnwithyan

GPT-5 may finally be able to tell how many r's are in the word "strawberry", but 4o will suffice, given its ability to write a bigint from scratch in C after just a couple of minutes of telling it to try again.

GamezeveR
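The "strawberry" failure is generally attributed to tokenization: the model sees subword tokens, not individual characters, so letter-counting is hard for it while being trivial in code:

```python
# Counting characters is trivial in code; the model struggles because it
# sees subword tokens like "straw" + "berry", not individual letters.
word = "strawberry"
print(word.count("r"))  # -> 3
```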

god the number of men who will fall in love with that voice hahaha

rodrigov.

why does the personality of the voice need to highlight the worst type of female personality type ie valley girl who laughs at everyone and everything for no reason tho ?

babaksard