First Look At GPT-4 With Vision


Making this video was quite a rollercoaster! From DALL-E 3 not yet being released to the confirmed multimodal GPT-4 release, I cannot believe I landed on such funny timing.

Special thanks to bruhmoment for providing me with the Bard results, and Raphael for BeMyEyes access.

This video is supported by the kind Patrons & YouTube Members:
🙏Andrew Lescelius, alex j, Chris LeDoux, Alex Maurice, Miguilim, Deagan, FiFaŁ, Daddy Wen, Tony Jimenez, Panther Modern, Jake Disco, Demilson Quintao, Shuhong Chen, Hongbo Men, happi nyuu nyaa, Carol Lo, Mose Sakashita, Miguel, Bandera, Gennaro Schiano, gunwoo, Ravid Freedman, Mert Seftali, Mrityunjay, Richárd Nagyfi, Timo Steiner, Henrik G Sundt, projectAnthony, Brigham Hall, Kyle Hudson, Kalila, Jef Come, Jvari Williams, Tien Tien, BIll Mangrum, owned, Janne Kytölä, SO

Comments
Author

Making this video was quite a rollercoaster! From DALL-E 3 not yet being released to the confirmed multimodal GPT-4 release, I cannot believe I landed on such funny timing.

bycloudAI
Author

Just wanted to say, you're like the only AI 'tuber I've seen who isn't full of "THIS IS SO HYPE" and scammy vibes, or overly simplified tutorials. Awesome stuff man, good editing as well.

Clybius
Author

Future image captioning for datasets is going to be absolutely insane!

nilaier
Author

Finally a life changing innovation that comes from using AI

L_QTx
Author

I wonder if it can help out with electrical circuits

itsbalanse
Author

I think OpenAI has already started rolling out the image feature on its own platform for Plus users

Taireyn
Author

When I see how accurately it can describe random people's rooms, I can't help but think:
with this we finally solved the problem of how to automatically transform our vacuum-robot-enabled mass surveillance data into an easily searchable format 😅

LostMekka
Author

if this could be fitted into specs it will become Jarvis level technology, we all could become Iron man

ChandravijayAgrawal
Author

OpenAI just tweeted about vision coming to ChatGPT

amallukose
Author

I've just discovered this channel today after searching for a good AI news coverage channel. Great content overall.

My suggestion would be to slow down a bit and maybe provide more in-depth as well as simple explanations for some of the concepts. You go through a lot of details quickly and it's kind of hard to follow at times (maybe not this video specifically, but previous ones definitely suffer from information overload). More background information and context would be helpful for viewers who are new to the topic. Other than that, keep up the good work. Looking forward to more.

acousticdoodling
Author

I've been researching the multimodal LLM field for a while, and I have an idea why open-source models perform poorly compared to GPT-4. Most of the models are based on augmenting LLMs with vision transformers, such as CLIP (EVA) or a pure ViT, and they are very simple models that can only operate on images of at most 336x336. So I think they aren't able to distinguish text and labels because the letters are compressed to just a blob of pixels that even a human cannot recognize

why_we_still_here-wq
Author

AI oops. At 3:25 the "assistant" wrongly says, "When about to land, pull the brake on right." But the brake is on the left under the pilot's left hand. Specifically this is the speed brake, which at constant airspeed controls the angle of descent. (Also, while rolling out pulling fully against the backstop at varying pressure applies the wheel brake to that amount.)

lonlipscomb
Author

You're a legend man, keep on uploading

TopCuby
Author

on one hand, it is super impressive how much can be done within the current paradigm and with what level of precision, but on the other - don't you also feel like the promises of AGI and something that transcends 'use huge datasets to train transformer models to imitate said datasets and then further finetune and modify them to make them perform specific tasks that fall within the logic of those datasets' seem just as far off as they did 8 months ago? or do you think that the exponential curve is real after all?

YUTPIA
Author

What about audio? Have any of the LLMs been pointed towards automatically translating speech recordings into other languages?

bennguyen
Author

Wow. if only this API was released to the public...

le
Author

I'd like gpt 4 to be prompted to create a randomised infinite sequence of visual prompts that are fed into dall-e 3 so that there is a constant output of random images in high resolution.

TheAkdzyn
Author

That's cool and all, but can it correctly tag those NH and danbooru works compared to some of those lazy posters :v ?

sharpcircle
Author

"Several boxes of computer parts sitting on a table" seems pretty satisfying to me.
I'm pretty tech oriented and I still had to squint to know what half of those boxes were all about lol :v
They're quite niche items, so I don't blame an AI if it's at least able to figure out what is represented in general.

sharpcircle
Author

imagine being one of the patrons shouted out at the end of the video...

tatacraft