Create Your 'Small' Action Model with GPT-4o

Create Your "Small" Action Model with GPT-4o

👊 Become a member and get access to GitHub and Code:

🤖 Great AI Engineer Course:

🔥 Open GitHub Repos:

📧 Join the newsletter:

🌐 My website:

I try to create my own "small" action model based on Python and the GPT-4o API. Will it work? Let's find out.
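
Roughly, the flow is: take screenshots while you perform a task, hand them to GPT-4o, and get back a Python script that replays the task. Below is a minimal sketch of that loop, assuming the openai Python SDK (v1+) and pyautogui, with a made-up prompt; the actual project code is in the members' GitHub repo.

# Sketch: capture a few screenshots of a manual action, ask GPT-4o to write
# a Python script that reproduces it, then review/run the returned script.
# Assumes: openai>=1.0, pyautogui, and OPENAI_API_KEY set in the environment.
import base64, io, time
import pyautogui
from openai import OpenAI

client = OpenAI()

def grab_frames(n=5, delay=2):
    """Take n screenshots, `delay` seconds apart, while the action is performed."""
    frames = []
    for _ in range(n):
        buf = io.BytesIO()
        pyautogui.screenshot().save(buf, format="PNG")
        frames.append(base64.b64encode(buf.getvalue()).decode())
        time.sleep(delay)
    return frames

def action_to_script(frames):
    """Ask GPT-4o to turn the observed frames into a pyautogui script."""
    content = [{"type": "text",
                "text": "These screenshots show a user performing one task. "
                        "Write a Python pyautogui script that reproduces it. "
                        "Return only code."}]
    content += [{"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{f}"}} for f in frames]
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": content}])
    return resp.choices[0].message.content

if __name__ == "__main__":
    script = action_to_script(grab_frames())
    print(script)      # inspect before running
    # exec(script)     # only execute code you have reviewed

Reviewing the generated script before exec() is worth the extra step, since the model is inferring click targets from pixels alone.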

00:00 Small Action Model GPT-4o Intro
01:48 GPT-4o Action Model Code
05:54 Testing the Model
Comments

This is actually really impressive. GPT-4o watches you act, understands what was done, then writes code to reproduce it, which can then be run and automated.
Very clever flow; OpenAI should definitely hire you.

ShpanMan

Thanks, this is interesting. I was wondering about this as well and had a thought about adding log data of user interactions to give the model more telemetry. So it's not just vision but also the actual logs of all the interactions happening in the background.

georgestander
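
A rough sketch of that telemetry idea, assuming the pynput package for global mouse/keyboard hooks; the resulting JSON log could be sent to GPT-4o alongside the screenshots.

# Sketch: record low-level input events so the model gets telemetry, not just pixels.
# Assumes the pynput package is installed.
import json, time
from pynput import mouse, keyboard

events = []

def on_click(x, y, button, pressed):
    if pressed:
        events.append({"t": time.time(), "type": "click",
                       "x": x, "y": y, "button": str(button)})

def on_press(key):
    events.append({"t": time.time(), "type": "key", "key": str(key)})

mouse_listener = mouse.Listener(on_click=on_click)
key_listener = keyboard.Listener(on_press=on_press)
mouse_listener.start()
key_listener.start()

try:
    time.sleep(15)        # perform the action you want captured
finally:
    mouse_listener.stop()
    key_listener.stop()

print(json.dumps(events, indent=2))   # attach this log to the GPT-4o prompt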

Cool experimental project and idea 👍 The whole process could be scripted further to continuously store the most recent screenshots at 2-second intervals to VRAM using PyTensor, and a call could be triggered at any time with a keyword through mic input or a keyboard shortcut to send them to GPT-4o, retrieve the "replay last action" script, and then automatically execute it to save time on mundane tasks 👍👍

clumsymoe
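
Loosely sketching that rolling-buffer idea: instead of VRAM/PyTensor, this keeps the last few frames in a plain in-memory deque and fires on a keyboard shortcut, assuming pyautogui and pynput; send_to_gpt4o is a hypothetical helper (e.g. the vision call sketched under the description above).

# Sketch: keep the most recent screenshots in a rolling buffer and send them
# to GPT-4o when a hotkey is pressed.
import threading, time
from collections import deque
import pyautogui
from pynput import keyboard

BUFFER = deque(maxlen=10)          # 10 frames at 2-second intervals ≈ 20 s of history

def capture_loop():
    while True:
        BUFFER.append(pyautogui.screenshot())
        time.sleep(2)              # one frame every 2 seconds

def on_hotkey():
    frames = list(BUFFER)
    print(f"Sending {len(frames)} frames to GPT-4o...")
    # script = send_to_gpt4o(frames)   # hypothetical: encode frames, call the vision API
    # exec(script)                     # replay the last observed action

threading.Thread(target=capture_loop, daemon=True).start()
with keyboard.GlobalHotKeys({"<ctrl>+<alt>+r": on_hotkey}) as hotkeys:
    hotkeys.join()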

So good to see you getting on board with the Rabbit R1. It's seriously going to change lives. Enjoyed the video man.

cyc

I can think of so many uses for this. Great work.

TTOnkeys

Very interesting. I think it could also be useful to provide it with the mouse positions between different frames.

To go further, we could create multiple actions and then implement a RAG that lets the model choose the correct snapshot and execute it (a sketch of this idea follows below).

Thanks for this video.

ibrahimaba
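
A minimal sketch of that retrieval step, assuming previously generated action scripts are stored with short text descriptions and matched via OpenAI embeddings (text-embedding-3-small); ACTION_LIBRARY and its entries are made-up examples. The mouse-position idea pairs naturally with the event log sketched further up.

# Sketch: pick the stored action script that best matches a request,
# using embedding similarity as a tiny retrieval step.
# Assumes openai>=1.0 and numpy; ACTION_LIBRARY is a made-up example store.
import numpy as np
from openai import OpenAI

client = OpenAI()

ACTION_LIBRARY = {
    "open the weekly report spreadsheet": "import pyautogui\n# ...recorded steps...",
    "rename screenshots in the downloads folder": "import os\n# ...recorded steps...",
}

def embed(text):
    resp = client.embeddings.create(model="text-embedding-3-small", input=text)
    return np.array(resp.data[0].embedding)

def pick_action(request):
    """Return the stored script whose description best matches the request."""
    q = embed(request)
    best_script, best_sim = None, -1.0
    for desc, script in ACTION_LIBRARY.items():
        d = embed(desc)
        sim = float(np.dot(q, d) / (np.linalg.norm(q) * np.linalg.norm(d)))
        if sim > best_sim:
            best_sim, best_script = sim, script
    return best_script

script = pick_action("open this week's report")
print(script)      # review, then exec(script) if it looks right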

I've been thinking Recall and omni screenshots were ways to create large practical data sets to train LAMs. Do you think that is what's happening? You seem to be doing a smaller version of this.

Soft_Touch_

Great start. What's the GH url for subscribers?

carstenli

So where is the code for this project? Looks fun.

gnosisdg

Are there any local LLMs this might work with?

ewasteredux

Bro, please create a video on real-time vision and response

NetHyTech

Honestly more legit than scammer Jesse Lyu and the Rabbit R1 garbage hardware scam after his NFT game scam.

avi

Disclaimer for those thinking of implementing this project: OpenAI GPT models are not free, so you have to pay to run the code.

Ahmad-ejfy

Learning how to be a data scientist 80% from you bro haha

futureworldhealing

How does it know where to click though? Does

darthvader

Humane and Rabbit watching this and raising another round of funding

kalilinux

The GitHub link is always the same repo btw. It'd be easier to make a new repo for each project and put the project link in the description.

JNET_Reloaded

Hello sir, can you recreate the Gemini vision fake demo in real life?

lokeshart