GPT-4o Low Latency Screen to Voice Tutorial - SUPER IMPRESSIVE OCR!

preview_player
Показать описание
GPT-4o Low Latency Screen to Voice Tutorial - SUPER IMPRESSIVE OCR!

👊 Become a member and get access to GitHub and Code:

🤖 Great AI Engineer Course:

🔥 Open GitHub Repos:

📧 Join the newsletter:

🌐 My website:

Today we recap my livestream where i built a low latency screen to voice reader with great ocr capabilites. This will look at the screen, answer any question or explain a problem, with pretty low latency pre new voice mode from GPT4o.

00:00 GPT4o Screen to Voice Intro
00:57 GPT4o Flowchart
01:42 Lets Build The Screen Reader
06:05 First Test
07:05 Lets Build The Voice
09:48 Second Test with Voice
10:32 Adding Control Key
11:05 Final Tests
Рекомендации по теме
Комментарии
Автор

Legit shit. A real coder pwning the Ai matrix❤.

Ms.Robot.
Автор

Cool. You projects are always amazing. The local open source projects are the most amazing and interesting to me.

BThunder
Автор

Pretty cool project idea. If you don't mind, I stole it and use Gemini Flash to analyze the images; it's pretty fast too. You should try it.

choff
Автор

I need of tech like that for my desktop virtual 3d assistant.

I have a 3d model of a character (AI agent) that has to interact with computer in many interesting ways up to controlling pixels of the screen by itself, for example if it want to impose a an object to interact with virtual space. I hope soon enough we will have enough speed and power for AI agents to be sentient and working seamlessly with any type of information.

ksem
Автор

Im from Portugal, the portuguese is a mixture of mostly Portuguese from Brasil and a lil bit of Portuguese from Portugal heheh
Spanish is not my primary language but it is not that bad also !

pedrorafaelnunes
Автор

This is great. Can we add voice prompt?

enthuesd
Автор

Being a member I have been trying to access the github repo, I have sent multiple emails to the provided email address, yet to receive a response it has been 48hrs. Please advise.

protimaranipaul
Автор

🎯 Key points for quick navigation:

00:00 *🖥️ Overview of the project setup*
- Setting up for screenshot analysis using GPT-4o
- Detailing the low latency approach for image understanding
- Collecting documentation and writing the initial iteration of the script
02:18 *🛠️ Implementing functions and configurations*
- Fetching documentation from OpenAI for implementing GPT-4o with image inputs
- Inclusion of functions from prior projects to streamline the process
- Utilizing EnV files to fetch the OpenAI key for configuration
07:21 *🔊 Integrating text-to-speech functionality*
- Obtaining OpenAI documentation for speech-to-text-to-speech functionalities
- Implementing a feature to read out responses using TTS
- Troubleshooting and fixing errors in the TTS APIs and configuration
10:55 *🎛️ Controlling the main function with a trigger key*
- Adding a feature to control the main function trigger using a key command
- Testing the control setup with screen prompts for AI responses
- Demonstrating the capability of the system to respond effectively with controlled triggers

Made with HARPA AI

PTHastings
Автор

is there a copy of the code you used in the documentation you sent to OpenAI in your first prompt?

-deez
Автор

Do you know when will be having an access to gpt 4o voice api

branislannjemec
Автор

Hey how do i get access to git and discord?

abhishekrakhe
Автор

Spanish isn't really Spanish if it's speaking with an US accent...

luisvictorf