GPT-4 Vision API: Best Way to Copy Text from Image (OCR in Python)

preview_player
Показать описание
OpenAI recently released the GPT Vision API allowing developers to use the amazing vision analysis capability available inside ChatGPT plus. I wanted to test the results of doing text extract from a picture of a form to see how accurate the OCR capabilities were. Also, how well it structured the data file that was outputted. As you will see in the video, the results were very impressive.
Link to project in GitHub:
Рекомендации по теме
Комментарии
Автор

Great stuff, and cool ideas at the end!

jonathanvandenberg
Автор

Lovely. Really inspiring. With little knowledge of Python, I managed to do this. Seeing only 5.1k views in 3 months make me happy as it looks like not so many people are interested in this subject ;-) a lot of business opportunities

maciejlegowicz
Автор

Super impressive capabilities these days.

kevon
Автор

I had the exact same idea today, especially with the functions calling. Did you manage to get that to work? Cool video, btw🎉

benjaminsaladin
Автор

This API is great - thank you for the video. I wonder if it is able to recognise hand writing in diffrent languges than ENG.

micbab-vgmu
Автор

Great video! Please do a tutorial on how to convert scanned pdf files in a Google Drive folder to Excel using GPT-4 Vision. Thanks!

Great_Muzik
Автор

In my case it cannot recognize some characters. He confuses the 2 with Z, 6 with G, etc. It happens only in lines of random characters like (G12300HO). I don't even know how to teach it. I have set the temperature to zero.

vadymivanenko
Автор

I'm curious what are strategies for parsing born digital pdfs, the data is already there so it just needs to go and grab it without ocr right? How would that work?

kirk
Автор

Hey! I have an excellent use for this to help small business but have no idea how to make it work. Could we talk about it to see if it's something that could be done?

glowmarkdesigns
Автор

If an image is not good quality, but readable for human, it can recognize a text with mistakes. Tested on estimates that was sent via Whatsapp.

i.am.rossalex
Автор

Great video. Quality appears to have degraded heavily since this video. Sometimes it outright refuses to scan images containing names as they're personal information

pabloenzozanitti
Автор

Nice video btw. Sorry i do not share your excitement as I tried with more beefy images, like engineering drawings like P&ID, Location or connection diagram. I am able to get some information when I zoom the images to a certain level, but full scale GPT tell me to use a OCR software. Also, you cannot get the bonding rectangles of your text for further processing with html and css. So I will stick with Google Vision API for now to do this, i guess it is less expensive than GPT anyway offering a free tier of 1000 images/month and much faster.

DuneKraftwerk
Автор

Hi there, I have tried to implement something similar and I get the response saying things like this:

I'm sorry, but I am unable to access external links or view images, so I cannot analyze the image or read any text from it. My capabilities are limited to processing and generating text-based information. If you can provide the text from the image, I'd be happy to help analyze or discuss it with you.

However, if I use GPT4 chat window as normal, upload my invoice, it can read it no problem.

Have you came across this?

tommydavies
Автор

Hi, is it necessary to pay to get access to this API.

adammessier
Автор

'message': 'The model `gpt-4-vision-preview` has been deprecated

uplifthabesha
Автор

Will it work for bad hand written text ??

abhishekgaikwad
Автор

this is free api from chatgpt? or pay?, i cant run this program

kurniadrajat
Автор

Does it also work with PDF files instead of .img ?

SunilSamson-wl
Автор

Hi, very interesting video ! Watching from Paris, France ! I'm currently developing a solution based on that, is it possible to talk together briefly on messenger or something else ?

remisanlis