NEW GPT-4o Vision API: Best Way to Copy Text from Image (OCR in Python)

Показать описание

OpenAI has released a new model, GPT-4o with Vision capabilities built right into its API. It is advertised as more accurate, faster and half the cost of the vision capabilities in the previous model. In this video we put that to the test, and try out using a python script to extract text off invoices (even handwritten ones). Also, I will show some tricks to get consistent output from the API for different types of images.

GitHub Link to Starter code:

Рекомендации по теме

Комментарии

Totally agree with the Llama comment at the end: every company is going to want to build their own model (trained on basic open source libraries data and with their own data on top of it). I still struggle with understanding how that new world will look like... A bunch of "Jarvis" everywhere? Can you make a video of what you think interacting with that new internet ecosystem may look like?
Thanks!

pjgilcunha

thank you for the video- GPT4o is my default model at the moment - but I test other LMMs as well -)

micbab-vgmu

Hey, I was doing the same thing before i found your video but adding response_format was a great help. Thanks!
Now i am finding a way where i send multiple images to gpt4o and get an indicator if image is rotated(it does not work on rotated images) now when it comes to multiple images i need an identifier of them to rotate required image only, Do you have anything in mind?

rajmandaviya

Thank you so much!
I still encounter some issues like I'm uploading an Invoice and every time it gives different vendor name(upper case, lower case) and how to mention date format in JSON Schema? it always return different format. how can I prompt this?

devamsanghavi

Hi. Do you know the limit of tokens i can use? Im trying to transcribe an image with a lot of text, but it it stops in the middle. it seems the maximum of tokens i can use is around 1000.. How can i set more tokens per request?

danielalbano

Hello!

Thank you for the insightful video. I am currently working on a side project using GPT-4 to extract handwritten text from paper. However, since handwritten text varies greatly and some handwriting can be very difficult to read, there are occasional extraction errors that could affect the product's credibility.

I am considering implementing a method where (1) the confidence level of each extraction is assessed, and (2) if the confidence level falls below a certain threshold, (3) the result is marked as N/A or skipped. However, I am in first step using GPT to make product, as I am a product manager, not a developer. Do you have any advice on how to handle this issue?

Thank you once again for your helpful video.

Best regards,
From Korea

JAYJang-mezh

Hi,
Thanks for this video on using the GPT-4o Vision API. I'm using the code shown to detect text in images, and it's working very well. However, when I request the pixel coordinates for sections of the invoice (general information, product details, and payments), the accuracy is not very good.

Could you provide some advice or demonstrate how to improve the accuracy of the pixel coordinates for each section in the image? I need to locate specific areas like the invoice number, client information, tax ID (CIF or NIF), product details, and payment information such as the total amount and VAT.

Thanks in advance for any help!

Eric

I tried to use the vision functionality, but unfortunately sometimes it invents the numbers and even if I force it in the prompt it doesn't do it :(

xmagcx

NEW GPT-4o Vision API: Best Way to Copy Text from Image (OCR in Python)

NEW GPT-4o Vision API: Best Way to Copy Text from Image (OCR in Python)

Live demo of GPT-4o vision capabilities

GPT-4 Vision API :10 NEW MINDBLOWING Abilities + Examples

GPT-4o - Full Breakdown + Bonus Details

26 Incredible Use Cases for the New GPT-4o

Interview Prep with GPT-4o

Interview roleplay with GPT-4o voice and vision

Live demo of GPT-4o coding assistant and desktop app

I automated ChatGPT (here's how)

New ChatGPT Model is here and it’s GOOD - GPT-4o Mini Review

Say hello to GPT-4o

Another glorious battle for AI dominance… GPT-4o vs Google I/O

Be My Eyes Accessibility with GPT-4o

Dog meets GPT-4o

Sarcasm with GPT-4o

Realtime Translation with GPT-4o

NEW GPT-4o: My Mind is Blown.

GPT-4 Vision API: Best Way to Copy Text from Image (OCR in Python)

New GPT-4o Voices & More AI Use Cases

Math problems with GPT-4o

Two GPT-4os interacting and singing

NEW GPT-4o: Top 7 Mindblowing Use Cases (Its FREE 🤯) | OpenAI ChatGPT-4o How To Use

GPT-4o talking to GPT-4o

Character voices with GPT-4o voice