GPT-4 Vision API: Best Way to Copy Text from Image (OCR in Python)

Показать описание

OpenAI recently released the GPT Vision API allowing developers to use the amazing vision analysis capability available inside ChatGPT plus. I wanted to test the results of doing text extract from a picture of a form to see how accurate the OCR capabilities were. Also, how well it structured the data file that was outputted. As you will see in the video, the results were very impressive.
Link to project in GitHub:

Рекомендации по теме

Комментарии

Great stuff, and cool ideas at the end!

jonathanvandenberg

Lovely. Really inspiring. With little knowledge of Python, I managed to do this. Seeing only 5.1k views in 3 months make me happy as it looks like not so many people are interested in this subject ;-) a lot of business opportunities

maciejlegowicz

Super impressive capabilities these days.

kevon

I had the exact same idea today, especially with the functions calling. Did you manage to get that to work? Cool video, btw🎉

benjaminsaladin

This API is great - thank you for the video. I wonder if it is able to recognise hand writing in diffrent languges than ENG.

micbab-vgmu

Great video! Please do a tutorial on how to convert scanned pdf files in a Google Drive folder to Excel using GPT-4 Vision. Thanks!

Great_Muzik

In my case it cannot recognize some characters. He confuses the 2 with Z, 6 with G, etc. It happens only in lines of random characters like (G12300HO). I don't even know how to teach it. I have set the temperature to zero.

vadymivanenko

I'm curious what are strategies for parsing born digital pdfs, the data is already there so it just needs to go and grab it without ocr right? How would that work?

kirk

Hey! I have an excellent use for this to help small business but have no idea how to make it work. Could we talk about it to see if it's something that could be done?

glowmarkdesigns

If an image is not good quality, but readable for human, it can recognize a text with mistakes. Tested on estimates that was sent via Whatsapp.

i.am.rossalex

Great video. Quality appears to have degraded heavily since this video. Sometimes it outright refuses to scan images containing names as they're personal information

pabloenzozanitti

Nice video btw. Sorry i do not share your excitement as I tried with more beefy images, like engineering drawings like P&ID, Location or connection diagram. I am able to get some information when I zoom the images to a certain level, but full scale GPT tell me to use a OCR software. Also, you cannot get the bonding rectangles of your text for further processing with html and css. So I will stick with Google Vision API for now to do this, i guess it is less expensive than GPT anyway offering a free tier of 1000 images/month and much faster.

DuneKraftwerk

Hi there, I have tried to implement something similar and I get the response saying things like this:

I'm sorry, but I am unable to access external links or view images, so I cannot analyze the image or read any text from it. My capabilities are limited to processing and generating text-based information. If you can provide the text from the image, I'd be happy to help analyze or discuss it with you.

However, if I use GPT4 chat window as normal, upload my invoice, it can read it no problem.

Have you came across this?

tommydavies

Hi, is it necessary to pay to get access to this API.

adammessier

'message': 'The model `gpt-4-vision-preview` has been deprecated

uplifthabesha

Will it work for bad hand written text ??

abhishekgaikwad

this is free api from chatgpt? or pay?, i cant run this program

kurniadrajat

Does it also work with PDF files instead of .img ?

SunilSamson-wl

Hi, very interesting video ! Watching from Paris, France ! I'm currently developing a solution based on that, is it possible to talk together briefly on messenger or something else ?

remisanlis

GPT-4 Vision API: Best Way to Copy Text from Image (OCR in Python)

GPT-4 Vision API: Best Way to Copy Text from Image (OCR in Python)

GPT-4 Vision API :10 NEW MINDBLOWING Abilities + Examples

How to Build App with OpenAI's New GPT-4 TURBO VISION API (gpt vision)

NEW GPT-4o Vision API: Best Way to Copy Text from Image (OCR in Python)

I figured out what GPT-4 Vision could do

GPT4 vision API Python Tutorial in to get you started

Web Scraping with GPT-4 Vision AI + Puppeteer is Mind-Blowingly EASY!

GPT-4 Vision API 🚀 The Future of Image Recognition! 🤯 Step-by-Step Tutorial

Build an AI Image Captioning App With GPT-4 Vision API in 3 Min

7 GPT-4 Vision API SaaS Ideas (Start building now!!)

GPT 4 Vision (PYTHON) Tutorial for Beginners

EASIET Way to Install LLaVA - Free and Open-Source Alternative to GPT-4 Vision

Generate Apps from Sketches or Screenshots with OpenAI GPT-4 Vision API (6 mins quick demo)

GPT-4 Vision: 10 Amazing Use Cases - This is HUGE!!

GPT-4 Vision: A Comprehensive Tutorial | GPT-4V

OpenAI Vision API Tutorial: Build a Fullstack Website to Chat With Images

GPT-4 Vision API + Puppeteer = Easy Web Scraping

Cohere Command R+ API / GPT-4 Turbo Vision API Update - Impressive!

WEB SCRAPPING Using CHATGPT | How To Use GPT 4 Vision API To Automate Web Scrapping | Simplilearn

Build copilots with VISION | GPT-4 Turbo with Vision + Azure AI

Iphone + GPT-4 Vision API = Autonomous Security Cam System

Assistant API with GPT-4 Turbo Vision: OpenAI's Complete Guide to Integration

GPT-4 Vision API 🤯 INSANE Video Recognition Powers! Step-by-Step Tutorial 🚀

GPT4 Vision API with no code | BuildShip Nodeverse