Automate OCR with GPT-4o and Power Automate

Показать описание

Welcome to today's tutorial where I will demonstrate how you can learn how to extract values / entities from an image using a single api call and the new ChatGPT-4o Omni model. Using Power Automate I will demo how you can call the API on Azure. You will learn how to deploy the model to Azure and how to fine tune a prompt for a GPT use case. I will then demonstrate how you can run your automation on files uploaded to SharePoint, saving key metadata straight back to the file properties in SharePoint, all using Power Automate.

Whilst there are off the shelf models for OCR extractions in the form of AI Builder and Azure Document Intelligence, this new GPT-4o model is remarkably cheap to run and extremely flexible due to the ability for you to formulate a natural language requirement. By insisting the model returns structured JSON, we have a consistent output and can use this consistently in further stages of our automation.

Timestamps:
00:00 Introduction
00:45 Power Automate Flow and GPT-4o
03:42 Upload a file and trigger the automation
06:00 Deploy a model to Azure
07:25 Azure AI Playground
10:30 Devising a prompt with ChatGPT.com
12:06 Modify the flow to trigger from SharePoint
17:28 Testing the flow from SharePoint
18:40 Outro

By the end of this tutorial, you'll have a comprehensive understanding of how to deploy ChatGPT-4o to Azure and integrate with low code automation tools such as Power Automate. Please do let me know in the comments your ideas for automation.

Want to build a canvas app gpt bot? Deploy Mistral Large, Integrate with Power Platform #Mistral #MistralAI #PowerPlatform

#PowerAutomate #PowerPlatform #gpt4o #azureopenai

DamoBird365

Рекомендации по теме

Комментарии

Great work, Damien! Inspired by your example, I upgraded my existing expense app with GPT-4 OCR capabilities. Thank you so much!

lumbinibhw

Hi Damo, thanks for the content. It is highly professional with chapters, description, extra links and the way you explain how to do it is really amazing.

Emre-yxvw

Great work, Damien! Thank you for producing and sharing the awesome video. It inspires me to keep pressing on with Power Automate integration with AI.

mannymorales

Damo, nice work - thank you for producing and sharing this excellent information. Kudos for throwing in a bit of debugging insight as well! Looking to continued content!

mannymorales

Thanks Damo, this is awesome.
This is something I wanted to get started with and I was just looking for an ideal use case

toluvictor

Multimodal LLMs just made the OCR simpler... Great use cases Damien and as always superb demo, great you kept the base64 error as well 🔥🚀😍

Thanks for introducing me to the concept of developing a model in Azure that allows me to harness powerful chat gpt abilities. Can bing ai be used to refine the prompt for the model or must one use chat gpt proper for best results? PS - if my question misses the mark it’s because I have a sloppy understanding of large language model learning etc. PPS - 0.008 seems like 0.01 aka one hundredth of a dollar aka one penny but perhaps I’m getting that wrong too? (No need to spell out any long math remediation) - bottom line, I’m thrilled that you take the time to cost out approximate Azure pay as you go costs. Knowing that I can test something 500 times and only incur as little as 50 cents or as much as $5 is helpful. Thank you.

geralddahl

I have an accounting company in Germany and would like to implement it, what are the costs for the programs, do you know that?

Musti

Hello Damien, thanks for great video. Can I ask does this solution require premium license in terms of the connector you are using? Thank you so much.

peterpetrou

I started exploring upgrading some flows based on my OCR & GPT3.5Turbo template to GPT4o and I'm finding there are some limitations not mentioned in the video.

First, GPT4o requires the upload documents to be images, so png or jpg. That I expected. But the built-in Microsoft convert PDF to jpg action only converts the 1st page of a PDF to an image. So to use GPT4o on multi-page invoices or other documents would require a subscription to a 3rd party service like Encodian or Adobe in order to convert each multi-page PDF into a set of several images that could be fed to GPT4o. This is likely a no-go as my org doesn't want all our documents with sensitive data going through a 3rd party service.

Second, GPT4o takes longer to process images than the OCR & GPT3.5Turbo set-up. So if an application has a user waiting in real-time on the document to be processed before proceeding, then it may be a hinderance to user experience.

Overall, I think I'd only switch over to GPT-4o if going from like 96% to 99% field extract accuracy was necessary or if the use case requires processing something not typed text related like whether the document has a signature/stamp on it or not.
Otherwise I may wait to see if later models enable uploading PDFs directly.

tylerkolota

What 365 license do you need to allow this work flow to use ChatGPT?

stuartlittle

Would it be possible to feed chatgpt inside power automate both jpeg and pdf for it to extract data from?

SulHund

A video request:
Hi Damien,
In our organisation we work in teams. And for new projects we create a channel with an predefined file structure, with a onenote for that specific project and a planner board for that project.... But thats a lot of manuell work... Is there a possibility to automate all these processes?

TrueSpeaker-gxtw

Hi Damo, Thank you for this great video. I implemented this in my environment, but facing issue "Request Header Fields Too Large" for a small image file content. Do you know how to solve this?

rishikeshmishal

Hi, very nice video. An OCR recognition would be possible with Mistral Large LLM too?

bmassimo

You have referred to Chatgpt 4o as optima, and the o stands for omni.

cbau

Amazing. Can you give it as a subscription? I have clients that want to use it. What would be accuracy for Arabic receipts?

theaunsyed

How to use it for pdf data extraction?

hammadyounas

This is a gamechanger! thx for sharing! have you tried to make it work for PDF files? I tried replacing image/png;base64 with application/pdf;base64 but it doens't work, I think the type 'image url' has to be changed in that case?

stevedaregmailcom

Automate OCR with GPT-4o and Power Automate

Automate OCR with GPT-4o and Power Automate

This GPT-4o Automation Changes Everything

GPT-4o for OCR: Build a Food Order Carbon Footprint Calculator

GPT-4 Vision API: Best Way to Copy Text from Image (OCR in Python)

GPT-4o Low Latency Screen to Voice Tutorial - SUPER IMPRESSIVE OCR!

GPT-4o is here! Let’s build 4 things with it! | API

Automate PDF Invoices Data Transfer to Google Sheets with ChatGPT & Zapier | Tutorial

How To Use ChatOCR ChatGPT Plugin?

ChatGPT Advanced Data Analysis Hack: Extract Text From Images (OCR)

Use ChatGPT-4o Vision on Images with Power Automate

AUTOGEN TUTORIAL - build AI agents with GPT-4o and Microsoft's AutoGen

Using ChatGPT with YOUR OWN Data. This is magical. (LangChain OpenAI API)

Can GPT-4 Improve OCR (turning image scans into text?) | Unscripted Coding

How to Build an AI Document Chatbot in 10 Minutes

5 Prompts That 99% of GPT-4o Users Don‘t Know

How to Install Chat GPT for Google Docs - Use GPT AI in Documents

Microsoft AI Builder Tutorial - Extract Data from PDF

Getting Started with Azure OpenAI and GPT Models in 6-ish Minutes

10 Powerful Shortcuts with ChatGPT-4o

Create Efficient Business Processes with AI: Quick and Amazing Results

GPT PDF & Image Data Extraction (Power Automate)

PrivateGPT 2.0 - FULLY LOCAL Chat With Docs (PDF, TXT, HTML, PPTX, DOCX, and more)

Automated PDF Analysis: Using ChatGPT & Zapier For Any Industry | Tutorial

OCR and ChatGPT: Streamline Document Analysis Effortlessly