How to extract text from multi-page PDF & save it as CSV - Amazon Textract tutorial p4

Welcome to part 4 of the tutorial series on Amazon Textract. In this video, I cover how to extract text from a multi-page PDF file and save the output as a CSV file.
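
As a rough illustration of the "save the output as CSV" step, here is a minimal sketch that turns Textract blocks into CSV text. It assumes the usual Textract response shape (a `Blocks` list whose `LINE` entries carry `Text`, `Page`, and `Confidence`). The function name `lines_to_csv`, the column layout, and the sample data are hypothetical. It also uses the standard-library `csv` module rather than the pandas layer used in the video, so it is not the video's exact code.

```python
import csv
import io


def lines_to_csv(blocks):
    """Return CSV text with one row per LINE block: page, text, confidence."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["page", "text", "confidence"])
    for block in blocks:
        if block.get("BlockType") != "LINE":
            continue  # skip PAGE and WORD blocks
        writer.writerow([
            block.get("Page", 1),          # Page may be absent for single-page input
            block.get("Text", ""),
            block.get("Confidence", ""),
        ])
    return buffer.getvalue()


# Hypothetical sample shaped like Textract's "Blocks" list.
sample_blocks = [
    {"BlockType": "PAGE", "Page": 1},
    {"BlockType": "LINE", "Page": 1, "Text": "Invoice, March", "Confidence": 99.1},
    {"BlockType": "WORD", "Page": 1, "Text": "Invoice,", "Confidence": 99.3},
    {"BlockType": "LINE", "Page": 2, "Text": "Total due", "Confidence": 98.4},
]
csv_text = lines_to_csv(sample_blocks)
print(csv_text)
```

For a multi-page job, the blocks from every page of results would need to be collected into one list before calling the function.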

---
Support my work:
---
Paytm | Gpay: 9023197426

Comments

Excellent walkthrough of the PDF-to-CSV solution. But I can't seem to import pandas as indicated. How do I get a little help?

dougchristensen

Hi there! Thank you so much for this! Helped me out massively :) Sorry if I missed something, but I was wondering, is there a reason you use SNS instead of an S3 event to trigger the "process textract response" Lambda? Would it be possible to skip SNS and have the same effect with just another S3 event trigger?

JackRobinson-jjcn

This was an amazing video, thanks so much! Question for you: how would I get the CSV outputs to match the input PDF file names instead of the Job ID string? Basically, I want input.pdf to return input.csv instead of that long string of numbers.csv.

testeleven

Hi Chirag. Thanks for doing this video. By the way, I need to know something: can we save the CSV with the same name as the PDF? Hope to hear from you. Thanks again.

georgevavolil
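
For the file-naming question in the two comments above, one possible approach is to read the original PDF key from the completion message that Textract publishes to SNS, which includes a `DocumentLocation` with `S3ObjectName`. The sketch below assumes the second Lambda is triggered by that SNS notification. The function name `csv_key_for_sns_event`, the `csv/` prefix, and the sample event are hypothetical, not taken from the video.

```python
import json
import posixpath


def csv_key_for_sns_event(event, output_prefix="csv/"):
    """Build an output key such as csv/input.csv from the Textract SNS message."""
    # The SNS payload arrives as a JSON string inside the Lambda event.
    message = json.loads(event["Records"][0]["Sns"]["Message"])
    pdf_key = message["DocumentLocation"]["S3ObjectName"]
    # Drop any folder prefix and the .pdf extension from the S3 key.
    stem, _ext = posixpath.splitext(posixpath.basename(pdf_key))
    return f"{output_prefix}{stem}.csv"


# Hypothetical event with only the fields this sketch reads.
sample_event = {
    "Records": [{
        "Sns": {
            "Message": json.dumps({
                "JobId": "0123456789abcdef",
                "Status": "SUCCEEDED",
                "DocumentLocation": {
                    "S3ObjectName": "uploads/input.pdf",
                    "S3Bucket": "example-bucket",
                },
            })
        }
    }]
}
print(csv_key_for_sns_event(sample_event))
```

The returned key could then replace the Job ID wherever the CSV is written to S3.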

Hello and thank you for the video! I am having trouble when testing the first Lambda function, async_job_creation. I do not see any output in the CloudWatch logs when I save a PDF to the S3 bucket. I receive the message "log group does not exist." Any suggestions? I assume the Lambda function is not being triggered?

brittanyross

Thank you for the amazing video. I have liked and subscribed to your channel. I have a question about the workflow. Right now, the pipeline runs when a single file is uploaded to S3. Suppose I have some kind of UI where I let the user upload multiple files. Then, for each file uploaded, there will be two Lambda functions and one Amazon Textract job running in parallel. How can this be made more efficient for multiple file uploads? Let's say the user uploads 3 files. Is there a way a single Lambda can process those 3 files and send them to Amazon Textract, which writes 3 separate outputs to /textract-output, and then another single Lambda function could process those 3 Textract outputs and write them to 3 separate files in the /csv folder with the appropriate file names? Let me know if that makes sense. And again, thanks for your excellent video.

aleenaselegy
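
On the multi-file question above: an S3 event carries a `Records` list, so a handler can be written to loop over every record instead of reading only the first one. The sketch below shows only that loop. The function name `documents_from_s3_event` and the sample event are hypothetical. Whether several uploads actually arrive in one event is decided by S3, so this does not by itself guarantee batching.

```python
from urllib.parse import unquote_plus


def documents_from_s3_event(event):
    """Return (bucket, key) for every PDF referenced in an S3 event."""
    documents = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys are URL-encoded in S3 events (a space arrives as "+").
        key = unquote_plus(record["s3"]["object"]["key"])
        if key.lower().endswith(".pdf"):
            documents.append((bucket, key))
    return documents


# Hypothetical event with two PDFs and one non-PDF object.
sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "example-bucket"}, "object": {"key": "uploads/a.pdf"}}},
        {"s3": {"bucket": {"name": "example-bucket"}, "object": {"key": "uploads/my+report.pdf"}}},
        {"s3": {"bucket": {"name": "example-bucket"}, "object": {"key": "uploads/notes.txt"}}},
    ]
}
print(documents_from_s3_event(sample_event))
```

Each returned pair would then be submitted as its own Textract job, since an asynchronous job takes a single document.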

Hello, can I save the output CSV files on my local machine instead of the bucket?

achrafbhiri
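
A Lambda function cannot write to a local machine directly, so one option for the question above is to let the pipeline write to S3 as usual and then pull the CSV files down from a local script. The sketch below assumes a boto3 S3 client is passed in and relies on its `download_file(Bucket, Key, Filename)` method. The function name `download_csvs`, the bucket name, and the keys are hypothetical.

```python
import os
import posixpath


def download_csvs(s3_client, bucket, keys, dest_dir="."):
    """Download each S3 key into dest_dir and return the local paths."""
    os.makedirs(dest_dir, exist_ok=True)
    saved = []
    for key in keys:
        # Keep only the file name part of the key, e.g. csv/input.csv -> input.csv
        local_path = os.path.join(dest_dir, posixpath.basename(key))
        s3_client.download_file(bucket, key, local_path)
        saved.append(local_path)
    return saved


# Example usage on a machine with boto3 installed and AWS credentials configured:
#   import boto3
#   download_csvs(boto3.client("s3"), "example-bucket", ["csv/input.csv"], "output")
```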

Is anyone else here getting the JSONDecodeError: Expecting value: line 1 column 1? I keep getting this and I cannot figure out why :(

music-ish
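
Regarding the error above: `json.loads` raises this exact message when its input is empty or does not start with a JSON value. One possible cause, offered only as a guess, is invoking the Lambda with a test event whose SNS `Message` is plain text rather than a JSON string. A minimal defensive sketch follows. The helper name `parse_sns_message` is hypothetical.

```python
import json


def parse_sns_message(raw):
    """Return the decoded message, or None if raw is empty or not JSON."""
    if not isinstance(raw, str) or not raw.strip():
        print("Message is empty or not a string")
        return None
    try:
        return json.loads(raw)
    except json.JSONDecodeError as err:
        # Log where decoding failed and the start of the payload for debugging.
        print(f"Not valid JSON ({err.msg} at line {err.lineno} column {err.colno}): {raw[:80]!r}")
        return None


print(parse_sns_message('{"JobId": "abc", "Status": "SUCCEEDED"}'))
print(parse_sns_message("Hello from SNS"))
```

Printing the raw payload this way usually shows quickly whether the function received the Textract notification or something else.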

What modifications would you make to get the API to process form key-value pairs? I'm having trouble understanding that.

curiousl
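
For the key-value question above: form data comes from Textract's document analysis API (`start_document_analysis` with `FeatureTypes=["FORMS"]`) rather than plain text detection. In that response, `KEY_VALUE_SET` blocks with entity type `KEY` link to their `VALUE` blocks, and both link to `WORD` children. The sketch below pairs them up. The function name `extract_key_values` and the sample blocks are hypothetical, and it assumes all blocks of the job have already been gathered into one list.

```python
def extract_key_values(blocks):
    """Map each form key's text to its value's text."""
    by_id = {block["Id"]: block for block in blocks}

    def text_of(block):
        # Join the WORD children of a KEY or VALUE block.
        words = []
        for rel in block.get("Relationships", []):
            if rel["Type"] != "CHILD":
                continue
            for child_id in rel["Ids"]:
                child = by_id.get(child_id, {})
                if child.get("BlockType") == "WORD":
                    words.append(child["Text"])
        return " ".join(words)

    pairs = {}
    for block in blocks:
        if block.get("BlockType") != "KEY_VALUE_SET":
            continue
        if "KEY" not in block.get("EntityTypes", []):
            continue
        value_text = ""
        for rel in block.get("Relationships", []):
            if rel["Type"] == "VALUE":
                value_text = " ".join(
                    text_of(by_id[v]) for v in rel["Ids"] if v in by_id
                )
        pairs[text_of(block)] = value_text
    return pairs


# Hypothetical blocks for a form field "Name: Jane Doe".
sample_blocks = [
    {"Id": "k1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["KEY"],
     "Relationships": [{"Type": "VALUE", "Ids": ["v1"]},
                       {"Type": "CHILD", "Ids": ["w1"]}]},
    {"Id": "v1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["VALUE"],
     "Relationships": [{"Type": "CHILD", "Ids": ["w2", "w3"]}]},
    {"Id": "w1", "BlockType": "WORD", "Text": "Name:"},
    {"Id": "w2", "BlockType": "WORD", "Text": "Jane"},
    {"Id": "w3", "BlockType": "WORD", "Text": "Doe"},
]
print(extract_key_values(sample_blocks))
```

The resulting dictionary can be written out as two CSV columns, key and value.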

I cannot find the example pandas_39.zip file in the repo shown. Is there a way to download it to follow along? Thanks for your video.

aleenaselegy

Can you please upload the pandas zip file? It's missing on GitHub.

harbindsingh

I can't find the pandas_39.zip file. Can you help me?

thanhba