How to extract key-value & table info from PDF & save it as CSV - Amazon Textract tutorial p5


Welcome to part 5 of the Amazon Textract tutorial series. In this video, I cover how to extract text, key-value pairs, and table information from a multi-page PDF file and save the output as CSV.

---
Support my work:
---
Paytm | Gpay: 9023197426

Comments

Excellent, Chirag, you are saving me a ton of time. Very detailed. Thanks so much!

SK-gnrs

Thank you so much for these! You're a lifesaver. If you end up creating any more, it would be really helpful to get a primer on adding Queries functionality to this multi-page PDF parser.

AJvanuw
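
On the Queries request: a minimal sketch, assuming the job is started with `"QUERIES"` in `FeatureTypes` plus a `QueriesConfig`, and that all result blocks have already been collected. Textract returns each question as a `QUERY` block whose `ANSWER` relationship points at `QUERY_RESULT` blocks. The query text, the alias, `QUERY_FEATURE_KWARGS`, and `query_rows` are hypothetical illustrations, not part of the tutorial's code.

```python
# Hypothetical extra arguments for textract.start_document_analysis(...);
# the query text and alias are placeholders.
QUERY_FEATURE_KWARGS = {
    "FeatureTypes": ["TABLES", "FORMS", "QUERIES"],
    "QueriesConfig": {
        "Queries": [{"Text": "What is the invoice number?", "Alias": "INVOICE_NO"}]
    },
}


def query_rows(blocks: list) -> list:
    """Turn QUERY / QUERY_RESULT blocks into CSV-ready rows (one per query per page)."""
    by_id = {b["Id"]: b for b in blocks}
    rows = []
    for block in blocks:
        if block.get("BlockType") != "QUERY":
            continue
        query = block["Query"]
        answers = []
        for rel in block.get("Relationships", []):
            if rel["Type"] == "ANSWER":
                # ANSWER ids point at QUERY_RESULT blocks that hold the answer text
                answers.extend(by_id[i]["Text"] for i in rel["Ids"] if i in by_id)
        rows.append({
            "Page": block.get("Page", 1),
            "Query": query.get("Alias") or query["Text"],
            "Answer": " ".join(answers),  # empty string when Textract found no answer
        })
    return rows
```

Keying rows by page keeps answers from different pages of a multi-page PDF apart instead of overwriting each other.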

Hi, the “T extract_async_kv_table.yaml” file you uploaded to AWS CloudFormation is different from the one in the Git repository. Could you please help me with this? I need the main file.

digambarsonavane

Hi Srce Cde, I followed the exact steps shown in the video, but I'm getting the error below in the Lambda function's CloudWatch logs. Can you please help me out? Thanks a lot in advance.
[ERROR] Runtime.ImportModuleError: Unable to import module 'lambda_function': No module named 'lambda_function'
Traceback (most recent call last):

SurajDubey
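
One common cause of `No module named 'lambda_function'` (not confirmed for this particular setup) is zipping the project folder itself, so the archive contains `some_folder/lambda_function.py` instead of `lambda_function.py` at its root. The sketch below zips a folder's *contents* and checks the layout. The folder name `job_lambda` in the usage is an assumption; the `helper/__init__.py` check reflects the layout that an absolute import such as `from helper.helper import process_response, process_error` expects.

```python
import zipfile
from pathlib import Path


def zip_lambda_package(src_dir: str, zip_path: str) -> list:
    """Zip the contents of src_dir so lambda_function.py sits at the archive root."""
    src = Path(src_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(src.rglob("*")):
            if path.is_file():
                # arcname is relative to src_dir, so no wrapping top-level folder
                zf.write(path, arcname=path.relative_to(src).as_posix())
    with zipfile.ZipFile(zip_path) as zf:
        return zf.namelist()


def handler_at_root(names: list, module: str = "lambda_function") -> bool:
    """True if the handler module file is at the zip root (no folder prefix)."""
    return f"{module}.py" in names
```

For example, `zip_lambda_package("job_lambda", "job_lambda.zip")` should list `lambda_function.py`, `helper/__init__.py`, and `helper/helper.py`, with no `job_lambda/` prefix.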

Could you let us know whether the code shown in the video has been updated?

amalsalilan

Hi Chirag, this video is really helpful. I am saving the output files into folders such as KV, Tables, Signatures, and Text, but each file is named after the JobId with a .csv extension. Instead, I want each file named after the input file. For example, if I upload Sample.pdf, I want the output to be Sample.csv in all of the output folders (KV, Tables, Signatures, and Text). Could you please help with this?

akshayavarshini
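
A minimal sketch of naming outputs after the input file instead of the JobId. It assumes the processing Lambda is triggered through SNS and that the Textract completion message carries `DocumentLocation.S3ObjectName`; check that against your own event payload. The folder list and both function names are hypothetical.

```python
import json
from pathlib import PurePosixPath

OUTPUT_FOLDERS = ("KV", "Tables", "Signatures", "Text")  # folder names from the comment


def input_key_from_sns(event: dict) -> str:
    """Read the original S3 object key from the SNS-wrapped Textract notification."""
    # Assumption: the SNS message body is JSON containing DocumentLocation.S3ObjectName
    message = json.loads(event["Records"][0]["Sns"]["Message"])
    return message["DocumentLocation"]["S3ObjectName"]


def csv_keys_for(input_key: str, folders=OUTPUT_FOLDERS) -> dict:
    """Map "uploads/Sample.pdf" to {"KV": "KV/Sample.csv", ...}."""
    stem = PurePosixPath(input_key).stem  # "Sample" from "uploads/Sample.pdf"
    return {folder: f"{folder}/{stem}.csv" for folder in folders}
```

The resulting keys can then be used wherever the parser currently builds `<JobId>.csv`.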

Good day,

Thanks a lot for your videos; I have learned a lot going through each of them. All of the implementations I tried work except this one. For some reason it times out in the process_response method: it gets stuck after logging the message logging.info("Fetching response"). I even set the timeout to 15 minutes and tried different files.

BonginkosiBrian
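
One possible cause of a hang right after "Fetching response" is a `get_document_analysis` loop that never stops following `NextToken`, or that reads results while the job is still running. A minimal sketch of a bounded pagination loop, assuming `textract` is a boto3 Textract client; `fetch_all_blocks` and `max_calls` are hypothetical names, and this is not the tutorial's own `process_response`.

```python
def fetch_all_blocks(textract, job_id: str, max_calls: int = 1000) -> list:
    """Collect every block of an async analysis job by following NextToken."""
    blocks = []
    kwargs = {"JobId": job_id, "MaxResults": 1000}  # 1000 is the API's per-call maximum
    for _ in range(max_calls):  # hard cap so a bad loop fails fast instead of timing out
        response = textract.get_document_analysis(**kwargs)
        status = response["JobStatus"]
        if status not in ("SUCCEEDED", "PARTIAL_SUCCESS"):
            raise RuntimeError(f"Job {job_id} is {status}; results are not ready")
        blocks.extend(response.get("Blocks", []))
        token = response.get("NextToken")
        if not token:  # no token means this was the last page of results
            return blocks
        kwargs["NextToken"] = token
    raise RuntimeError("NextToken loop did not finish within max_calls")
```

Raising instead of waiting keeps the Lambda from burning its whole timeout when it is invoked before the job has finished.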

Hello.
I have used your code on a multi-page PDF, and it extracts only the first page.

saikrishnachalavadi
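
Two likely reasons only page 1 comes back: the synchronous `analyze_document` call, which handles single-page documents, or reading only the first `get_document_analysis` response without following `NextToken`. A small diagnostic sketch, assuming the blocks have been gathered and the total page count taken from `DocumentMetadata["Pages"]` of the response; `missing_pages` is a hypothetical helper.

```python
def missing_pages(blocks: list, total_pages: int) -> list:
    """Return 1-based page numbers that have no PAGE block among the collected blocks."""
    # Async multi-page results tag every block with its "Page" number
    seen = {b.get("Page", 1) for b in blocks if b.get("BlockType") == "PAGE"}
    return [p for p in range(1, total_pages + 1) if p not in seen]
```

A non-empty result means some result pages were never fetched, which points at pagination rather than at the parser.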

How can we add ['SIGNATURE'] to the FeatureTypes and put the result in the table? Is there any way to detect whether a signature exists, the same way as with key-value pairs, and if there is no signature, just return an empty string (" ")? Thank you!

SkalarBG
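
A minimal sketch for the signature column. As far as the Textract API reference describes it, the feature type is requested as `"SIGNATURES"` (plural) and detections come back as blocks with `BlockType` `"SIGNATURE"`; verify against your SDK version. `signature_rows` and the `"present"` label are hypothetical choices.

```python
# Hypothetical FeatureTypes for start_document_analysis with signature detection enabled
FEATURE_TYPES = ["TABLES", "FORMS", "SIGNATURES"]


def signature_rows(blocks: list, total_pages: int) -> list:
    """One CSV row per page: "present" if a SIGNATURE block was found there, else ""."""
    pages_with_sig = {
        b.get("Page", 1) for b in blocks if b.get("BlockType") == "SIGNATURE"
    }
    return [
        {"Page": p, "Signature": "present" if p in pages_with_sig else ""}
        for p in range(1, total_pages + 1)
    ]
```

The rows can be written with `csv.DictWriter(f, fieldnames=["Page", "Signature"])` next to the key-value output.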

Hi Chirag, I'm getting this error: "[ERROR] Runtime.ImportModuleError: Unable to import module 'lambda_function': attempted relative import with no known parent package Traceback (most recent call last):". It looks like it is caused by this line: "from helper.helper import process_response, process_error". How do I fix this?

SK-gnrs

Hi Chirag, all of the tutorials are really helpful. I am trying to save all the output into a folder named after the input file, with the different subfolders (Text, Tables, kv, Textract) underneath it. For example, if I upload 123.pdf, I want S3 to have a 123 folder containing all of those subfolders. Thank you in advance :)

jalpazaveri
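
S3 has no real folders: a key such as `123/Text/text.csv` is what the console displays as a `123` folder containing `Text`. A minimal sketch, assuming `s3` is a boto3 S3 client and that the parser's outputs can be gathered into a `{subfolder: {filename: body}}` dict; that structure and both function names are hypothetical.

```python
from pathlib import PurePosixPath


def output_key(input_key: str, subfolder: str, filename: str) -> str:
    """Build "<input stem>/<subfolder>/<filename>", e.g. "123/Text/text.csv"."""
    stem = PurePosixPath(input_key).stem  # "123" from "uploads/123.pdf"
    return f"{stem}/{subfolder}/{filename}"


def upload_outputs(s3, bucket: str, input_key: str, outputs: dict) -> list:
    """Upload every output file under a prefix named after the input file."""
    written = []
    for subfolder, files in outputs.items():
        for filename, body in files.items():
            key = output_key(input_key, subfolder, filename)
            s3.put_object(Bucket=bucket, Key=key, Body=body)
            written.append(key)
    return written
```

For `uploads/123.pdf`, the keys come out as `123/Text/...`, `123/Tables/...`, and so on.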

Wow, thank you very much! I set everything up as in the video but am still getting an error on the JobLambda: "[ERROR] Runtime.ImportModuleError: Unable to import module 'lambda_function': No module named 'lambda_function' Traceback (most recent call last)."

I did some googling, and on Stack Overflow the suggested solutions were to verify that the handler matches the lambda_function.py file name (which it does), to create a separate blank __init__.py file (as in the Process function), and to change the permissions on the files before zipping. I have tried all of them and am not sure what is causing the error.

curiousl

Hello, thank you very much for this tutorial. Can you help me understand why I can extract CSVs from small PDFs (10 to 30 pages) but not from large PDF files (more than 100 pages)? Also, could you please post how we should modify the parser to generate a single CSV with all the tables? Thank you in advance!

naillazrak
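
For the single-CSV question: a minimal sketch, assuming each parsed table is available as a dict with a `"page"` number and a `"rows"` list of cell lists (a hypothetical structure, not necessarily what the tutorial's parser produces). Each output row is prefixed with its table and page number, so rows from different tables stay distinguishable in one file.

```python
import csv
import io


def write_tables_to_one_csv(tables: list, out) -> None:
    """Write all tables into one CSV; each row starts with table number and page."""
    writer = csv.writer(out)  # when writing to a file, open it with newline=""
    for table_no, table in enumerate(tables, start=1):
        for row in table["rows"]:
            writer.writerow([table_no, table["page"], *row])


buf = io.StringIO()
write_tables_to_one_csv([{"page": 1, "rows": [["Item", "Qty"], ["Pen", "2"]]}], buf)
```

This long format tolerates tables with different column counts; separating tables with blank lines is an alternative if the CSV is meant for people rather than for further processing.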

This is really helpful! Thanks so much. I have a question, though.

If I upload 10 separate forms in PDF format at the same time, I just need to iterate over the bucket's objects, right? Sorry if this is a dumb question; I'm new to all of this. Thanks again.

turkishboy
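
If the job Lambda is triggered by S3 ObjectCreated events (an assumption about this setup), listing the bucket is usually unnecessary: each upload produces its own event, and the handler only needs to loop over `event["Records"]`. A minimal sketch; the function names, the `.pdf` filter, and the topic/role ARNs are hypothetical, while `start_document_analysis` and its parameters are the boto3 Textract API.

```python
from urllib.parse import unquote_plus


def documents_from_s3_event(event: dict) -> list:
    """Return (bucket, key) pairs for the PDFs referenced in an S3 event."""
    docs = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = unquote_plus(record["s3"]["object"]["key"])  # S3 event keys are URL-encoded
        if key.lower().endswith(".pdf"):
            docs.append((bucket, key))
    return docs


def start_jobs(textract, docs: list, topic_arn: str, role_arn: str) -> list:
    """Start one async analysis job per document and return the JobIds."""
    job_ids = []
    for bucket, key in docs:
        resp = textract.start_document_analysis(
            DocumentLocation={"S3Object": {"Bucket": bucket, "Name": key}},
            FeatureTypes=["TABLES", "FORMS"],
            NotificationChannel={"SNSTopicArn": topic_arn, "RoleArn": role_arn},
        )
        job_ids.append(resp["JobId"])
    return job_ids
```

Ten simultaneous uploads therefore start ten independent jobs, each finishing with its own SNS notification.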

Thank you so much, great tutorial, bro! I have an issue when uploading multi-page PDFs. It doesn't happen with every PDF file, but when it does, I get this error message: <listcomp>: v = " ".join([self.word_map[i] for i in relation["Ids"]])

skripsi
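
The message above is truncated, but a failure in that list comprehension is most likely a `KeyError`: a `CHILD` id that is not in `word_map`. Plausible causes are a checkbox (`SELECTION_ELEMENT`) inside a value, or blocks from result pages that were never fetched. A defensive sketch, assuming `word_map` maps block ids to text; the function names are hypothetical and not the tutorial's parser.

```python
def build_word_map(blocks: list) -> dict:
    """Map block Id to text for WORD blocks and to the checkbox state for SELECTION_ELEMENTs."""
    word_map = {}
    for b in blocks:
        if b["BlockType"] == "WORD":
            word_map[b["Id"]] = b["Text"]
        elif b["BlockType"] == "SELECTION_ELEMENT":
            word_map[b["Id"]] = b["SelectionStatus"]  # "SELECTED" or "NOT_SELECTED"
    return word_map


def child_text(block: dict, word_map: dict) -> str:
    """Join the text of a block's CHILD ids, skipping ids with no known text."""
    parts = []
    for relation in block.get("Relationships", []):
        if relation["Type"] == "CHILD":
            parts.extend(word_map[i] for i in relation["Ids"] if i in word_map)
    return " ".join(parts)
```

Skipping unknown ids hides the crash; if values come out empty on later pages, check that every `NextToken` page was fetched.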

I want to ask you something about Textract. Can I get your contact details?

purnishsinha
