Power Query - Extract PDF Tables by the Table's Content

Learn how to extract tables from PDF files based on the content of the tables. This technique does NOT rely on table names or page locations. Plus, the video covers many useful tricks for data discovery and manipulation.
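The idea of picking tables by what they contain, rather than by name or page, can be sketched outside Power Query as well. The Python below is only an illustrative analogue, not the video's code: the function names `table_contains` and `select_tables` and the sample data are hypothetical, and each table is assumed to be a plain list of rows.

```python
from itertools import chain


def table_contains(table, keyword):
    """Return True if any cell in the table contains the keyword (case-insensitive)."""
    # Flatten the list of rows into a single stream of cells, then search it.
    cells = chain.from_iterable(table)
    needle = keyword.lower()
    return any(needle in str(cell).lower() for cell in cells if cell is not None)


def select_tables(tables, keyword):
    """Keep only the tables whose content mentions the keyword."""
    return [table for table in tables if table_contains(table, keyword)]


# Hypothetical tables, standing in for whatever a PDF extraction step returns.
extracted = [
    [["Page 1 of 3"], ["Confidential"]],
    [["Region", "Jan", "Feb"], ["East", "10", "12"]],
    [["Notes", None], ["Reviewed", "Yes"]],
]

matches = select_tables(extracted, "region")
print(len(matches))   # number of tables that mention the keyword
print(matches[0][0])  # first row of the first matching table
```

Because the match depends only on cell content, the selection still works if the extraction step assigns different table names or the table moves to another page.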

File Download Link:

00:03 Overview of Problem
01:10 File Download Instructions
01:20 Main Issues (Project Overview)
02:08 Connecting to a Folder (Controlling Scope)
03:13 Filter for PDF Files
03:43 Remembering Where Records Came From
04:54 Extracting PDF Metadata
05:38 Extracting Tables from PDF Files
06:11 Discovering Needed Tables
06:50 Converting Tables to Rows
07:12 Combining Nested Lists into a Single List
07:31 Searching for the Keyword that Identifies Needed Tables
08:30 Expanding Table Contents
08:43 Promoting the Header Row
08:49 Removing Unwanted Columns
08:59 Unpivoting the Stacked Tables
09:13 Creating Proper Dates
09:36 Removing Unwanted Rows (Errors)
09:59 Loading the Results to Excel
10:07 Building a Report
10:44 Testing for New Data
11:15 Issues with Hardcoded Titles
12:55 Renaming Columns by Position
14:02 Updating the Remaining Code
14:34 Testing the Dynamic Column Name Feature
14:54 Project Conclusion
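The later chapters on hardcoded titles and renaming columns by position describe a general idea: refer to a column by its index so the step keeps working when the source header text changes. Below is a minimal Python analogue of that idea, not the video's Power Query code; `rename_by_position` and the sample headers are hypothetical. In Power Query itself this would likely involve `Table.ColumnNames` and `Table.RenameColumns`, which is an assumption about the approach rather than something the chapter list confirms.

```python
def rename_by_position(header, names_by_index):
    """Return a copy of header with the columns at the given indexes renamed."""
    renamed = list(header)
    for index, new_name in names_by_index.items():
        # An out-of-range index raises IndexError instead of silently doing nothing.
        renamed[index] = new_name
    return renamed


# Hypothetical headers from two files whose first column title differs.
header_a = ["Sales Region 2023", "Jan", "Feb"]
header_b = ["Sales Region 2024", "Jan", "Feb"]

# Renaming by position yields the same result for both files.
print(rename_by_position(header_a, {0: "Region"}))
print(rename_by_position(header_b, {0: "Region"}))
```

A rename keyed on the old title would fail on the second file; a rename keyed on position 0 does not depend on the title at all.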

Related videos

Comments

This channel is a hidden 💎, and this particular video is Epic!
I learned more than I bargained for.
Thank you very much.

FsoOmar

Just want to say that this helped me out so much. Thank you for sharing the info. What I needed wasn't as complicated as this, but it did exactly what I needed in my use case in a much clearer way than every other tutorial I looked at.

joseapar

Brilliant work! A very real-life example and to the point! Thanks a ton!

sharifashikurrahman

Just when I thought I had it all figured out, you showed me an easier way. Thank you for the Power Query videos; they have saved me a significant amount of time.

ronaldagee

As always, clarity and complete control of the situation. Extremely pro. I love these types of examples and your way of telling them, always to the point. Thanks, Bryon!

IvanCortinas_ES

Sir, every video of yours is amazing, but this one is on another level! Fantastic and excellent work!

IrfanChanna

Amazing, thank you very much. This is the only detailed video about extracting data from PDFs the right way.

kennethvela

Lots of interesting and useful elements. Thanks

tonybatty

As always, a tutorial at the top of the bill.
Magnificent work, dedication, and explanation. 💯 👍

robbe

Great job! Clear explanation with attention to details that matter. Professor!

kkravch

Great video, thanks for explaining this so well!

chrism

Great case study. I had the same issue just yesterday while building a data model that hit the same error when I tried to refresh my changes in PQ. You pointed me in the right direction. Thanks for posting.

jazzista

The way you teach is SUPER as always, thank you. I have a question in mind: how do you deal with tables that span across pages?

anpham

I have a problem with an append query. I append a folder containing many sheets, and the sheets use different date formats, such as mm-dd-yy or dd-mm-yy. What do you suggest?

ahmedshalaby

That worked great. I was able to modify it for my scenario thanks to how well you explained it. Instead of identifying and retrieving information from one table, is it possible to get information from two different tables? When I tried adding multiple tables, it got messy around the unpivot stage and doesn't seem to work in that scenario.

MikeM-ho

What if it's not a table but just text and numbers? What if you are extracting something that's always on the first page (name and address) and a grand total on the last page, but the last page varies from file to file? In one doc the total is on the first page; in another it is on page 3, then 5, and so on. Can you do an exercise like that?

celestebenitez

This is great, thanks for sharing! I have one issue. I am trying to extract specific data from an invoice, and it looks like the first table the query retrieves is the entire invoice, I guess because the specific word is somewhere in there, while the second one is indeed the table that I need. I can eliminate the first occurrence for each invoice I am retrieving, but how can I make sure I am eliminating the right table? I share the comment that your channel is a hidden gem!

jeromeastier

How to Import PDF Files into Excel with Power Query

Bulk Combine PDF files to Excel without losing formatting & NO 3rd party software

Extract Data from PDF Files with Power Query in Power BI

Import Specific data from Multiple PDF files using power query #shorts #excel #informative

Load All Data from a PDF into Power Query at Once

How can I extract data from a PDF using Excel's PowerQuery?

How to 'automatically' extract data from a messy PDF table to Excel

Extract Data From PDF | Power Query #shorts

Extract Data from PDF 2 Excel | Power Query | Power BI

Combine Files from a Folder with Power Query the RIGHT WAY!

Import Data from PDF to Excel Tables Using Power Query (Quick and Easy)

Properly Convert PDF to Excel

How To Get Data From Pdf, Using Power Query #Shorts

Combine Data from Multiple PDF Files into a Single Excel File

Extract Data from PDF Files with Power Query in Power BI & Also paste data in Excel

009. How to Extract data from a PDF FILE with Excel POWER QUERY - No 3rd party software

Import PDF to Excel with Power Query

Import Data from a PDF to Excel

How to Clean Bank Statements Extracted from PDF Power Query

Get Multiple Files Containing Multiple Sheets with Power Query

Simplifying PDF Credit Card Data Extraction with Microsoft Power Query and Excel

PDF to Excel Converter

How to extract Invoice Data from 1000 pdf files into Power BI