Power Query - Extract PDF Tables by the Table's Content

Learn how to extract tables from PDF files based on the content of the tables. This technique does NOT rely on table names or page locations. Plus, the video covers many useful tricks for data discovery and manipulation.

File Download Link:

00:03 Overview of Problem
01:10 File Download Instructions
01:20 Main Issues (Project Overview)
02:08 Connecting to a Folder (Controlling Scope)
03:13 Filter for PDF Files
03:43 Remembering Where Records Came From
04:54 Extracting PDF Metadata
05:38 Extracting Tables from PDF Files
06:11 Discovering Needed Tables
06:50 Converting Tables to Rows
07:12 Combining Nested Lists into a Single List
07:31 Searching for the Keyword that Identifies Needed Tables
08:30 Expanding Table Contents
08:43 Promoting the Header Row
08:49 Removing Unwanted Columns
08:59 Unpivoting the Stacked Tables
09:13 Creating Proper Dates
09:36 Removing Unwanted Rows (Errors)
09:59 Loading the Results to Excel
10:07 Building a Report
10:44 Testing for New Data
11:15 Issues with Hardcoded Titles
12:55 Renaming Columns by Position
14:02 Updating the Remaining Code
14:34 Testing the Dynamic Column Name Feature
14:54 Project Conclusion
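
The core steps in the chapters above (converting each table to rows, combining the nested lists into a single list, then searching for a keyword) can be sketched outside Power Query. The Python below is only a minimal illustration, not the video's code: the sample tables, the keyword "Region", and the function names are hypothetical. The video itself works in Power Query M, where Table.ToRows, List.Combine, and List.Contains cover the same three steps, though the exact code used is not shown here.

```python
from itertools import chain


def table_contains(table, keyword):
    """Return True if any cell in the table equals the keyword."""
    # Flatten the rows (a list of lists) into a single list of cell values.
    cells = list(chain.from_iterable(table))
    return keyword in cells


def select_tables(tables, keyword):
    """Keep only the tables whose content includes the keyword."""
    return [t for t in tables if table_contains(t, keyword)]


# Hypothetical sample data: three tables as they might come out of one PDF.
tables = [
    [["Invoice", "Page 1"], ["Total", "100"]],
    [["Region", "Jan", "Feb"], ["East", "10", "12"]],
    [["Notes"], ["None"]],
]

needed = select_tables(tables, "Region")
print(len(needed))  # prints 1
```

Because the match is made on cell content, the selection keeps working when a table's name or page position changes between files.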
Comments

This channel is a hidden 💎, and this particular video is Epic!
I learned more than I bargained for.
Thank you very much.

FsoOmar

Just want to say that this helped me out so much. Thank you for sharing the info. What I needed wasn't as complicated as this, but it did exactly what I needed in my use case in a much clearer way than every other tutorial I looked at.

joseapar

Brilliant work! Very much a real-life example and to the point! Thanks a ton!

sharifashikurrahman

Just when I thought I had it all figured out, you showed me an easier way. Thank you for the Power Query videos; they have saved me a significant amount of time.

ronaldagee

As always, clarity and complete control of the situation. Extremely pro. I love these types of examples and your way of telling them, always to the point. Thanks, Bryon!

IvanCortinas_ES

Sir, every video of yours is amazing, but this one is on another level! Fantastic and excellent work!

IrfanChanna

Amazing, thank you very much. This is the only detailed video about extracting data from PDFs the right way.

kennethvela

Lots of interesting and useful elements. Thanks

tonybatty

As always, a tutorial at the top of the bill.
Magnificent work, dedication, and explanation. 💯 👍

robbe

Great job! Clear explanation with attention to details that matter. Professor!

kkravch

Great video, thanks for explaining this so well!

chrism

Great case study. I just had the same issue yesterday when I was building a data model that had the same error when trying to refresh my changes in PQ. You pointed me in the right direction. Thanks for posting.

jazzista

The way you teach is SUPER as always, thank you. I have a question in mind: how do you deal with tables that span across pages?

anpham

I have a problem with an append query. I append a folder containing many sheets, and the sheets use different date formats, such as mm-dd-yy or dd-mm-yy. What do you suggest?

ahmedshalaby

That worked great. I was able to modify it for my scenario thanks to how well you explained it. Instead of identifying and retrieving information from one table, is it possible to get information from two different tables? When I tried adding multiple tables, it got messy around the unpivot stage and doesn't seem to work in that scenario.

MikeM-ho

What if it's not a table but just text and numbers? What if you are extracting something that's always on the first page (name and address) and a grand total on the last page, but the last page varies from file to file? In one doc the total is on the first page; in another it's on page 3, then 5, and so on. Can you do an exercise like that?

celestebenitez

This is great! Thanks for sharing. I have one issue. I am trying to extract specific data from an invoice, and it looks like the first table the query retrieves is the entirety of the invoice, I guess because the specific word is somewhere in there, and the second tab is indeed giving the table that I need. I can eliminate the first occurrence for each invoice I am retrieving, but how can I make sure I am eliminating the right table? I share the comment that your channel is a hidden gem!

jeromeastier