Python: Renaming PDFs using text inside a document with regex

In this tutorial, we expand on renaming PDFs using regular expressions (regex). This is just one of many possible uses of regex; if your requirements differ, you will need a different expression.

I have posted a written explanation of the regular expression used in the video on GitHub.

If you have any questions, leave them down below and I'll try to respond (hopefully more quickly this time).

Chapters:
00:00 Intro
00:18 Requests & start
01:28 Reviewing code
07:34 Reviewing the regex expression
11:23 Alt. regex w/o formatting
12:46 Example run with regex
14:11 Using a list of names to rename
Comments

IMPORTANT:
If you do NOT CARE about the format of the text that comes after the keyword in the positive lookbehind expression, use the following instead:
(?<=Order #: ).+

stephencodes
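
For readers who want to try the pinned expression on its own, here is a minimal sketch. The sample text is hypothetical and stands in for whatever a PDF text extractor returns; only the regex comes from the comment above.

```python
import re

# Hypothetical text, standing in for what a PDF text extractor might return.
page_text = "Invoice\nOrder #: 12345-AB rush\nTotal: $10.00"

# (?<=Order #: ) asserts that "Order #: " sits immediately before the match
# without including it in the result; .+ then takes the rest of that line
# (by default "." does not match a newline).
match = re.search(r"(?<=Order #: ).+", page_text)

# re.search returns None when nothing matches, so check before calling .group().
order_value = match.group() if match else None
print(order_value)
```

Because `.+` grabs everything up to the end of the line, any trailing words after the order number end up in the result too, which is the trade-off of not caring about the format.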

Thank you, Steve. You saved me from manually renaming nearly 400 PDFs, and many more in the future. I'm a trial attorney who handles big medical files that are often unorganized. My regex is \d{1,2}(\/|-)\d{1,2}(\/|-)(\d{4}|\d{2}); I then rearrange and pad the pieces and add a random 3-digit string to make the filename unique, so that date-related records sort and group together.

parkourninja
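
The date workflow described in this comment could look roughly like the sketch below. It is not the commenter's actual code: the function name `date_prefix`, the month/day/year order, and the rule that two-digit years mean 20xx are all assumptions. Note that Python's `re` only treats `{1,2}` as a quantifier when it is written without a space.

```python
import random
import re

# Same idea as the regex in the comment, with capture groups around each piece.
DATE_RE = re.compile(r"(\d{1,2})[/-](\d{1,2})[/-](\d{4}|\d{2})")


def date_prefix(text, rng=random):
    """Return a sortable 'YYYY-MM-DD_NNN' prefix, or None if no date is found."""
    m = DATE_RE.search(text)
    if m is None:
        return None
    month, day, year = m.groups()  # assumption: dates are month/day/year
    if len(year) == 2:
        year = "20" + year  # assumption: two-digit years are in the 2000s
    suffix = f"{rng.randint(0, 999):03d}"  # random 3-digit string
    return f"{year}-{month.zfill(2)}-{day.zfill(2)}_{suffix}"


print(date_prefix("Seen on 3/7/21 at clinic"))
```

A random suffix makes collisions unlikely rather than impossible, so checking whether the target filename already exists before renaming is still worthwhile.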

Omg the first time my comment is in a video! Thank you so much for this amazing tutorial! When are you going to set up a Patreon?! Or I can pay you back in calculus videos or any higher level math tutoring!

TacosYBurritosP

Bro is just insane, thank you so much for this video man

Lioneriod

Man
Thank you so much, worked like a charm

giamonioz

Hey Stephen - I have thousands of PDFs in one folder. Taking your case as an example, some of the PDFs have an Order # that we want to rename by, but other PDFs in the same folder have a Product # instead of an Order #. How can I rename both kinds with the same code? Does OR work in regex?

aayushaggarwal
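
Alternation with `|` does work in regex. One caveat in Python's `re` module is that a lookbehind must have a fixed width, so two labels of different lengths cannot share a single `(?<=...)`. A minimal sketch that sidesteps this with a capture group follows; the `Product #: ` label format and the assumption that the ID contains no spaces are guesses, not something confirmed in the video.

```python
import re

# (?:Order|Product) matches either label; the parentheses around \S+
# capture only the ID that follows it.
ID_RE = re.compile(r"(?:Order|Product) #: (\S+)")


def find_id(text):
    """Return the ID after 'Order #: ' or 'Product #: ', or None if absent."""
    m = ID_RE.search(text)
    return m.group(1) if m else None


print(find_id("Order #: A100 shipped"))
print(find_id("Product #: P-77"))
```

Using `group(1)` instead of `group()` returns just the captured ID rather than the whole match including the label.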

Hi Stephen, this works like a dream.
But when I try to change the cr_regex line to suit my case, it does not work.
The text in my file is B/L番号(1) JBX1A12345. I only want the JBX1A12345, so I changed the line to cr_regex = r'(?<=B/L番号(1) )[A-Z]{4}\d+', but it raises AttributeError: 'NoneType' object has no attribute 'group'.

noctischen
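
Two things in that pattern likely explain the error. Unescaped parentheses form a group, so `(1)` matches just `1` rather than the literal text `(1)`; and `[A-Z]{4}\d+` expects four letters first, while `JBX1A12345` has a digit in fourth position. When nothing matches, `re.search` returns `None`, and calling `.group()` on it raises the `AttributeError`. A minimal sketch of one possible fix, assuming the extracted text has exactly one space after `(1)`:

```python
import re

text = "B/L番号(1) JBX1A12345"

# \( and \) match literal parentheses; [A-Z0-9]+ accepts letters and digits
# in any order, which covers an ID such as JBX1A12345.
cr_regex = r"(?<=B/L番号\(1\) )[A-Z0-9]+"

match = re.search(cr_regex, text)

# Guard against None so a non-matching file does not crash the script.
bl_number = match.group() if match else None
print(bl_number)
```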

This is awesome man. Nice work. Would it be difficult to edit the code to exclude special characters? It worked perfectly other than instances where I had a "/" in the lookup text.

johnnyb
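
A "/" in the matched text is a problem because it is read as a path separator when the value is used as a filename. One way to handle it is to clean the value before renaming. The sketch below is an assumption about how that could be done, not code from the video; `safe_filename` is a hypothetical helper name.

```python
import re


def safe_filename(value, replacement="-"):
    """Replace characters that are not allowed in Windows filenames."""
    # The character class covers \ / : * ? " < > |
    cleaned = re.sub(r'[\\/:*?"<>|]', replacement, value)
    # Windows also rejects names ending in a space or a dot.
    return cleaned.strip(" .")


print(safe_filename("AB/123"))
```

The cleaned string can then be used wherever the script builds the new filename from the regex match.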

How would we do it if we want to take several different pieces of text from the PDF, like the case number, document number, and name, and save the file with those in its name?

for example:

using the naming format "C:\...\Case Name\DocumentNumber FilingDate LastName FilingType.pdf."

"C:\...\Leal v. Bedel et al\#026 2022-07-02 Staedter Motion for Extension of Time to File Answer.pdf."

greenlight
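
One way to approach the multi-field naming asked about here is to run one regex per field and join the results. Everything specific in the sketch below is hypothetical: the labels ("Document #:", "Filed:", "Filed by:", "Filing Type:"), the field names, and the sample text would all need to be adapted to what the PDFs actually contain.

```python
import re

# Hypothetical labels; real documents will use different wording.
FIELD_PATTERNS = {
    "doc_number": r"Document #: (\d+)",
    "filing_date": r"Filed: (\d{4}-\d{2}-\d{2})",
    "last_name": r"Filed by: (\w+)",
    "filing_type": r"Filing Type: (.+)",
}


def build_filename(text):
    """Build 'DocumentNumber FilingDate LastName FilingType.pdf', or None."""
    parts = {}
    for name, pattern in FIELD_PATTERNS.items():
        m = re.search(pattern, text)
        if m is None:
            return None  # skip files that are missing any field
        parts[name] = m.group(1).strip()
    doc_number = parts["doc_number"].zfill(3)  # pad 26 to 026
    return (
        f"#{doc_number} {parts['filing_date']} "
        f"{parts['last_name']} {parts['filing_type']}.pdf"
    )


sample = (
    "Document #: 26\n"
    "Filed: 2022-07-02\n"
    "Filed by: Staedter\n"
    "Filing Type: Motion for Extension of Time to File Answer\n"
)
print(build_filename(sample))
```

Returning `None` when a field is missing lets the calling loop leave that file untouched instead of giving it a partial name.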