Merge multiple PDF files based on their name using Python (Real-World Example)


DESCRIPTION
▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
In this tutorial, you will learn how to combine PDFs by name using Python, specifically with the PyPDF2 module. This is a fast and easy way to merge PDF files without copying and pasting.

⭐ TIMESTAMPS:
00:00 – Introduction
00:22 – Explanation of the task
00:50 – Coding out the solution
05:43 – Outro

Comments

*The task was somewhat specific, but I hope you learned something new! :)*

CodingIsFun

As always, clean, understandable code and a perfect solution! Thank you, Sven, for your videos and professional attitude.

brazilleros

Wow, this video shows a very practical scenario for operations data analytics in science and industry.
Another suggestion would be the same scenario for Excel files: appending them on a 'financial year month' key, which then also needs to be converted to a date format for proper analytics, graphs, and exact time-series order.
Btw, great video Sven!!!

asankacool

Love the videos! Very helpful. Thank you!

torque

Awesome. I am waiting for your videos day after day.

KhalilYasser

Wow, this is a wonderful video, and it led me to some other videos on your channel. Great content. Thanks for the uploads.

YasinNabi

I've been looking into the best way to create a simple GUI that shows a list of PDFs in a folder, has an area for creating an output PDF to combine files into (a list of multiple output files, as these are "PDF packages" that are being built), and has a button to copy a selected PDF into the desired output file (I imagine this would just be the file path of the selected PDF to append to or merge with the desired output PDF).

I was considering doing everything in Excel, but I'm now considering React/JS or maybe Python.

What would you suggest?

swagz

Hi. I have tried PyMuPDF and PyPDF2 to merge forms with fillable fields in them. Either fields are missing from the resulting pages, or all the fillable field values are the same. What is going on?

tobiewaldeck

Hi,
When I run the program, it scans the already merged files again. I want it to scan only the newly added files in the folder and perform the merge operation on those. Could you help me with this? Thank you.

Aditya-mxgv
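
One possible way to skip files that were handled in an earlier run is to keep a small record of processed file names and compare against it on the next run. The sketch below is only an illustration under stated assumptions: the function names, the JSON record file, and its location are hypothetical and not part of the video's code. It also assumes the merged output is written to a different folder than the input PDFs, since otherwise the merged files would be picked up as new input.

```python
import json
from pathlib import Path
from typing import List


def load_seen(manifest: Path) -> set:
    """Read the set of already processed file names (empty on the first run)."""
    if manifest.exists():
        return set(json.loads(manifest.read_text()))
    return set()


def find_new_pdfs(pdf_dir: Path, manifest: Path) -> List[Path]:
    """Return the PDFs in pdf_dir whose names are not yet recorded in the manifest."""
    seen = load_seen(manifest)
    return sorted(p for p in pdf_dir.glob("*.pdf") if p.name not in seen)


def mark_processed(files: List[Path], manifest: Path) -> None:
    """Record the given file names so that later runs skip them."""
    seen = load_seen(manifest)
    seen.update(p.name for p in files)
    manifest.write_text(json.dumps(sorted(seen)))
```

A run would then call find_new_pdfs(...), merge only the returned files, and finish with mark_processed(...).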

Thank you for your video. I did learn something, even though it didn't work for me.
I got an error: "TypeError: unhashable type: 'list'" and don't know how to handle it for now.
Do you have a tip for me?

manuelbibbes
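
Without seeing the failing code, the cause can only be guessed. This error typically appears when a list ends up as a set element or dictionary key, for example when the key is built with split() instead of a string slice. The snippet below is a hypothetical reproduction with made-up file names, not the commenter's actual code.

```python
names = ["001.pdf", "001 - Content.pdf", "002.pdf"]

# split() returns a list, and lists cannot be stored in a set or used as dict keys.
try:
    keys = {name.split(".") for name in names}
except TypeError as exc:
    message = str(exc)  # mentions: unhashable type: 'list'

# A string slice is hashable, so it works as a key.
keys = {name[:3] for name in names}
```

If a multi-part key is really needed, converting the list with tuple(...) also makes it hashable.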

Very good video. Is there a way to choose the order in which the PDFs are merged?

diegodanciguer

Hi, just one question. I've got an error on the file.name part on line 18. Are there any solutions?

dain

Hi, Sven! Thank you for such a helpful video! One question, though: I have multiple files just like the ones you've shown in the video. My files look something like '001.pdf', '002.pdf', '001 - Content.pdf', '002 - Another Content.pdf', mixed in a single folder just like in the video. However, when I run the code, the merged file has '001 - Content.pdf' on the first page and '001.pdf' on the second page. How can I swap the order so that the merged content has '001.pdf' on the first page and '001 - Content.pdf' on the second page? Cheers

dimasramawib
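
One plausible cause of the order described above is that the files arrive sorted by full file name: a space sorts before a period, so '001 - Content.pdf' comes ahead of '001.pdf'. A minimal sketch, assuming the files are handled as pathlib.Path objects, is to sort by the stem (the name without the '.pdf' suffix) before appending. The file names are taken from the comment; the variable names are illustrative.

```python
from pathlib import Path

files = [
    Path("001 - Content.pdf"),
    Path("001.pdf"),
    Path("002 - Another Content.pdf"),
    Path("002.pdf"),
]

# Sorting by full name compares " " with "." at the fourth character,
# which puts "001 - Content.pdf" first.
by_name = sorted(files, key=lambda p: p.name)

# Sorting by stem compares "001" with "001 - Content";
# the shorter prefix sorts first.
by_stem = sorted(files, key=lambda p: p.stem)

print([p.name for p in by_stem])
# ['001.pdf', '001 - Content.pdf', '002.pdf', '002 - Another Content.pdf']
```

Any other merge order can be chosen the same way by supplying a different key function to sorted().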

Please share a lesson on how to create bookmarks for the combined PDF file. Thank you very much!

dule

Hi, thank you for your wonderful video. Could you include adding a header and footer along with the merge, please?

raajashekaran

Please make a video, or just explain or give a clue, on how to convert all PDFs in a folder to Excel, or extract the tables and save an Excel file for each PDF.

gaganrastogi

If someone wants to authenticate the data, this code might help:

for key in keys:
    merger = PdfMerger()
    base_file_name = None
    for file in pdf_files:
        str_pdf_file = str(file)
        split_str_pdf_files = str_pdf_file.split(" ")
        if
            merger.append(PdfReader(str(file), "rb"))
            if len(file.name) >= BASE_FILE_NAME_LENGTH:
                base_file_name = file.name
    if base_file_name:
        print(base_file_name)
        / base_file_name))
    merger.close()

everythinginpython

IMPORTANT
Change all occurrences of "PdfFileMerger" to "PdfMerger" and "PdfFileReader" to "PdfReader";
then the code will work.

PdfFileMerger and PdfFileReader are no longer available (removed in PyPDF2 3.0.0).

jastorgallywix

Sorry, but your algorithm is O(n^2). A simple change is to build the keys like this:
keys={}; set(keys.setdefault(file.name[:3], []).append(file.name)for file in pdf_files)
Now you don't need to rescan all pdf_files for each key, just:
for key in keys:
    merger=PdfFileMerger()
    for file in keys[key]:

qulinxao
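
The grouping idea from the comment above can also be written with collections.defaultdict, which some readers may find easier to follow than the setdefault one-liner. This is a sketch, not the video's code: the function name is hypothetical, and the three-character prefix is taken from file.name[:3] in the comment.

```python
from collections import defaultdict
from pathlib import Path
from typing import Dict, Iterable, List


def group_by_prefix(files: Iterable[Path], prefix_len: int = 3) -> Dict[str, List[Path]]:
    """Group files by the first prefix_len characters of their name in one pass."""
    groups = defaultdict(list)
    for file in files:
        groups[file.name[:prefix_len]].append(file)
    return dict(groups)


pdf_files = [Path("001.pdf"), Path("002.pdf"), Path("001 - Content.pdf")]
groups = group_by_prefix(pdf_files)
# groups["001"] holds both "001" files; groups["002"] holds the single "002" file.
```

Each group can then be merged on its own, so the full file list is scanned only once.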

Hi, could you help me with the error below?
How can I define Path? Thank you.
NameError Traceback (most recent call last)
Cell In [12], line 1
----> 1 pdf_dir = Path(__file__).parent / "pdf_files"

NameError: name 'Path' is not defined

alejandramunoz
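
The NameError above means that Path was never imported; it comes from the standard-library pathlib module. The "Cell In [12]" line suggests the code ran in a notebook, where __file__ is usually not defined either, so the sketch below falls back to the current working directory. The fallback and the base_dir name are assumptions for illustration, not part of the video's code.

```python
from pathlib import Path

try:
    # Works when the code runs as a script file.
    base_dir = Path(__file__).parent
except NameError:
    # Notebooks and interactive sessions do not define __file__.
    base_dir = Path.cwd()

pdf_dir = base_dir / "pdf_files"
```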