Extracting data from PDF files using Python

I introduce the PyPDF2 package, which we need to install.
Installation on Anaconda:
conda install -c conda-forge pypdf2
Installation using the pip installer:
pip install PyPDF2
I show you how to create and activate a virtual environment (optional, but useful). Then we develop the code step by step, so you can learn how to adapt it to your own requirements. Please leave a comment if you have any questions.
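The core of the step-by-step code (read each page, count the search term, record the pages where it appears) might look like the sketch below. This is a minimal sketch, not the code from the video. It assumes a PyPDF2 3.x release, which provides PdfReader, reader.pages and page.extract_text() (older 1.x releases used PdfFileReader and extractText() instead). The helper names extract_page_texts and find_occurrences are hypothetical, and matching is a plain case-sensitive substring count.

```python
def extract_page_texts(filename):
    """Return a list with the extracted text of every page (assumes PyPDF2 3.x)."""
    # Imported inside the function so find_occurrences works without PyPDF2 installed.
    from PyPDF2 import PdfReader

    reader = PdfReader(filename)
    # Scanned (image-only) pages usually yield little or no text; fall back to "".
    return [page.extract_text() or "" for page in reader.pages]


def find_occurrences(search_term, page_texts):
    """Return (page_number, count) pairs for every page that contains search_term.

    Hypothetical helper: page numbers are 1-based, matching is case-sensitive,
    and str.count counts non-overlapping occurrences.
    """
    if not search_term:
        raise ValueError("search_term must be a non-empty string")
    hits = []
    for page_number, text in enumerate(page_texts, start=1):
        count = text.count(search_term)
        if count > 0:
            hits.append((page_number, count))
    return hits
```

For example, find_occurrences("Python", extract_page_texts("example.pdf")) returns a list of (page number, count) pairs, where example.pdf is a placeholder for your own file. Keeping the PDF reading separate from the counting makes the search logic easy to test on plain strings.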
Finally, we refactor the code into a function that takes a search term and a filename and returns a tuple containing the total number of occurrences and the number of pages that contain the search term at least once.
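A refactored function of this kind could be sketched as follows. This is a minimal sketch, not the function from the video: it assumes a PyPDF2 3.x release (PdfReader, reader.pages, page.extract_text()) and case-sensitive substring matching, and the names search_pdf and count_in_texts are hypothetical. The counting is split into its own function so it can be checked without a PDF file.

```python
from typing import Iterable, Tuple


def count_in_texts(search_term: str, page_texts: Iterable[str]) -> Tuple[int, int]:
    """Return (total occurrences, number of pages with at least one occurrence).

    Hypothetical helper: str.count is case-sensitive and counts non-overlapping matches.
    """
    if not search_term:
        raise ValueError("search_term must be a non-empty string")
    total = 0
    pages_with_term = 0
    for text in page_texts:
        count = text.count(search_term)
        total += count
        if count:
            pages_with_term += 1
    return total, pages_with_term


def search_pdf(search_term: str, filename: str) -> Tuple[int, int]:
    """Hypothetical wrapper: read every page of filename with PyPDF2 and count matches."""
    from PyPDF2 import PdfReader  # assumes PyPDF2 3.x

    reader = PdfReader(filename)
    # A generator avoids holding the text of every page in memory at once.
    texts = (page.extract_text() or "" for page in reader.pages)
    return count_in_texts(search_term, texts)
```

Calling search_pdf("Python", "example.pdf"), with example.pdf standing in for your own file, returns the tuple (total occurrences, pages containing the term). For case-insensitive search, one option is to lower-case both the term and each page's text before counting.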
*Chapters*
0:00 Welcome
0:15 Return all occurrences & page numbers
0:44 Example PDF
2:23 Python setup
3:55 Virtual environment
6:16 Coding fun
28:05 Refactoring
*The channel*
YUNIKARN focuses on publishing educational content in applied statistics, mathematics, and data science. In these fields, programming skills have become essential. Hence, we cover various programming languages, including Python, Stata, and C++, both to tackle problems and for fun.
*Hashtags*
#datascience #python #PDF