Python! Extracting Text from PDFs

Tutorial on how to extract text from PDF files. Learn the difference between natively digital and scanned PDFs, extract text from a digital PDF using PyPDF2, and extract text from a scanned PDF using optical character recognition (OCR) with pytesseract.
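
As a rough illustration of the digital-PDF case, here is a minimal sketch, assuming a recent PyPDF2 release where the reader class is named PdfReader (older releases used PdfFileReader). The function names and the min_chars threshold are hypothetical choices for this example, not taken from the video. The looks_scanned helper is only a heuristic: if a PDF yields almost no text, it is probably scanned and needs OCR instead.

```python
def looks_scanned(page_texts, min_chars=20):
    """Heuristic (an assumption of this sketch): a PDF whose pages yield
    almost no extractable text is probably a scan that needs OCR."""
    total = sum(len(text.strip()) for text in page_texts)
    return total < min_chars


def extract_digital_text(pdf_path):
    """Return a list with one string of text per page of a digital PDF."""
    # Imported lazily so looks_scanned() can be used without PyPDF2 installed.
    from PyPDF2 import PdfReader

    reader = PdfReader(str(pdf_path))
    # extract_text() can come back empty for image-only pages.
    return [page.extract_text() or "" for page in reader.pages]
```

A possible usage, with "example.pdf" standing in for your own file: call pages = extract_digital_text("example.pdf"), then check looks_scanned(pages) to decide whether to fall back to OCR.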

|-Video Chapters-|
0:00 - Intro
0:10 - Installing packages
1:41 - Text extraction definition
2:21 - Extracting text from a natively digital PDF
4:44 - Extracting text from a scanned PDF using OCR
8:35 - References and additional learning
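
For the scanned-PDF chapter, a hedged sketch of the OCR route might look like the following. It assumes the pages are rendered to images with pdf2image (which needs Poppler installed) before being passed to pytesseract (which needs the Tesseract binary); the video may use a different rendering step. The function names, the dpi default, and the tidy_ocr_text cleanup are hypothetical additions for illustration only.

```python
import re


def ocr_pdf(pdf_path, dpi=300, lang="eng"):
    """Render each page of a scanned PDF to an image and OCR it.

    Returns a list with one string of recognized text per page.
    """
    # Imported lazily; both packages also need external binaries
    # (Poppler for pdf2image, Tesseract for pytesseract).
    from pdf2image import convert_from_path
    import pytesseract

    images = convert_from_path(str(pdf_path), dpi=dpi)
    return [pytesseract.image_to_string(image, lang=lang) for image in images]


def tidy_ocr_text(text):
    """Light cleanup of OCR output (does not fix misrecognized characters)."""
    # Rejoin words that were hyphenated across a line break.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Collapse runs of spaces and tabs into a single space.
    text = re.sub(r"[ \t]+", " ", text)
    # Collapse three or more newlines into one blank line.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()
```

A higher dpi generally gives Tesseract more detail to work with at the cost of speed, so it is worth experimenting with if the transcription of a scan comes out noisy.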
Comments
Author

I need to modify some words in a PDF file and then save the edited text, along with the rest of the content, in a new PDF file. Can you help me? Thank you, Francesco

francescovecchio
Author

Is it possible to run this in PyCharm, or only in Jupyter?

markjosephortizano
Author

Good video, and actually easy to implement, but it is not a good way to handle scans of book pages... so many errors in transcription

TheSantiago