Web Scraping in Python: Tools, Techniques, and Legality | Real Python Podcast #12

Do you want to get started with web scraping using Python? Are you concerned about the potential legal implications? What are the tools required and what are some of the best practices? This week on the show we have Kimberly Fessel to discuss her excellent tutorial created for PyCon 2020 online titled “It’s Officially Legal so Let’s Scrape the Web.”

We discuss getting started with web scraping and cover tools and techniques. Kimberly gives advice on finding elements inside the HTML and on techniques for cleaning your data. She also notes a recent change to the legal landscape regarding scraping the web.

Kimberly is a Senior Data Scientist at Metis Data Science Bootcamp in New York City. She holds a Ph.D. in applied mathematics. We talk about her switch from academia to data science, and discuss her passion for data storytelling and visualizations.

Topics:

00:00:00 – Introduction
00:01:31 – Kimberly’s background and Metis Data Science Bootcamp
00:02:19 – NLP and work in advertising
00:03:27 – Changes in the legality of web scraping
00:06:12 – What are good projects for web scraping?
00:06:56 – Tools to start web scraping
00:07:51 – How to find the elements you want?
00:09:00 – How much HTML should you know?
00:10:49 – Inspecting elements in the browser
00:14:30 – What are good sites to practice on?
00:16:20 – Pausing between requests
00:19:02 – Saving as you go
00:20:54 – Real Python Video Course Spotlight
00:21:55 – Navigating the DOM
00:23:10 – Data cleaning and formatting
00:28:26 – Dynamic sites and Selenium
00:32:16 – Scrapy
00:33:55 – PyOhio 2020
00:35:40 – Transition out of academia
00:38:40 – What are you excited about in the world of Python?
00:41:05 – What do you want to learn next in Python?
00:48:00 – What is a less known Python tip or trick?
00:49:17 – Thanks and Goodbye

Comments
Author

Suppose we have
l = [1, 2, 3, 4] and
l1 = [1, 2, 3, 4, 5, 6, 7]
If we use the zip function on these, we get [(1, 1), (2, 2), (3, 3), (4, 4)]
But I also want to print pairs for the leftover elements, such as 5 and 6, padded with 0
So is there a built-in function that handles this kind of unequal-length iteration?

nqntrqb
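
For the question above, one option from the standard library is `itertools.zip_longest`, which keeps pairing until the longer input is exhausted and substitutes a fill value for the shorter one. The following is a minimal sketch that reuses the list names from the comment and assumes 0 is the padding the commenter wants:

```python
from itertools import zip_longest

l = [1, 2, 3, 4]
l1 = [1, 2, 3, 4, 5, 6, 7]

# zip() stops at the shorter list; zip_longest() continues to the end of
# the longer one and uses fillvalue wherever the shorter list has run out.
pairs = list(zip_longest(l, l1, fillvalue=0))

for pair in pairs:
    print(pair)
# Last three pairs printed: (0, 5), (0, 6), (0, 7)
```

If `fillvalue` is omitted, the missing positions are filled with `None` instead of 0.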
Author

Great, thank you so much! Really good channel, and where exactly is the price on your way?

bobhrobor
Author

How do I print an iteration over unequal-length lists with the zip function?

nqntrqb
Author

If there were a real conversation, you would easily have 100k views.

syedsanaullah