Web Scraping using Python | Exploring Urllib, Requests & BeautifulSoup | Automation

Web scraping, or web data extraction, is the process of scraping (retrieving) data from the web. Unlike the conventional process of manually extracting data, web scraping uses intelligent automation techniques to retrieve millions, or even billions, of data points from the internet. Major use cases of web scraping are lead generation, market analysis, data gathering, sentiment analysis, and much more.

In this video, we will see how we can use different Python libraries such as Urllib, Requests, and BeautifulSoup for web scraping, and explore some of their most important functions.

1. Extracting data from the web -
The Urllib and Requests modules offer functions to fetch data from the web. The basic requirement is the URL of the page you want data from; using the methods these modules provide, you can extract the content served at that URL.
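For instance, a minimal urllib sketch might look like the following (the URL here is only a placeholder, not a site used in the video):

from urllib.request import urlopen

url = "https://example.com"              # placeholder URL to fetch
response = urlopen(url)                  # send the HTTP request
html = response.read().decode("utf-8")   # raw HTML as a string
print(html[:200])                        # peek at the first 200 characters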

2. Parsing the extracted data -
BeautifulSoup helps us parse the data extracted from the web. Most of that content comes as HTML, and this module gives us the functionality to read through the HTML and extract the required information from it. It also provides parsers for working with XML data.
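As a small sketch (the HTML string here is hard-coded purely for illustration):

from bs4 import BeautifulSoup

html = "<html><head><title>Demo</title></head><body><p>Hello, scraper!</p></body></html>"
soup = BeautifulSoup(html, "html.parser")   # build a parse tree from the HTML string
print(soup.title.text)                      # Demo
print(soup.p.text)                          # Hello, scraper!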

We have tried to cover the important topics and code implementations in the video, ranging from "how to extract data" to brief information on "how to get started with web automation".

1. Scraping data -
We have explained how one can use Urllib as well as Requests to read data from a website and use it for further processing.
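With Requests, a comparable sketch looks like this (again with a placeholder URL):

import requests

url = "https://example.com"
response = requests.get(url)     # fetch the page
response.raise_for_status()      # raise an error for 4xx/5xx responses
print(response.status_code)      # 200 on success
print(response.text[:200])       # raw HTML, ready for further processing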

2. Parsing data -
After extracting the data, one can use BeautifulSoup with an HTML parser to turn the raw HTML into a searchable parse tree.

3. Fetching data -
Using BeautifulSoup's functions, one can extract the page title, find anchor tags, fetch the value of a particular element, explore the DOM tree, and more.
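A short sketch of these lookups (the HTML snippet is made up for illustration):

from bs4 import BeautifulSoup

html = """
<html><head><title>Demo page</title></head>
<body>
  <a href="https://example.com/one">First link</a>
  <a href="https://example.com/two">Second link</a>
  <div id="price">42</div>
</body></html>
"""
soup = BeautifulSoup(html, "html.parser")

print(soup.title.text)                    # the page title
for a in soup.find_all("a"):              # every anchor tag
    print(a["href"], a.text)
print(soup.find("div", id="price").text)  # value of a particular element
print([child.name for child in soup.body.children if child.name])  # direct children in the DOM tree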

4. Storing scraped data -
We showcased how one can extract data from a text file hosted on the web, store it on the local system, and perform some analysis on it.
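A sketch of that workflow (the file URL is a placeholder, and the word count is just one simple example of analysis):

import requests
from collections import Counter

url = "https://example.com/sample.txt"                 # placeholder text file
text = requests.get(url).text

with open("sample.txt", "w", encoding="utf-8") as f:   # store it locally
    f.write(text)

words = text.lower().split()
print("Total words:", len(words))
print("Most common:", Counter(words).most_common(5))   # quick frequency analysis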

5. Web automation -
We tried to give you a basic understanding of web automation and how tools like Selenium can help with it. Selenium is a powerful tool for controlling web browsers through a program. It works with all major browsers and operating systems. It is mainly used for automating web testing, which can take hours when done manually and is far less efficient than when done programmatically. On the fun side, it can also help you automate day-to-day tasks like posting tweets, sending WhatsApp messages, and even running Google searches in a few lines of Python code. The possibilities of automation are endless.
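As a hedged sketch of what that looks like (it assumes Chrome plus a matching driver are installed, and that Google's search box is located by its name "q"):

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()               # launch a browser under program control
driver.get("https://www.google.com")      # open a page
box = driver.find_element(By.NAME, "q")   # locate the search box
box.send_keys("web scraping with python") # type a query
box.submit()                              # submit the search form
print(driver.title)                       # title of the results page
driver.quit()                             # close the browser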

~~~~~~~~~~

Connect with us on our social media channels to get daily updates on Data Science and Artificial Intelligence.

~~~~~~~~~~