Web Scraping Wikipedia tables using Python

In this tutorial, we will learn how to write a Python program that scrapes tables from a Wikipedia page. This method works for most of the tables you see on Wikipedia.

Before the tutorial, make sure you have the BeautifulSoup and requests libraries installed.
To install BeautifulSoup: pip install beautifulsoup4
To install requests: pip install requests
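
As a rough sketch of what the tutorial builds toward, the snippet below downloads a page with requests and pulls the rows out of one table with BeautifulSoup. It assumes the table has an HTML id. The helper names (`fetch_html`, `table_rows`), the sample markup, and the id `population` are made up for illustration and do not come from the video.

```python
import requests
from bs4 import BeautifulSoup


def fetch_html(url):
    """Download a page and return its HTML text (requires network access)."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # stop early on 4xx/5xx responses
    return response.text


def table_rows(html, table_id):
    """Return the cell text of each row of the table with the given id."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", {"id": table_id})
    if table is None:
        raise ValueError(f"no table with id {table_id!r}")
    rows = []
    for tr in table.find_all("tr"):
        cells = tr.find_all(["th", "td"])
        rows.append([cell.get_text(strip=True) for cell in cells])
    return rows


# Hypothetical markup standing in for the result of fetch_html(some_wikipedia_url).
SAMPLE_HTML = """
<table id="population" class="wikitable sortable">
  <tr><th>City</th><th>Population</th></tr>
  <tr><td>Springfield</td><td>1000</td></tr>
  <tr><td>Shelbyville</td><td>2000</td></tr>
</table>
"""

rows = table_rows(SAMPLE_HTML, "population")
print(rows)
```

To try it on a real page, pass the text returned by `fetch_html` in place of `SAMPLE_HTML`, along with the id of the table you found in the browser's inspector.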

Comments
Author

Forgot to mention that the output of the read_html method is a list of DataFrames. To get a DataFrame object, simply extract the first element from the output. For example: df = df[0].

jiejenn
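
A small sketch of the point made in the comment above: `pandas.read_html` returns a list even when the HTML holds a single table, so the DataFrame has to be picked out by index. The sample markup is invented for illustration, and the snippet assumes an HTML parser such as lxml is installed for pandas to use.

```python
from io import StringIO

import pandas as pd

# Hypothetical markup standing in for the table extracted from the page.
SAMPLE_HTML = """
<table id="population">
  <tr><th>City</th><th>Population</th></tr>
  <tr><td>Springfield</td><td>1000</td></tr>
  <tr><td>Shelbyville</td><td>2000</td></tr>
</table>
"""

# read_html returns a list with one DataFrame per table it finds.
tables = pd.read_html(StringIO(SAMPLE_HTML))
print(type(tables).__name__, len(tables))

# Take the first (here, the only) element to get the DataFrame itself.
df = tables[0]
print(df)
```

Using a separate name such as `tables` for the list, rather than reassigning `df = df[0]`, makes it harder to confuse the list with the DataFrame later on.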
Author

For those having trouble finding the table_id:

You can use the table's class name instead of the table_id (e.g., <table class="wikitable sortable">).

In that case, I changed these two lines of code:

table_name = 'wikitable sortable'
soup_table = soup.find('table', {'class': table_name})

Hope this helps.

sammcintyre
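
One caveat with the class-based lookup in the comment above, shown as a sketch on invented markup: when the class value contains a space, BeautifulSoup compares it against the whole class attribute, so a table whose classes appear in a different order or with extras is not matched. A CSS selector via `select` is one possible alternative that matches on each class separately.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two tables that both carry the classes
# "wikitable" and "sortable", written differently.
SAMPLE_HTML = """
<table class="wikitable sortable"><tr><td>first</td></tr></table>
<table class="sortable wikitable plainrowheaders"><tr><td>second</td></tr></table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# A multi-word string is compared against the whole class attribute,
# so only the table whose attribute is exactly "wikitable sortable" matches.
exact = soup.find_all("table", {"class": "wikitable sortable"})

# A CSS selector matches any table carrying both classes, in any order.
flexible = soup.select("table.wikitable.sortable")

print(len(exact), len(flexible))
```

If a page has several matching tables, indexing into the result of `select` (for example `flexible[1]`) picks a specific one.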
Author

Why would you want to scrape a table instead of text? What would a table be used for?

suomynona
Author

I can't find a table ID on the wiki page.

princek
Author

Thanks for the video. I see you also forgot to mention that creating df makes use of lxml; thankfully I can read the errors, so I installed it.

christopherwells
Author

Thanks, I am so nearly there! One question. I get to 5 mins 48 secs with the same results as Jie. But when I try to print(df), the terminal says:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'df' is not defined

From my understanding I have defined df in line 12, so I can't work out why it's not working. I am a newbie, so answers for dummies appreciated.

michaeltillcock
Author

Hello, I am using Chrome but I can't see the table ID, only the class. Do I need to do something else to get the table ID?

callvengeance
Author

Very good! It worked perfectly! Thank you!

otaviodzb
Author

Thanks for the vid, man! Do you happen to live in Alabama btw?

tonypendletoniii
Author

I cannot use pandas. Why is this happening?

farhangony
Author

Can you recommend any good extensions for Python in VS Code?

akshatjain
Author

What Python client are you using?
It looks a lot simpler than PyCharm.

thomascooney
Author

How to turn the output of this into a DataFrame?

mohamedhachaichi