Introduction to Web Scraping (Python) - Lesson 03 (Scrape Multiple Web Pages)

In this video, I show you how to scrape multiple pages whose URLs follow a predictable pattern.

Kieng Iv/SAF Business Analytics
Comments

What a coincidence! Five minutes ago I was watching Kieng's Pokemon Go video to relax, and when I went back to studying it was him again!

theolderwaitor

Your videos are changing my life. I used to rely on simple search-and-replace to convert spidered data to CSV; you just taught me to do it with one script in minutes. Thank you. Python is new to me and I keep running into new errors, but I'm getting better. PyCharm is a godsend for me as a beginner, though I'm still getting used to the interface.

Can you tell me what you would do in my case? I need to update our database with valid email addresses. The contacts we have handle trade shows for companies, and these people change jobs so often, largely due to burnout, that we lose a significant number of good emails every year, roughly 13K records.

We send these companies RFPs all the time but get no responses because of the turnover. We need other contacts at these companies; any contact would do to get our sales leads to the proper person.

I stripped out the domains and was wondering if I could design a script that goes to Google and grabs the email addresses @domain from the top 20 results for each one. Thank you!!

hierge
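
A note on the question above: querying Google programmatically is usually blocked and against its terms of service, so a more reliable sketch is to fetch a page you can legitimately reach for each domain and pull addresses out with a regular expression. Everything below (the "/contact" path, the domain list) is a placeholder, not code from the video.

import re
import urllib.request

# Hypothetical list of company domains; load your ~13K domains here instead.
domains = ["example.com", "example.org"]
email_pattern = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

for domain in domains:
    url = "http://" + domain + "/contact"  # placeholder page to check per domain
    try:
        req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
        html = urllib.request.urlopen(req).read().decode("utf-8", errors="ignore")
    except Exception:
        continue  # skip domains that refuse the request or fail
    # keep only addresses that actually belong to this domain
    for email in sorted(set(email_pattern.findall(html))):
        if email.lower().endswith("@" + domain):
            print(email)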

Hi! Excellent examples.
I would like to know if it is possible to scrape a list of URLs using Beautiful Soup.

MrHsuarezb
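
A minimal sketch of the idea in the question above, assuming the URLs are known in advance (the URLs below are placeholders): put them in a list and make a soup for each one.

import urllib.request
from bs4 import BeautifulSoup

urls = [
    "http://example.com/page1",
    "http://example.com/page2",
]

for url in urls:
    html = urllib.request.urlopen(url).read()
    soup = BeautifulSoup(html, "html.parser")
    print(url, soup.title.string if soup.title else "no title")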

OMG! I have to scrape the same page for my project!!! :-D Thank you very much for your explanation!

wordsgo

This was super helpful! I’m wondering if HasData can help scrape sites that require login information?

wulkynebabe
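
For the login question above, one common approach (outside the video's urllib-only setup, and not specific to HasData) is to keep cookies in a requests.Session and post the login form once before scraping. The URLs and form field names below are assumptions; copy the real ones from the site's login form, and note that JavaScript-based logins need a browser tool such as Selenium instead.

import requests
from bs4 import BeautifulSoup

LOGIN_URL = "https://example.com/login"        # placeholder
PROTECTED_URL = "https://example.com/members"  # placeholder

with requests.Session() as session:
    # field names are assumptions -- inspect the site's actual login form
    session.post(LOGIN_URL, data={"username": "me", "password": "secret"})
    html = session.get(PROTECTED_URL).text
    soup = BeautifulSoup(html, "html.parser")
    print(soup.title)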

How do I manipulate the URL with an array that I have built, so that each element of the array becomes the input to the URL, the same way you manipulated the URL to cycle through the alphabet?

taylorrhodes
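
A minimal sketch for the question above, assuming the array is an ordinary Python list and the base URL (a placeholder here) takes each element at the end:

import urllib.request
from bs4 import BeautifulSoup

my_items = ["toronto", "vancouver", "montreal"]  # the array you built

for item in my_items:
    url = "http://example.com/search?q=" + item  # placeholder URL pattern
    soup = BeautifulSoup(urllib.request.urlopen(url).read(), "html.parser")
    print(item, len(soup.find_all("a")))  # do the real scraping here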

What web browser are you using? Any video on efficiently using a browser to find the relevant HTML tags?

piyush-kumar-tank

Sorry, I don't understand how you get page after page. Also, do you have the source code? Maybe then I could try it myself and understand it properly. Thanks for the vid, Lisa.

pjmclenon

Thank you for this excellent tutorial video! One question though: how do you determine when you've reached the end of the NEXT pagination?

antoniodesousa
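
One common pattern for the question above (an assumption, not the video's own code): keep following the page's "next" link and stop when the page no longer has one. The link text and starting URL are placeholders to adapt to the site's markup.

import urllib.request
from urllib.parse import urljoin
from bs4 import BeautifulSoup

url = "http://example.com/results"  # placeholder starting page

while url:
    soup = BeautifulSoup(urllib.request.urlopen(url).read(), "html.parser")
    # ... scrape the current page here ...
    next_link = soup.find("a", string="Next")  # adjust to the site's own label or markup
    url = urljoin(url, next_link["href"]) if next_link else None  # None ends the loop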

How do I use different proxies to extract multiple web pages from a site that denies access when scraped the way explained above? Using urllib to change proxies didn't work out for me. Could you suggest a good way to scrape those pages?

MrSedath
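
A sketch of routing urllib through proxies for the question above; the proxy addresses are placeholders, and a site that actively blocks scrapers may refuse proxied requests as well.

import urllib.request

proxies = ["http://111.111.111.111:8080", "http://222.222.222.222:3128"]  # placeholders

for proxy in proxies:
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    )
    opener.addheaders = [("User-Agent", "Mozilla/5.0")]  # many blocks are really header checks
    try:
        html = opener.open("http://example.com/page", timeout=10).read()
        break  # this proxy worked; stop trying the others
    except Exception:
        continue  # proxy failed or was blocked; try the next one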

Thank you very much for your very succinct and straightforward explanations! For a novice Python user like me, all four videos are extremely helpful. I now have two questions after this last video:

1. How can I write a 'def' that automatically scrapes the reviews across all of the pages for a certain hotel? I understand making a 'soup' for one URL at a time is possible, but I am sure there is a way to automate this process. Could you explain further?

2. On TripAdvisor, each review has a 'more' and 'less' button. How can I handle this tagging issue so I can extract the full review? The current tag for a review is 'p'.

I am not sure if I have fully explained the difficulty I am facing.
Thank you very much in advance for your time and help!

KyungBaePark
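
For question 1 above, a sketch of wrapping the soup-making in a function and stepping through numbered review pages; the URL template and the page step are assumptions, so check how the review pages you see are actually numbered before using it. For question 2, the text hidden behind the "more" button is typically expanded by JavaScript, so it never appears in the HTML that urllib downloads; a browser-driving tool such as Selenium (see the sketch a few comments below) is the usual workaround.

import urllib.request
from bs4 import BeautifulSoup

def make_soup(url):
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    return BeautifulSoup(urllib.request.urlopen(req).read(), "html.parser")

def scrape_hotel_reviews(base_url, pages, step=10):
    """Collect review paragraphs across several numbered pages of one hotel."""
    reviews = []
    for offset in range(0, pages * step, step):
        # base_url is a template with an {offset} slot, e.g. "...-or{offset}-..."
        soup = make_soup(base_url.format(offset=offset))
        reviews.extend(p.get_text(strip=True) for p in soup.find_all("p"))
    return reviews

# e.g. scrape_hotel_reviews("http://example.com/hotel-reviews-or{offset}", pages=5)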

Thank you very much for the series; it is very informative.
I know these videos were made a while ago, but could you consider a scraping tutorial that covers JavaScript? It seems like TripAdvisor now uses JavaScript, and I thought that might be a good complement to the rest of your series.
Regardless, I learnt a lot from them. Thanks again.

JaquesStrydom
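
A brief sketch of the usual approach to JavaScript-heavy pages, which is a separate tool rather than anything from this series: let a real browser render the page with Selenium, then hand the rendered HTML to BeautifulSoup. The URL is a placeholder, and you need Chrome plus a matching chromedriver installed.

from selenium import webdriver
from bs4 import BeautifulSoup

driver = webdriver.Chrome()           # opens a real Chrome window
driver.get("https://example.com")     # placeholder URL; JavaScript runs here
soup = BeautifulSoup(driver.page_source, "html.parser")
driver.quit()

print(soup.title)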

What if I want to replace that "letter" with numbers?

yogajangkungs

How would you write the 'for' loop if you are changing numbers in the URL instead? Do you still do 'for ... in ascii_lowercase:', or would you do something like 'for ... in range(...)'?

adurk
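
A minimal sketch for the last two questions: swap the alphabet loop for a numeric one, so range(1, 11) gives pages 1 through 10. The URL pattern is a placeholder.

import urllib.request
from bs4 import BeautifulSoup

for page_number in range(1, 11):          # pages 1..10
    url = "http://example.com/page/" + str(page_number)  # placeholder pattern
    soup = BeautifulSoup(urllib.request.urlopen(url).read(), "html.parser")
    print(page_number, soup.title)        # do the real scraping here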

Hey man, thanks for this awesome video.
I was trying to apply this to the website "Bulbapedia.com". The problem is that I can't get the 'href' for the next pages (maybe because they identify the link by an id rather than a plain 'href'). Can you please tell me how I can get through this?

pawanblaze
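
A general sketch for the question above; I have not checked Bulbapedia's markup, so the id and selectors here are examples to adapt after inspecting the page. The idea is to locate the link tag by its attributes and then read 'href' off the tag.

import urllib.request
from bs4 import BeautifulSoup

soup = BeautifulSoup(urllib.request.urlopen("http://example.com/list").read(), "html.parser")

# Option 1: every anchor that actually carries an href attribute
links = [a["href"] for a in soup.find_all("a", href=True)]

# Option 2: an anchor identified by id (or class) instead of by its text
next_tag = soup.find("a", attrs={"id": "next-page"})  # hypothetical id -- use the real one
if next_tag is not None:
    print(next_tag["href"])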

How can I run this automatically across multiple open tabs without having to enter each URL?

temple_

Your videos are awesome! Do you plan on adding more? I am curious how to feed Python from a database to complete the URL. At the moment I am using an array, but I need to feed Python about 5K records.
Thanks again!

ppluck
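
A sketch of pulling the URL pieces from a database instead of a hand-written array. sqlite3 is used here only as an example; the file, table, and column names are placeholders for whatever your database actually holds.

import sqlite3
import urllib.request
from bs4 import BeautifulSoup

conn = sqlite3.connect("companies.db")                     # placeholder database file
rows = conn.execute("SELECT slug FROM pages").fetchall()   # e.g. your ~5K records
conn.close()

for (slug,) in rows:
    url = "http://example.com/" + slug                     # placeholder URL pattern
    soup = BeautifulSoup(urllib.request.urlopen(url).read(), "html.parser")
    # ... pull out what you need and write it to CSV ...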

Hey Kieng, your web scraping tutorials have really been coming in handy for me, and I really appreciate the effort you put into your content. I have a bit of a problem with this script, however. When attempting to write the CSV file, my console returned the error below. I copied your code letter for letter and am not getting the same results.

Traceback (most recent call last):
File "/Users/Jocisabreu/PycharmProjects/Scrape/Tut_3.py", line 15, in <module>
File "/Users/Jocisabreu/PycharmProjects/Scrape/Tut_3.py", line 9, in make_soup
page = urllib.request.urlopen(req)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 223, in urlopen
return opener.open(url, data, timeout)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 532, in open
response = meth(req, response)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 642, in http_response
'http', request, response, code, msg, hdrs)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 564, in error
result = self._call_chain(*args)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 504, in _call_chain
result = func(*args)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 756, in http_error_302
return self.parent.open(new, timeout=req.timeout)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 532, in open
response = meth(req, response)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 642, in http_response
'http', request, response, code, msg, hdrs)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 570, in error
return self._call_chain(*args)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 504, in _call_chain
result = func(*args)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 650, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found

joejimmm
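
The traceback above ends in an HTTP 404 after a redirect, which usually means one of the generated URLs simply no longer exists, or the site turns away requests without a browser-like User-Agent; that is a guess from the error alone, not a diagnosis of the actual script. A common way to keep the loop alive is to catch urllib.error.HTTPError and skip the offending URL; the URL pattern below is a placeholder for the one used in the video.

import urllib.request
import urllib.error
from string import ascii_lowercase
from bs4 import BeautifulSoup

for letter in ascii_lowercase:
    url = "http://example.com/letter/" + letter   # placeholder for the video's URL pattern
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    try:
        soup = BeautifulSoup(urllib.request.urlopen(req).read(), "html.parser")
    except urllib.error.HTTPError as err:
        print("skipping", url, "->", err.code)    # e.g. 404: Not Found
        continue
    # ... scrape and write to CSV as in the video ...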

What is this bookmark? 😂😂😂
IT Adults 😄😄😄

mujtabashaikh