Python Programming Tutorial - 27 - How to Build a Web Crawler (3/3)

Comments

My learning time pie chart through these 3 tutorials:
90%: finding a website to crawl
10%: code, debug, copy and paste, debug again

KhanhTran-upit

For those of you confused by this tutorial, I advise watching other web crawler videos too; that helps. Do stick to this course, but if you can't understand a topic from Bucky, learn the same topic from others, then come back here; you're more likely to understand. That works for me.

zeemanmemon

If this is giving you guys trouble, I suggest pausing the tutorials and practicing, and I don't mean practicing for 5-10 minutes and you're good... No, I mean practicing as long as you have to in order to get it down.
I watched all 3 tutorials for this web crawler a couple of days ago; however, for the last 2 days I've been practicing ONLY lessons 1-2.

I'm finally going to do this third part because I can now write the entire first part of the program off the top of my head and understand what it means.

Python is a language; nobody is going to judge you if you need a little more time to practice what you're trying to learn.

everettlogan

(English is not my native language.) I just loved these tutorials and the way Bucky explains everything. I know the series is quite old, and Bucky's page no longer exists, but that pushed me to practice on a different website, which gave me more challenges and forced me to do more research, for obvious reasons. Now I have my own web crawler adapted to another web page, with completely different settings, like pulling not the links but the text inside a child element that matches a specific condition, and so on. I'm so excited right now. Maybe it's not much for most of you, but for me, who is nowhere near being a programmer (I studied Economics), I feel like a hacker.

GeraSanz

Source code if you want to store the information in a text file:


import requests
from bs4 import BeautifulSoup


def trade_spider(max_pages):
    page = 1
    fw = open('Items Available.txt', 'w')
    while page <= max_pages:
        # The tutorial's trade-search URL; the original site is gone, so
        # substitute any listing page whose links carry the item-name class.
        url = 'https://buckysroom.org/trade/search.php?page=' + str(page)
        source_code = requests.get(url)
        plain_text = source_code.text
        soup = BeautifulSoup(plain_text, 'html.parser')
        fw.write("PAGE " + str(page) + "\n\n")
        for link in soup.findAll('a', {'class': 'item-name'}):
            title = link.string
            href = link.get('href')
            fw.write(title + '\n')
            fw.write(href + '\n\n')
        page += 1
    fw.close()


trade_spider(3)

hlsavior

Amazing tutorial! I got an assignment to build a web crawler and compute tf-idf on the results. This gave me a head start! Thank you so much!
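
For anyone landing here with a similar assignment, a minimal tf-idf sketch using scikit-learn's TfidfVectorizer; the page_texts list is a hypothetical stand-in for whatever text your crawler collected:

from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical stand-in for the text scraped from each crawled page.
page_texts = [
    "buy used bike cheap",
    "used car for sale cheap",
    "python web crawler tutorial",
]

# fit_transform builds the vocabulary and returns a sparse matrix:
# one row per document, one column per term, cells are tf-idf weights.
vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(page_texts)

terms = vectorizer.get_feature_names_out()
for doc_index, row in enumerate(tfidf.toarray()):
    for term, weight in zip(terms, row):
        if weight > 0:
            print(doc_index, term, round(weight, 3))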

yooos

It's 4:45 in the morning for you Bucky. Go to sleep >:O

camelCaseFTW

"It's kinda weird that the more work your program does, the easier it is to code." (Bucky Roberts)

abidriaz

By far, and I've searched the ocean bottoms and the mountain tops. Did I say by far? Ummmm, just wanna say thanks; actually, I wanna say a bit more. This 23-27-year-old age bracket has literally taught me how to build a website: Tyler, using WordPress step by step, and I'm learning more in three days with you than... ever. Finally, Chance the Rapper initially got my attention by giving all of his music away for free. WHAT? Look at you guys, not freakin' asking for money, just sayin' "check this out, let me show you." Paying it forward, Bucky. I print T-shirts, like the best on the planet, so my plan is to finish my website, but with this integration of Python functions... huh? I would love to speak with you some day. I would love to share something with you as an appreciation.

sydsgraphics

Wow. THANK YOU! THIS is GREAT! I've learned a lot this afternoon. In your case, you were able to uniquely identify the data you were crawling for via a class tag. In my case, I only want the <a> tags contained either in <ul><li> tags or within a <table><tr><td> set of tags. I've been trying to follow the Beautiful Soup docs for navigating with find_next_sibling, etc., but I'm in over my head a bit. Could you do a quick example of finding <a> tags within other tags to do the same kind of crawling?
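
In case it helps: Beautiful Soup's select() accepts CSS selectors, which handles exactly this kind of nesting; a small sketch with made-up HTML:

from bs4 import BeautifulSoup

# Made-up snippet: links inside list items, inside table cells, and one bare link.
html = """
<ul><li><a href="/one">One</a></li></ul>
<table><tr><td><a href="/two">Two</a></td></tr></table>
<p><a href="/three">Not selected</a></p>
"""

soup = BeautifulSoup(html, 'html.parser')
# The comma acts like "or": match <a> under <ul><li> or under <table><tr><td>.
for link in soup.select('ul li a, table tr td a'):
    print(link.get('href'), link.string)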

MikePorterII

Hey Bucky,

Man, your tutorials are awesome. Keep doing more!
The only problem is that you sometimes forget that some links are outdated.

Thanks!

kareemjeiroudi

Great tutorial! You're a very good speaker and educator. Everything you said was clear and precise, and with the example you made it easy for me to learn. Thank you very much. I subscribed and I'm going to continue watching the rest of your content. Good work!

spartanaerospace

Python is great for web scraping; I recommend doing a multithreaded BFS for serious data-science purposes.
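
A rough sketch of that idea with requests, Beautiful Soup, and the standard library's ThreadPoolExecutor; the seed URL is a placeholder, and a serious crawler would also add politeness delays, robots.txt handling, and retries:

import requests
from bs4 import BeautifulSoup
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urljoin

def fetch_links(url):
    # Fetch one page and return the absolute URLs it links to.
    try:
        response = requests.get(url, timeout=5)
        soup = BeautifulSoup(response.text, 'html.parser')
        return [urljoin(url, a['href']) for a in soup.find_all('a', href=True)]
    except requests.RequestException:
        return []

def bfs_crawl(seed, max_depth=2, workers=8):
    visited = {seed}
    frontier = [seed]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(max_depth):
            # Each BFS level is fetched in parallel across the thread pool.
            next_frontier = []
            for links in pool.map(fetch_links, frontier):
                for link in links:
                    if link not in visited:
                        visited.add(link)
                        next_frontier.append(link)
            frontier = next_frontier
    return visited

# Placeholder seed; swap in whatever site you are studying.
pages = bfs_crawl('https://example.com', max_depth=2)
print(len(pages), 'pages discovered')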

Thegamemakur

Hey, Bucky. Thanks for this great tutorial. But I was wondering: I have to create a generic crawler based on some keywords that crawls random websites (ones that use these keywords or have content based on them) and then performs some logic. I don't want to explicitly list the websites or page links that need to be crawled. How can that be done?
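
One common answer, for what it's worth: you still need seed URLs from somewhere (for example, a search-engine API queried with your keywords); from there, a focused crawler only follows links out of pages whose text actually contains the keyword. A sketch under those assumptions, with a made-up seed:

import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

def focused_crawl(seeds, keyword, max_pages=50):
    # Breadth-first crawl that only expands pages whose text mentions the keyword.
    queue = list(seeds)
    seen = set(seeds)
    matches = []
    fetched = 0
    while queue and fetched < max_pages:
        url = queue.pop(0)
        fetched += 1
        try:
            page = requests.get(url, timeout=5).text
        except requests.RequestException:
            continue
        soup = BeautifulSoup(page, 'html.parser')
        if keyword.lower() in soup.get_text().lower():
            matches.append(url)  # hook your own per-page logic in here
            for a in soup.find_all('a', href=True):
                link = urljoin(url, a['href'])
                if link not in seen:
                    seen.add(link)
                    queue.append(link)
    return matches

# Made-up seed; in practice this might come from a search API.
print(focused_crawl(['https://example.com'], 'python'))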

amanmaheshwari

No idea what happened in the last three videos, but it was cool! Time to get down to business.

TheAkbar

Question: how do I store the crawled data (CSV, maybe)? I need to create an inverted index to run queries on the saved data.
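
The standard library's csv module covers the storage half; a minimal sketch that writes title/href rows the way the tutorial's loop collects them (the URL is a placeholder):

import csv

import requests
from bs4 import BeautifulSoup

# Placeholder URL; point this at the page you are actually crawling.
url = 'https://example.com'
soup = BeautifulSoup(requests.get(url).text, 'html.parser')

with open('crawl_results.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['title', 'href'])  # header row, handy for the indexing step
    for link in soup.find_all('a', href=True):
        writer.writerow([link.get_text(strip=True), link['href']])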

pratiklibra

Great series. You solved many problems I had.

theody

Awesome video, thanks Bucky. It was greatly helpful.

roysamson

Hey Bucky - firstly, fantastic, clear, easy-to-follow tutorials. I've just stumbled across them, but thanks so much already!
However, something you mentioned at the end of this one has tripped me up, and I've not been able to dig up a solution. I know these videos are fairly old now, but how would you go about, in this example, adding the URLs to a set to quickly and simply strip out the duplicates?
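
In case anyone else hits the same question: a set drops duplicates automatically, so collecting the hrefs into one strips them for you. A minimal sketch grafted onto the tutorial's loop, with a placeholder URL pattern:

import requests
from bs4 import BeautifulSoup

unique_links = set()  # sets silently ignore members added more than once

for page in range(1, 4):
    # Placeholder pattern standing in for the tutorial's trade-search pages.
    url = 'https://example.com/search?page=' + str(page)
    soup = BeautifulSoup(requests.get(url).text, 'html.parser')
    for link in soup.find_all('a', href=True):
        unique_links.add(link['href'])  # duplicates across pages collapse here

for href in sorted(unique_links):
    print(href)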

jonpoulter

I don't see anything at your buckysroom trade website?

sien