Python Programming Tutorial - 27 - How to Build a Web Crawler (3/3)

Comments

My learning time pie chart through these 3 tutorials:
90%: finding a website to crawl
10%: code, debug, copy and paste, debug again

KhanhTran-upit

For those of you confused by this tutorial, I advise watching other web crawler videos too; that helps. Do stick to this course, but if you can't understand a topic from Bucky, learn the same topic from others, then come back here; you're more likely to understand. That works for me.

zeemanmemon

If this is giving you guys trouble, I suggest pausing the tutorials and practicing, and I don't mean practicing for 5-10 minutes and you're good... No, I mean practicing as long as you have to in order to get it down.
I watched all 3 tutorials for this web crawler a couple of days ago; however, for the last 2 days I've been practicing ONLY lessons 1-2.

I'm finally going to do this third part because I can now write the entire first part of the program off the top of my head and understand what it means.

Python is a language; nobody is going to judge you if you need a little more time to practice what you're trying to learn.

everettlogan

(English is not my native language.) I just loved these tutorials and the way Bucky explains everything. I know the series is quite old, and Bucky's page no longer exists, but that pushed me to practice on a different website, which gave me more challenges and forced me to do more research, for obvious reasons. Now I have my own web crawler adapted to another web page, with completely different settings, like pulling not the links but the text inside a child element that matches a specific condition, and so on. I'm so excited right now. Maybe it's not much for most of you, but for me, who is nowhere near being a programmer (I studied Economics), I feel like a hacker.

GeraSanz

Source code if you want to store the information in a text file:


import requests
from bs4 import BeautifulSoup


def trade_spider(max_pages):
    page = 1
    fw = open('Items Available.txt', 'w')
    while page <= max_pages:
        # The tutorial's trade-search URL; the original site is gone, so
        # substitute any listing page whose links carry the item-name class.
        url = 'https://buckysroom.org/trade/search.php?page=' + str(page)
        source_code = requests.get(url)
        plain_text = source_code.text
        soup = BeautifulSoup(plain_text, 'html.parser')
        fw.write("PAGE " + str(page) + "\n\n")
        for link in soup.findAll('a', {'class': 'item-name'}):
            title = link.string
            href = link.get('href')
            fw.write(title + '\n')
            fw.write(href + '\n\n')
        page += 1
    fw.close()


trade_spider(3)

hlsavior

Amazing tutorial! I got an assignment to build a web crawler and compute tf-idf on the results. This gave me a head start! Thank you so much!
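
For anyone landing here with a similar assignment, a minimal tf-idf sketch using scikit-learn's TfidfVectorizer; the page_texts list is a hypothetical stand-in for whatever text your crawler collected:

from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical stand-in for the text scraped from each crawled page.
page_texts = [
    "buy used bike cheap",
    "used car for sale cheap",
    "python web crawler tutorial",
]

# fit_transform builds the vocabulary and returns a sparse matrix:
# one row per document, one column per term, cells are tf-idf weights.
vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(page_texts)

terms = vectorizer.get_feature_names_out()
for doc_index, row in enumerate(tfidf.toarray()):
    for term, weight in zip(terms, row):
        if weight > 0:
            print(doc_index, term, round(weight, 3))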

yooos

It's 4:45 in the morning for you Bucky. Go to sleep >:O

camelCaseFTW

"It's kinda weird that the more work your program does, the easier it is to code." (Bucky Roberts)

abidriaz

By far, and I've searched the ocean bottoms and the mountain tops. Did I say by far? Ummmm, just wanna say thanks; actually, I wanna say a bit more. This 23-27-year-old age bracket has literally taught me how to build a website: Tyler, using WordPress step by step, and I'm learning more in three days with you than... ever. Finally, Chance the Rapper initially got my attention by giving all of his music away for free. WHAT? Look at you guys, not freakin' asking for money, just sayin' "check this out, let me show you." Paying it forward, Bucky. I print T-shirts, like the best on the planet, so my plan is to finish my website, but with this integration of Python functions... huh? I would love to speak with you some day. I would love to share something with you as an appreciation.

sydsgraphics

Wow. THANK YOU! THIS is GREAT! I've learned a lot this afternoon. In your case, you were able to uniquely identify the data you were crawling for via a class tag. In my case, I only want the <a> tags contained either in <ul><li> tags or within a <table><tr><td> set of tags. I've been trying to follow the Beautiful Soup docs for navigating with find_next_sibling, etc., but I'm in over my head a bit. Could you do a quick example of finding <a> tags within other tags to do the same kind of crawling?
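
In case it helps: Beautiful Soup's select() accepts CSS selectors, which handles exactly this kind of nesting; a small sketch with made-up HTML:

from bs4 import BeautifulSoup

# Made-up snippet: links inside list items, inside table cells, and one bare link.
html = """
<ul><li><a href="/one">One</a></li></ul>
<table><tr><td><a href="/two">Two</a></td></tr></table>
<p><a href="/three">Not selected</a></p>
"""

soup = BeautifulSoup(html, 'html.parser')
# The comma acts like "or": match <a> under <ul><li> or under <table><tr><td>.
for link in soup.select('ul li a, table tr td a'):
    print(link.get('href'), link.string)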

MikePorterII

Hey Bucky,

Man, your tutorials are awesome. Keep doing more!
The only problem is that you sometimes forget that some links are outdated.

Thanks!

kareemjeiroudi

Great tutorial! You're a very good speaker and educator. Everything you said was clear and precise, and with the example you made it easy for me to learn. Thank you very much. I subscribed and I'm going to continue watching the rest of your content. Good work!

spartanaerospace

Python is great for web scraping; I recommend doing a multithreaded BFS for serious data-science purposes.
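
A rough sketch of that idea with requests, Beautiful Soup, and the standard library's ThreadPoolExecutor; the seed URL is a placeholder, and a serious crawler would also add politeness delays, robots.txt handling, and retries:

import requests
from bs4 import BeautifulSoup
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urljoin

def fetch_links(url):
    # Fetch one page and return the absolute URLs it links to.
    try:
        response = requests.get(url, timeout=5)
        soup = BeautifulSoup(response.text, 'html.parser')
        return [urljoin(url, a['href']) for a in soup.find_all('a', href=True)]
    except requests.RequestException:
        return []

def bfs_crawl(seed, max_depth=2, workers=8):
    visited = {seed}
    frontier = [seed]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(max_depth):
            # Each BFS level is fetched in parallel across the thread pool.
            next_frontier = []
            for links in pool.map(fetch_links, frontier):
                for link in links:
                    if link not in visited:
                        visited.add(link)
                        next_frontier.append(link)
            frontier = next_frontier
    return visited

# Placeholder seed; swap in whatever site you are studying.
pages = bfs_crawl('https://example.com', max_depth=2)
print(len(pages), 'pages discovered')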

Thegamemakur

Hey, Bucky. Thanks for this great tutorial. But I was wondering: I have to create a generic crawler based on some keywords that crawls random websites (ones that use these keywords or have content based on them) and then performs some logic. I don't want to explicitly list the websites or page links that need to be crawled. How can that be done?
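
One common answer, for what it's worth: you still need seed URLs from somewhere (for example, a search-engine API queried with your keywords); from there, a focused crawler only follows links out of pages whose text actually contains the keyword. A sketch under those assumptions, with a made-up seed:

import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

def focused_crawl(seeds, keyword, max_pages=50):
    # Breadth-first crawl that only expands pages whose text mentions the keyword.
    queue = list(seeds)
    seen = set(seeds)
    matches = []
    fetched = 0
    while queue and fetched < max_pages:
        url = queue.pop(0)
        fetched += 1
        try:
            page = requests.get(url, timeout=5).text
        except requests.RequestException:
            continue
        soup = BeautifulSoup(page, 'html.parser')
        if keyword.lower() in soup.get_text().lower():
            matches.append(url)  # hook your own per-page logic in here
            for a in soup.find_all('a', href=True):
                link = urljoin(url, a['href'])
                if link not in seen:
                    seen.add(link)
                    queue.append(link)
    return matches

# Made-up seed; in practice this might come from a search API.
print(focused_crawl(['https://example.com'], 'python'))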

amanmaheshwari

No idea what happened in the last three videos, but it was cool! Time to get down to business.

TheAkbar

Question: how do I store the crawled data (CSV, maybe)? I need to create an inverted index to run queries on the saved data.
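
The standard library's csv module covers the storage half; a minimal sketch that writes title/href rows the way the tutorial's loop collects them (the URL is a placeholder):

import csv

import requests
from bs4 import BeautifulSoup

# Placeholder URL; point this at the page you are actually crawling.
url = 'https://example.com'
soup = BeautifulSoup(requests.get(url).text, 'html.parser')

with open('crawl_results.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['title', 'href'])  # header row, handy for the indexing step
    for link in soup.find_all('a', href=True):
        writer.writerow([link.get_text(strip=True), link['href']])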

pratiklibra

Great series. You solved many problems I had.

theody

Awesome video, thanks Bucky. It was greatly helpful.

roysamson

Hey Bucky - firstly, fantastic, clear, easy-to-follow tutorials. I've just stumbled across them, but thanks so much already!
However, something you mentioned at the end of this one has tripped me up, and I've not been able to dig up a solution. I know these videos are fairly old now, but how would you go about, in this example, adding the URLs to a set to quickly and simply strip out the duplicates?
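
In case anyone else hits the same question: a set drops duplicates automatically, so collecting the hrefs into one strips them for you. A minimal sketch grafted onto the tutorial's loop, with a placeholder URL pattern:

import requests
from bs4 import BeautifulSoup

unique_links = set()  # sets silently ignore members added more than once

for page in range(1, 4):
    # Placeholder pattern standing in for the tutorial's trade-search pages.
    url = 'https://example.com/search?page=' + str(page)
    soup = BeautifulSoup(requests.get(url).text, 'html.parser')
    for link in soup.find_all('a', href=True):
        unique_links.add(link['href'])  # duplicates across pages collapse here

for href in sorted(unique_links):
    print(href)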

jonpoulter

I don't see anything at your buckysroom trade website?

sien