Python Programming Tutorial - 26 - How to Build a Web Crawler (2/3)


Comments

4 a.m., thenewboston, you are awesome... thanks for the tutorials!

Elduque

Wow, I'm impressed. I fell asleep while watching the 20th video (it wasn't boring, I just haven't slept much lately), and now you're already teaching some (almost) exciting stuff :D

motylanoga

For the first time, I felt programming is fun. Thank you for making it possible to get interested in Python and all that it does.

missghani

I tried this using craigslist and got it working! Love the tutorials Bucky, keep them coming!

jasongodson

omfg, I can't believe I actually made one by myself, and it also grabs the pictures and subtitles. Thanks, man!

genosingh

"The good meat of the website" - Bucky Roberts 2014

zachariahwalston-leo

@thenewboston, I've tried to follow this tutorial, but it's hard to make it work because most websites these days run scripts in the browser. I don't know if it was the same back when you made this video. I had to use Selenium to get the actual HTML you see in "Inspect Element". Selenium drives a real browser and works with Chrome, Firefox and others. It works a bit differently from "requests", but I think it's more powerful. Could you do a Python-Selenium tutorial?

ernestselman

Since his site is down, Reddit works pretty well. I think they have a timeout if you run your crawler too often, but if you wait a bit it should work again. (It doesn't look like you can easily change the pages, though, so you'll have to omit that part of the code unless you're smarter than me, haha.)

AnderMalkus
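The comment above only guesses why Reddit times out; if it is rate limiting, spacing requests out usually helps. Below is a minimal sketch, assuming a fixed pause between requests is enough. The name `fetch_politely` and its 2-second default are illustrative choices, not Reddit's documented limits, and `fetch` is a hypothetical hook you would fill with something like `lambda u: requests.get(u).text`.

```python
import time
from typing import Callable, Iterable, List


def fetch_politely(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    delay_s: float = 2.0,  # assumed pause; tune it for the site you crawl
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Fetch each URL in order, waiting delay_s seconds between requests.

    `fetch` is passed in (hypothetical hook) so the loop can run offline;
    `sleep` is injectable for the same reason.
    """
    pages = []
    for i, url in enumerate(urls):
        if i > 0:
            sleep(delay_s)  # pause before every request except the first
        pages.append(fetch(url))
    return pages
```

Injecting `fetch` and `sleep` keeps the throttling logic separate from the HTTP library, so the same loop works with `requests` or anything else.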

I was getting the html parser error, so I added "html.parser" after plain_text, but I am still getting the following error:

Traceback (most recent call last):
  File "file path", line 22, in <module>
    trade_spider(1)
  File "file path", line 13, in trade_spider
    soup = BeautifulSoup(plain_text, "html parser")

bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: html parser. Do you need to install a parser library?

Phoebusjosh
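The traceback shows the call was made with `"html parser"` (a space), not `"html.parser"` (a dot). BeautifulSoup looks the string up as a parser name, and an unknown name raises `FeatureNotFound`. A small sketch showing the difference on an inline page (the `parse` helper and the sample HTML are illustrative):

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Small inline page so the example runs without a network request
HTML = '<p><a class="title" href="/item/1">Item 1</a></p>'


def parse(markup, parser):
    """Return a BeautifulSoup tree, or None if no tree builder matches `parser`."""
    try:
        return BeautifulSoup(markup, parser)
    except FeatureNotFound:
        return None


bad = parse(HTML, "html parser")   # space instead of a dot: not a registered parser name
good = parse(HTML, "html.parser")  # Python's built-in parser, no extra install needed
```

Here `bad` is `None` and `good.a["href"]` is `"/item/1"`.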

This tutorial was great and easy to follow. It worked perfectly. Thanks Bucky!!

jeremykerrigan

It's great... I can finally crawl any website and get the data I need from it! Thanks, Bucky!!

facitoo

Complete solution with url:
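import requests
from bs4 import BeautifulSoup

# Set this to the page you want to crawl (the original URL is not shown here).
url = "..."


def trade_spider(max_page):
    page = 1
    while page <= max_page:
        source_code = requests.get(url)
        plain_text = source_code.text
        soup = BeautifulSoup(plain_text, "html.parser")

        # Print the href of every <a> whose class attribute is "title text-semibold"
        for link in soup.findAll('a', {'class': 'title text-semibold'}):
            href = link.get('href')
            print(href)
        page += 1


trade_spider(1)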

Note: I used Bucky's GitHub page from the description, and it worked.

ankitaroy
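In the loop above, `url` never changes with `page`, so every iteration downloads the same page. A sketch for building one URL per results page, assuming a hypothetical site that selects pages with a `page` query parameter (check the real site's links; many use a different scheme). `page_urls` and `https://example.com/search` are placeholders:

```python
from typing import List
from urllib.parse import urlencode


def page_urls(base_url: str, max_page: int) -> List[str]:
    """Build one URL per results page, pages 1..max_page inclusive.

    Assumption: the site picks the page via a `page` query parameter.
    """
    return [f"{base_url}?{urlencode({'page': page})}" for page in range(1, max_page + 1)]
```

Each URL can then be passed to `requests.get` inside the loop in place of the fixed `url`.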

I like how you say string almost like "shtring" :D

Zwerggoldhamster

Now that's getting exciting!!!! All the other stuff was just normal, not fun, boring, but you need to know the basics to jump to bigger things. My head is kinda fked up after taking all that in, so I need to watch the video a couple more times. Good job, buddy, keep it up, you are doing great!

SixtyNeptune

I am trying to get it to work with Craigslist, and it just starts printing the word None over and over until I hit stop; then about 5 errors pop up.

rydermcbride
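The comment doesn't show enough to pin down the cause, but one common reason for a stream of `None` is that `link.string` is `None` whenever a tag has more than one child, and `link.get('href')` is `None` when the attribute is missing. A sketch on hypothetical inline markup, contrasting the naive lookups with a safer version that skips anchors without an `href` and uses `get_text()` for nested text:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one plain link, one with nested tags, one without an href
HTML = """
<ul>
  <li><a class="result" href="/a">Plain title</a></li>
  <li><a class="result" href="/b"><span>Nested</span> title</a></li>
  <li><a class="result">No link here</a></li>
</ul>
"""


def naive(html):
    soup = BeautifulSoup(html, "html.parser")
    # .string is None for multi-child tags; .get() is None for missing attributes
    return [(link.string, link.get("href")) for link in soup.find_all("a", class_="result")]


def safer(html):
    soup = BeautifulSoup(html, "html.parser")
    results = []
    # href=True keeps only anchors that actually have an href attribute
    for link in soup.find_all("a", class_="result", href=True):
        title = link.get_text(" ", strip=True)  # joins text from nested tags with spaces
        results.append((title, link["href"]))
    return results
```

`naive(HTML)` yields `None` in two places, while `safer(HTML)` returns only the two real links with their full titles.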

Bucky, you're the man!
Thanks for the awesome tutorial!

malabikasen

In case you get stuck, try the following modifications (they work for me):

import os
import requests
from bs4 import BeautifulSoup

# Set this to the blog you want to crawl (the original URL is not shown here).
url = "..."

# uncomment the line below and set the user_id and pass if working on a college proxy


def trade_spider(max_pages):
    page = 2
    while page <= max_pages:
        source_code = requests.get(url)
        plain_text = source_code.text
        soup = BeautifulSoup(plain_text, 'html.parser')

        # Each post title is an <h2 class="entry-title"> wrapping a link
        for link in soup.findAll('h2', {'class': 'entry-title'}):
            href = link.a['href']
            print(href)

        page += 1


trade_spider(2)

SunilKumar-iffd
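In the code above, `link.a['href']` raises a `TypeError` if any `<h2 class="entry-title">` contains no `<a>`, because `link.a` is then `None`. A sketch of a more defensive version on hypothetical inline markup; `entry_links` is an illustrative name:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the middle heading has no link
HTML = """
<h2 class="entry-title"><a href="/post-1">Post 1</a></h2>
<h2 class="entry-title">Draft without a link</h2>
<h2 class="entry-title"><a href="/post-2">Post 2</a></h2>
"""


def entry_links(html):
    """Return the href of the first linked <a> inside each h2.entry-title."""
    soup = BeautifulSoup(html, "html.parser")
    hrefs = []
    for heading in soup.find_all("h2", class_="entry-title"):
        anchor = heading.find("a", href=True)  # None if this heading has no link
        if anchor is not None:
            hrefs.append(anchor["href"])
    return hrefs
```

The heading without a link is skipped instead of stopping the crawl.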

If, around the 9:05 mark, you need to get the title out of a child element (direct child or not), BeautifulSoup lets you reach a nested tag by using its name as an attribute and then search inside it, like this:

Instead of:
title = link.string

Use:
title = link.h3.find(string=True)

This assumes that the element holding the title is an <h3> somewhere inside the element your for loop iterates over. It is useful when the element that carries the link to the entry is not the one that holds the entry's title.

rayromanov
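A runnable sketch of the tip above, on hypothetical markup where each link wraps a date and an `<h3>` title, so `link.string` is `None` but `link.h3.find(string=True)` still finds the title. The `titles` helper and the sample HTML are illustrative:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: each entry link wraps a date and an <h3> title
HTML = """
<a class="entry" href="/post-1"><span>2014</span><h3>First post</h3></a>
<a class="entry" href="/post-2"><span>2015</span><div><h3>Second post</h3></div></a>
"""


def titles(html):
    """Return (href, title) pairs, reading the title from the first <h3> inside each link."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for link in soup.find_all("a", class_="entry"):
        # link.string would be None here: the <a> has more than one child
        heading = link.h3  # first <h3> anywhere inside the link, direct child or not
        title = heading.find(string=True) if heading is not None else None
        rows.append((link["href"], str(title) if title is not None else None))
    return rows
```

The second entry shows that `link.h3` also finds an `<h3>` nested deeper than a direct child.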

Thank you very much, I designed a web crawler for my college website.

amoghkulkarni

Thank you for such a beautiful tutorial.

devarshsanghvi
