Web Scraping Tutorial | Complete Scrapy Project : Scraping Real Estate with Python

A complete and detailed project-based Scrapy tutorial: web scraping 3,000 real estate properties and saving the details to a CSV in a structured format.
It features some non-standard logic that uses yield to extract geo data from the detail page, then returns to the main listing to extract the price, title, href, date and 'hood'.
The 'next page' navigation was fairly 'normal', but I had to use yield in tandem with a 'parse_detail' method to extract the "lon" and "lat" coordinates.
Understanding 'yield' and having some familiarity with object-oriented programming are key to being able to use, modify and troubleshoot this project.
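To illustrate the pattern described above, here is a minimal sketch of the spider structure involved - the class name, start URL, field names and selectors are placeholders chosen for illustration, not the exact code from the video:

import scrapy

class RealEstateSpider(scrapy.Spider):
    # Placeholder spider name and start URL for illustration only
    name = "realestate"
    start_urls = ["https://example.craigslist.org/search/rea"]

    def parse(self, response):
        # Iterate over the ads (thumbnails/properties) on the main listing page
        for ad in response.xpath('//li[@class="result-row"]'):
            item = {
                'date': ad.xpath('.//time/@datetime').get(),
                'title': ad.xpath('.//a[contains(@class, "result-title")]/text()').get(),
                'price': ad.xpath('.//span[@class="result-price"]/text()').get(),
                'hood': ad.xpath('.//span[@class="result-hood"]/text()').get(),
                'link': ad.xpath('.//a[contains(@class, "result-title")]/@href').get(),
            }
            # Hand the partially-filled item to the detail page request;
            # parse_detail adds the geo data before yielding the finished item
            yield response.follow(item['link'], callback=self.parse_detail,
                                  cb_kwargs={'item': item})

    def parse_detail(self, response, item):
        # The coordinates live on the detail page, not on the main listing
        item['lat'] = response.xpath('//div[@id="map"]/@data-latitude').get()
        item['lon'] = response.xpath('//div[@id="map"]/@data-longitude').get()
        yield item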
Note: This is a complete tutorial, but it is aimed at anyone who already has some experience with Scrapy, as there are some modifications to the standard framework.
I also show a detailed run-through of how to identify the parts to use for the Scrapy selectors using "inspect element" in the browser, then write the XPath selectors and test them in the Scrapy shell.
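As a rough idea of what that testing looks like (the URL and selector are placeholders, not the exact ones from the video), a Scrapy shell session goes something like this - it also shows the Get vs Extract difference covered in the chapters below:

✸ scrapy shell "https://example.craigslist.org/search/rea"

>>> # First matching title, returned as a single string (or None)
>>> response.xpath('//a[contains(@class, "result-title")]/text()').get()
>>> # Every matching title, returned as a list of strings
>>> response.xpath('//a[contains(@class, "result-title")]/text()').extract()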
Chapter timings in the video:
0:00 Intro
1:42 Details of redandgreen website with documentation of project
4:14 Finding the source html we need
11:54 Scrapy shell
15:36 Testing XPATH selectors
19:56 Get Vs Extract
25:10 Getting the 'date' selector using 'inspect element'
28:36 Create the virtualenv [optional]
30:13 'scrapy startproject craigslistdemo'
34:33 Writing python code in Atom
35:21 Import the packages
41:24 Create the spider class
46:11 Checking the geo data (latitude & longitude)
1:00:43 Iterating through the 'ads' (thumbnail/properties on main listing)
1:07:18 'parse_detail'
1:39:14 Testing
1:40:03 Success
1:44:23 CSV check
► date
► title
► price
► hood
► link
► misc
► lon
► lat
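For reference, once the spider runs cleanly, a CSV with the columns above can be produced with Scrapy's built-in feed export (using the placeholder spider name from the sketch earlier):

✸ scrapy crawl realestate -o results.csv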
** Update - see my follow-up video as well, which revises the 'lon' and 'lat' part :
This tutorial covers the real-world challenges you face when web scraping, including non-standard listings, and it is a good example of using yield to move between iterations of items, details, and pages (i.e. the main listing page(s) and the detail pages).
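The 'next page' step itself follows the usual Scrapy pattern; a minimal sketch (with a placeholder selector) would sit at the end of parse():

# At the end of parse(), after iterating the ads on the current listing page
next_page = response.xpath('//a[contains(@class, "next")]/@href').get()
if next_page:
    yield response.follow(next_page, callback=self.parse)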
You can download the code for this project from GitHub :
✸ pip install scrapy
✸ pip install virtualenv
It is also viewable here:
I use Atom editor :
✸ sudo snap install atom --classic
And have just installed these packages:
When Atom gave a 'permission denied' error, I used this to allow me to save the .py file :
✸ sudo chown -R username:sudo ~/Documents
Please leave comments/suggestions and if you like this video, don't forget to .......✅
Who would like to see me attempt to run this on a Raspberry Pi, and schedule the spider to run as a cron job?
⚠ Disclaimer : Any code provided in this tutorial is for educational use only, I am not responsible for what you do with it. ⚠