Craigslist Scraper with Python and Selenium: Part 3

In this video series, we will be writing a script in Python using web scraping modules such as Selenium, Beautiful Soup, and urllib to extract information from the website Craigslist. Specifically, this script will be responsible for forming a search query, i.e. a set of criteria such as the items to look for, the location, zip code, and so on. Once we form this query, we use our script to automatically perform the search and extract two key pieces of information from the results: the title of each posting and the link to each post.
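
The full code is developed in the video, but a minimal sketch of the idea might look like the following. The URL parameters and the "result-title" class are assumptions about Craigslist's markup, which changes over time, so adjust the selectors to whatever the live page source uses.

    # Load a Craigslist search with Selenium, hand the rendered HTML to
    # Beautiful Soup, and pull out the title and link of each posting.
    # The URL parameters and the "result-title" class are assumptions --
    # check the live page source and adjust as needed.
    from selenium import webdriver
    from bs4 import BeautifulSoup

    location = "sfbay"      # Craigslist subdomain for the region
    query = "bicycle"       # what to search for
    zip_code = "94102"
    radius = 5              # miles around the zip code

    url = (f"https://{location}.craigslist.org/search/sss"
           f"?query={query}&postal={zip_code}&search_distance={radius}")

    driver = webdriver.Firefox()    # or webdriver.Chrome()
    driver.get(url)

    soup = BeautifulSoup(driver.page_source, "html.parser")
    for anchor in soup.find_all("a", class_="result-title"):
        print(anchor.text, anchor["href"])

    driver.quit()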

This project is purposefully simple, as it is meant to serve as a springboard for you to build upon. For instance, perhaps you want to keep tabs on when a certain item is listed in your area. Perhaps you could modify the script to automatically email you if any items of interest pop up. The possibilities are quite vast, and I hope you use this to build something useful and cool. If you do, please share it!
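
As a taste of the email idea, a hypothetical notify() helper using Python's standard smtplib might look like the sketch below; the SMTP server, addresses, and password are placeholders you would replace with your own.

    # Hypothetical helper: email yourself when a matching posting appears.
    # The server, addresses, and password are placeholders.
    import smtplib
    from email.message import EmailMessage

    def notify(title, link):
        msg = EmailMessage()
        msg["Subject"] = f"New Craigslist posting: {title}"
        msg["From"] = "me@example.com"
        msg["To"] = "me@example.com"
        msg.set_content(link)

        with smtplib.SMTP_SSL("smtp.example.com", 465) as server:
            server.login("me@example.com", "app-password")
            server.send_message(msg)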

Related Links:

This video is part of a larger series on "Web Scraping and Automation". You can watch the other videos in this series here:

Further videos on Selenium:

Do you like the development environment I'm using in this video? It's a customized version of vim that's enhanced for Python development. If you want to see how I set up my vim, I have a series on this here:

If you've found this video helpful and want to stay up-to-date with the latest videos posted on this channel, please subscribe:

Comments

Really grateful for you sharing your knowledge. It's very accessible and you clearly know your stuff.

alexbordei

As an addition to all the automation-related series, how about one on cron-jobs?
Just a suggestion. :) Thanks for the quality vids.

simonj
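
For anyone curious in the meantime, scheduling a finished script with cron comes down to a single crontab entry (edited with crontab -e). The paths below are placeholders; this example would run the scraper every 30 minutes and append its output to a log.

    # Example crontab entry: run the scraper every 30 minutes and log the output.
    */30 * * * * /usr/bin/python3 /home/user/craigslist_scraper.py >> /home/user/scraper.log 2>&1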

Hey, thanks for the excellent tutorial. Just one question: is Selenium necessary? Can't everything done here be accomplished with only urllib and Beautiful Soup? My understanding is that you can request the HTML and simply parse out the same things you did with Selenium.

simonj
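
That point is fair for a plain results page: the listings are already in the raw HTML, so a simple HTTP request plus Beautiful Soup can do the same extraction, and Selenium mainly earns its keep when a page needs JavaScript or real browser interaction. A rough sketch, where the User-Agent header and the "result-title" class are assumptions:

    # Same extraction without a browser: fetch the HTML and parse it directly.
    # The User-Agent header and the "result-title" class are assumptions.
    from urllib.request import Request, urlopen
    from bs4 import BeautifulSoup

    url = "https://sfbay.craigslist.org/search/sss?query=bicycle"
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    html = urlopen(req).read()

    soup = BeautifulSoup(html, "html.parser")
    for anchor in soup.find_all("a", class_="result-title"):
        print(anchor.text, anchor["href"])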

Subscribed, this is really excellent. One question I had was about following pages when the search results are spread over several pages. Is there a way in Selenium to do this?

CallumD
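
One common approach is to keep clicking the results page's "next" button until it disappears. The sketch below assumes that button is reachable via the CSS selector a.button.next (Craigslist's markup may differ) and that driver is the Selenium driver from the main script.

    # Walk through result pages by clicking "next" until it is gone.
    # The selector is an assumption about Craigslist's markup.
    import time
    from selenium.webdriver.common.by import By
    from selenium.common.exceptions import NoSuchElementException

    while True:
        # ... scrape titles and links from driver.page_source here ...
        try:
            next_button = driver.find_element(By.CSS_SELECTOR, "a.button.next")
        except NoSuchElementException:
            break               # no more pages
        next_button.click()
        time.sleep(2)           # crude wait for the next page to load

Another option is to build the page URLs yourself if the site exposes the result offset as a query parameter.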

Hi, Vincent (sorry I got your name wrong in my previous comments!). Again, thank you for the great videos! I'd like to add some functionality to this example, described below. Your advice would be much appreciated.

1. Crawl the website on a schedule (every xx minutes) without keeping my laptop open.
2. Get notified by email when a new item matching a certain condition is posted.

For 1, I'm considering using Heroku (I haven't used it at all, though). Is this the right choice, or do you know of a better one? I also just found Google Colaboratory. Do you know if that helps?

For 2, I want to retrieve information about only the new items. Could you suggest how to write such code?

keitaroumehara
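
On the second question, one simple pattern is to persist the links you have already seen and only report postings that are not in that set; any always-on machine or hosted scheduler can then run the script periodically. A minimal sketch, where the file name and the (title, link) input format are assumptions:

    # Report only postings that have not been seen on a previous run.
    # The file name and the (title, link) input format are assumptions.
    import os

    SEEN_FILE = "seen_posts.txt"

    def load_seen():
        if not os.path.exists(SEEN_FILE):
            return set()
        with open(SEEN_FILE) as f:
            return set(line.strip() for line in f)

    def report_new(postings):
        """postings: iterable of (title, link) pairs from the scraper."""
        seen = load_seen()
        with open(SEEN_FILE, "a") as f:
            for title, link in postings:
                if link not in seen:
                    print("New posting:", title, link)  # or email it here
                    f.write(link + "\n")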

Hi! Great video! I was just wondering how I would click on the link text, parse the page, and look for a specific span tag.

For example, in my case I'm looking for a car under $x. I already get the price, date, and year saved to a CSV. Now I want to open a link in a new tab, find the mileage and some keywords about the condition of the car, then close the tab and save the mileage and those keywords to my CSV.

How would I go about doing this?

iNotSoTall
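
One way to get there is to visit each posting's URL with the same driver rather than literally opening a new tab, then parse the detail page. In the sketch below, the "attrgroup" class, the "odometer:" label, and the keyword list are all assumptions about Craigslist's posting markup, so check the page source and adjust.

    # Visit a posting's detail page and pull out the mileage and any
    # condition keywords found in the page text. Class names and labels
    # are assumptions about Craigslist's markup.
    from bs4 import BeautifulSoup

    def scrape_details(driver, post_url, keywords=("clean title", "one owner")):
        driver.get(post_url)
        soup = BeautifulSoup(driver.page_source, "html.parser")

        mileage = None
        for span in soup.select("p.attrgroup span"):
            text = span.get_text(strip=True)
            if text.lower().startswith("odometer:"):
                mileage = text.split(":", 1)[1].strip()

        page_text = soup.get_text(" ", strip=True).lower()
        found = [kw for kw in keywords if kw in page_text]
        return mileage, found

After each detail page you can return to the results with driver.back() (or re-request the search URL) and append the returned values to your CSV row.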

Can you make a video about the linear congruential method in Python?

the
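
Until such a video exists, the method itself fits in a few lines: each value comes from the previous one via x_{n+1} = (a * x_n + c) mod m. The constants below are one common textbook choice, not the only valid one.

    # Linear congruential generator: x_{n+1} = (a * x_n + c) mod m.
    # The constants are one common textbook choice, not the only option.
    def lcg(seed, a=1103515245, c=12345, m=2**31):
        x = seed
        while True:
            x = (a * x + c) % m
            yield x

    gen = lcg(seed=42)
    print([next(gen) for _ in range(5)])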