How To Schedule A Cron Job To Run Python (Scrapy) Scripts For Web Scraping

A tutorial demonstrating how to schedule scripts as cron jobs so that they run automatically. Here a Scrapy (Python) script is scheduled and run.

(This is most relevant to Linux/Mac OSX operating systems)

It covers editing your crontab file (you can create one if it doesn't already exist):

◾ crontab -e

and scheduling a job to run:

◾ daily
◾ weekly
◾ monthly
◾ or at specific times of day

To copy what I show in the video you will also need to make sure you invoke the Scrapy spider using CrawlerProcess rather than the typical CLI Scrapy syntax.
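
As a rough illustration of what invoking a spider through CrawlerProcess can look like, here is a minimal sketch. It is not the exact script from the video: the spider, the practice site URL, the output file naming and the helper output_path are all assumptions made for this example.

```python
#!/usr/bin/env python3
"""Run a Scrapy spider from a plain Python script so that cron can call it."""
import sys
from datetime import datetime
from pathlib import Path


def output_path(base_dir: Path, now: datetime) -> Path:
    """Build a dated output file name (hypothetical naming scheme)."""
    return base_dir / f"quotes_{now:%Y-%m-%d}.json"


def main() -> int:
    try:
        # Imported here so that a missing package is reported clearly:
        # cron may run a different Python than your interactive shell does.
        import scrapy
        from scrapy.crawler import CrawlerProcess
    except ImportError:
        print(f"Scrapy is not installed for {sys.executable}", file=sys.stderr)
        return 1

    class QuotesSpider(scrapy.Spider):
        """Placeholder spider; swap in your own."""

        name = "quotes"
        start_urls = ["https://quotes.toscrape.com/"]  # assumed practice site

        def parse(self, response):
            for quote in response.css("div.quote"):
                yield {
                    "text": quote.css("span.text::text").get(),
                    "author": quote.css("small.author::text").get(),
                }

    # Cron does not start in the project folder, so anchor paths to this file.
    feed = output_path(Path(__file__).resolve().parent, datetime.now())

    process = CrawlerProcess(settings={"FEEDS": {str(feed): {"format": "json"}}})
    process.crawl(QuotesSpider)
    process.start()  # blocks until the crawl has finished
    return 0


if __name__ == "__main__":
    main()
```

Saved as, say, run_spider.py (a made-up name), the script can be started with an ordinary "python3 run_spider.py" command, which is the form a crontab line needs. Using absolute paths for both the interpreter and the script in that line is usually the safer choice, because cron runs with a minimal environment.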

The time and date fields are:

field          allowed values
-----          --------------
minute         0-59
hour           0-23
day of month   1-31
month          1-12 (or names, see below)
day of week    0-7 (0 or 7 is Sunday, or use names)

A field may contain an asterisk (*), which always stands for
"first-last".

Names can also be used for the 'month' and 'day of week' fields. Use
the first three letters of the particular day or month (case does not
matter). Ranges or lists of names are not allowed.
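
To make the five fields more concrete, here is a small Python sketch that checks whether a schedule would fire at a given moment. It is only an illustration, not how cron itself is implemented: it handles "*", single numbers, ranges and comma lists, leaves out names and step values, and the names FIELDS, expand and matches are invented for this example.

```python
from datetime import datetime

# (name, lowest, highest) for the five time fields, in crontab order.
FIELDS = [
    ("minute", 0, 59),
    ("hour", 0, 23),
    ("day of month", 1, 31),
    ("month", 1, 12),
    ("day of week", 0, 7),
]


def expand(field: str, low: int, high: int) -> set:
    """Expand one field ('*', '5', '1-5', '0,30') into the set of values it allows."""
    values = set()
    for part in field.split(","):
        if part == "*":
            start, end = low, high  # an asterisk stands for "first-last"
        elif "-" in part:
            start, end = (int(n) for n in part.split("-"))
        else:
            start = end = int(part)
        if not low <= start <= end <= high:
            raise ValueError(f"{part!r} is outside {low}-{high}")
        values.update(range(start, end + 1))
    return values


def matches(schedule: str, when: datetime) -> bool:
    """Return True if the five-field schedule would run at the given minute."""
    parts = schedule.split()
    if len(parts) != 5:
        raise ValueError("expected five fields")
    minute, hour, dom, month, dow = (
        expand(text, low, high) for text, (_, low, high) in zip(parts, FIELDS)
    )
    # cron counts Sunday as 0 (or 7); isoweekday() gives Sunday as 7.
    weekday = when.isoweekday() % 7
    dow = {d % 7 for d in dow}
    dom_ok = when.day in dom
    dow_ok = weekday in dow
    # When both day fields are restricted, cron runs if either one matches.
    if parts[2] != "*" and parts[4] != "*":
        day_ok = dom_ok or dow_ok
    else:
        day_ok = dom_ok and dow_ok
    return when.minute in minute and when.hour in hour and when.month in month and day_ok


print(matches("0 * * * *", datetime(2024, 1, 1, 14, 0)))   # True: minute 0 of any hour
print(matches("30 6 * * 7", datetime(2024, 1, 7, 6, 30)))  # True: 7 January 2024 was a Sunday
```

Passing a value outside a field's range, for example minute 60, raises a ValueError in this sketch.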

⚠ Note!
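The minute field runs from 0 to 59, so there is no minute 60. If you want a file to run on the hour, use 0 in the minute field (59 would run it one minute before the hour).
eg: "0 * * * *" runs at the start of every hour.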

RTFM 📙 :
man crontab
man cron

⚠ Disclaimer : Any code provided in this tutorial is for educational use only, I am not responsible for what you do with it. 🚓

Next I'll transfer some spiders and bs4 code to the Pi Zero and see if we can schedule on that using the same syntax used here.
See you around yeah?
Dr Pi.
Comments

Great video on an important part of web scraping that doesn't get the attention it deserves. Finding solutions to automatically run scripts for clients is an important area.

aarons

Thanks, I didn't know it runs in the background, that's cool. I was kind of worried about it opening a terminal to run the command every now and then.

dh

Very well explained and structured video again, kind sir! But I am left with one question: am I tripping, or is your camera doing weird things? xd It's kind of like water drops falling on you as a surface :D

louiswallice

Awesome video! Thanks. Quick question: what about using a virtual environment (venv)? How do we set that up in crontab?

codecumba

Thanks for the video. In my spider I am importing and using items, i.e. from myspider.items import myspiderItem. This works fine, but when I run the script via cron I get the following error: ModuleNotFoundError: No module named 'myspider'. How can I solve this?

camp

Please make a tutorial on scraping images from a subreddit and auto-posting them to Netlify, bro? Thanks

randriyanto