How To Schedule A Cron Job To Run Python (Scrapy) Scripts For Web Scraping

Tutorial demonstrating how to schedule scripts to run automatically as cron jobs; here a Python Scrapy script is scheduled and run.
(This is most relevant to Linux and macOS.)
It covers editing your crontab file.
◾ crontab -e
If you don't have a crontab yet, crontab -e will create one. Jobs can then be scheduled to run:
◾ daily
◾ weekly
◾ monthly
◾ or specific times of day
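As a rough sketch of what those schedules look like in a crontab, the hypothetical helper below turns a schedule name into a crontab line. The daily, weekly and monthly expressions are the ones crontab(5) gives for the @daily, @weekly and @monthly shortcuts (midnight; Sunday for weekly; the 1st for monthly). The interpreter, script and log paths in the usage are placeholders, not paths from the video.

```python
# Hypothetical helper: build a crontab line for one of the common schedules.
# The expressions match the @daily/@weekly/@monthly shortcuts in crontab(5).
SCHEDULES = {
    "daily": "0 0 * * *",    # every day at 00:00
    "weekly": "0 0 * * 0",   # every Sunday at 00:00
    "monthly": "0 0 1 * *",  # the 1st of every month at 00:00
}


def crontab_line(schedule: str, command: str) -> str:
    """Return 'MIN HOUR DOM MON DOW command' for a named schedule.

    A custom five-field expression (e.g. '30 6 * * *' for 06:30 every day)
    can be passed instead of a name.
    """
    expr = SCHEDULES.get(schedule, schedule)
    if len(expr.split()) != 5:
        raise ValueError(f"expected 5 time fields, got {expr!r}")
    return f"{expr} {command}"


if __name__ == "__main__":
    # Placeholder paths: point these at your own interpreter, script and log.
    cmd = "/usr/bin/python3 /home/pi/spiders/run_spider.py >> /home/pi/spiders/cron.log 2>&1"
    print(crontab_line("daily", cmd))
    print(crontab_line("30 6 * * *", cmd))
```

Paste the printed line into the file that crontab -e opens; redirecting output to a log makes it much easier to see why a job did not run.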
To reproduce what I show in the video, you will also need to run the Scrapy spider from a Python script with CrawlerProcess rather than with the usual scrapy crawl command.
The time and date fields are:
field          allowed values
-----          --------------
minute         0-59
hour           0-23
day of month   1-31
month          1-12 (or names, see below)
day of week    0-7 (0 or 7 is Sunday, or use names)
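To make the table concrete, here is a small hypothetical checker (not part of cron itself) that holds the same ranges and tests a single numeric value for a field. It also folds day-of-week 7 into 0, since both mean Sunday.

```python
# Allowed numeric ranges for the five crontab time fields, in order.
FIELD_RANGES = {
    "minute": (0, 59),
    "hour": (0, 23),
    "day of month": (1, 31),
    "month": (1, 12),
    "day of week": (0, 7),  # 0 and 7 both mean Sunday
}


def check_value(field: str, value: int) -> int:
    """Return the value if it is allowed for the field, else raise ValueError.

    Day of week 7 is normalised to 0, because both stand for Sunday.
    """
    lo, hi = FIELD_RANGES[field]
    if not lo <= value <= hi:
        raise ValueError(f"{field} must be {lo}-{hi}, got {value}")
    if field == "day of week" and value == 7:
        return 0
    return value


if __name__ == "__main__":
    print(check_value("minute", 0))
    print(check_value("day of week", 7))
    try:
        check_value("minute", 60)
    except ValueError as err:
        print(err)
```

The last call fails because 60 is outside 0-59: there is no minute 60 in a crontab entry.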
A field may contain an asterisk (*), which always stands for "first-last".
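In other words, * matches every value from the field's first allowed value to its last. A sketch of that expansion, handling only * and a single number (the function is illustrative, not cron's own code):

```python
def expand_field(text: str, first: int, last: int) -> list[int]:
    """Expand one crontab field into the list of values it matches.

    Only '*' and a single number are handled here; '*' means every value
    from `first` to `last` inclusive (the field's allowed range).
    """
    if text == "*":
        return list(range(first, last + 1))
    value = int(text)
    if not first <= value <= last:
        raise ValueError(f"{value} is outside {first}-{last}")
    return [value]


if __name__ == "__main__":
    print(expand_field("*", 0, 23))   # hour: every value from 0 to 23
    print(expand_field("30", 0, 59))  # minute: just 30
```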
Names can also be used for the 'month' and 'day of week' fields. Use the first three letters of the particular day or month (case does not matter). Ranges or lists of names are not allowed.
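A hypothetical sketch of that name rule: a single three-letter name, in any case, maps to its number (jan=1 to dec=12, sun=0 to sat=6), while a range or list written with names is rejected, as this man page text describes. Some cron implementations do accept name ranges, so check man 5 crontab on your own system.

```python
MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]
DAYS = ["sun", "mon", "tue", "wed", "thu", "fri", "sat"]


def name_to_number(field: str, text: str) -> int:
    """Convert a month or day-of-week name to its number.

    Only a single three-letter name is accepted (case does not matter);
    ranges or lists of names such as 'mon-fri' or 'jan,jul' are rejected.
    """
    if "-" in text or "," in text:
        raise ValueError(f"ranges or lists of names are not allowed: {text!r}")
    key = text.lower()
    if field == "month" and key in MONTHS:
        return MONTHS.index(key) + 1  # jan -> 1 ... dec -> 12
    if field == "day of week" and key in DAYS:
        return DAYS.index(key)        # sun -> 0 ... sat -> 6
    raise ValueError(f"not a valid {field} name: {text!r}")
```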
⚠ Note!
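If you want a script to run on the hour, put 0 in the minute field, not 59: the minute range is 0-59, so 0 is the top of the hour and 59 is one minute before it.
eg: 0 * * * * runs a job at the start of every hour.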
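To see the difference between 0 and 59, this hypothetical helper (it handles only the hourly "M * * * *" pattern, not full cron syntax) computes the next time such an entry would fire after a given moment.

```python
from datetime import datetime, timedelta


def next_hourly_run(minute: int, now: datetime) -> datetime:
    """Return the next time an hourly 'minute * * * *' entry fires after now.

    Illustrative only: it handles just this one pattern, not full cron syntax.
    """
    if not 0 <= minute <= 59:
        raise ValueError("minute must be 0-59")
    candidate = now.replace(minute=minute, second=0, microsecond=0)
    if candidate <= now:
        # This hour's slot has already passed; take the same minute next hour.
        candidate += timedelta(hours=1)
    return candidate


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 10, 15)
    print(next_hourly_run(0, now))   # 2024-01-01 11:00:00 (on the hour)
    print(next_hourly_run(59, now))  # 2024-01-01 10:59:00 (a minute early)
```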
RTFM 📙 :
man crontab
man cron
⚠ Disclaimer : Any code provided in this tutorial is for educational use only, I am not responsible for what you do with it. 🚓
Next I'll transfer some spiders and bs4 code to the Pi Zero and see whether they can be scheduled there using the same syntax.
See you around yeah?
Dr Pi.