Matthew Bayly - Scheduled Web-scraping With R from a Server
Cascadia R Conf 2021 Digital Track
Matthew Bayly
Vancouver, British Columbia
Scheduled Web-scraping With R from a Server
It is likely that many of you have experimented with rvest and related packages for accessing (or scraping) data directly from a website, but what about performing these operations as routine tasks on a fixed schedule? For example, say you wanted to download avalanche risk forecasts and weather data at 6:00 am daily from various websites and store this information for subsequent analyses. We can accomplish this by running a given R script as a cron job (Linux/Mac) or via the Task Scheduler (Windows) from either a desktop or server environment. This simple design pattern opens up a whole world of possibilities, such as building our own real-time, Shiny-free, dynamic HTML/JS dashboards by overwriting a single data file. I will also briefly discuss web scraping ethics, API endpoints, and the polite package.
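As a rough illustration of the pattern described above, the sketch below uses rvest together with the polite package to scrape a page and overwrite a single CSV that a static dashboard could read. The URL, CSS selectors, file paths, and the cron line in the comments are all hypothetical placeholders, not part of the talk itself.

```r
# Minimal sketch (hypothetical URL, selectors, and paths):
# scrape a forecast page politely and overwrite one CSV file.
# To run daily at 6:00 am on Linux/Mac, a crontab entry might look like:
#   0 6 * * * Rscript /home/user/scrape_forecast.R
library(rvest)
library(polite)

# bow() checks robots.txt and sets a rate limit and user agent
session <- bow("https://example.com/avalanche-forecast",
               user_agent = "daily-forecast-scraper")

page <- scrape(session)  # polite, rate-limited GET returning parsed HTML

forecast <- data.frame(
  scraped_at = Sys.time(),
  danger     = page |> html_element(".danger-rating")    |> html_text2(),
  summary    = page |> html_element(".forecast-summary") |> html_text2()
)

# Overwrite the file a dashboard reads; an archive copy could also be appended
# for later analyses.
write.csv(forecast, "data/latest_forecast.csv", row.names = FALSE)
```

On Windows, the same script could instead be registered with the Task Scheduler (for example via the taskscheduleR package) rather than cron.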
Bio: Matthew Bayly works as a Decision Support Tool developer with ESSA in Vancouver. His work focuses on integrating information from various models, databases, and real-time sensors into dynamic tools that support data visualization and decision making. His broad background in environmental science and passion for ecological research fuel his interest in programming and web development as applied tools. Matthew enjoys the creative freedom and strategic design elements of programming in R. He also holds big visions for how all of this will revolutionize environmental management and decision-making writ large. His key focal areas include stream networks, fisheries, and water management.