SREcon18 Europe - Care and Feeding of Data Processing Pipelines

Rita Sodt, Google

Data processing pipelines have important use cases, ranging from business analytics, machine learning, spam and abuse elimination, and billing-invoice delivery to transforming data for important user-facing serving jobs. These pipelines are often composed of multiple steps, where the output of one step is the input of the next, with dependencies on external systems and storage, all of which can break. When they do, and pipelines fail to meet SLOs, fixes are often expensive and time-consuming, especially if a large data set must be reprocessed or repaired. It is best to focus on prevention and on quickly detecting and responding to issues, which is where SRE can help.

Part of the difficulty of managing pipelines lies in how they differ from serving jobs. Unable to monitor RPC latency and errors directly as a proxy for customer happiness, it is necessary to gain visibility into the age of the oldest unprocessed data and to measure data correctness, since corrupt output data may be customer-visible and persisted even when serving jobs report no errors. To prevent issues and minimize their impact, techniques such as canarying, incremental rollout, automatic failover, and autoscaling can be used, all of which have specific considerations for pipelines.
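As an illustration of the freshness signal described above, here is a minimal Python sketch that computes the age of the oldest unprocessed item and checks it against a freshness SLO. The `pending_items` iterable, its `ingest_time` attribute, and the 30-minute threshold are hypothetical stand-ins, not details from the talk.

```python
import time

# Hypothetical SLO: the oldest unprocessed data should be under 30 minutes old.
FRESHNESS_SLO_SECONDS = 30 * 60

def oldest_unprocessed_age(pending_items) -> float:
    """Return the age in seconds of the oldest item not yet processed.

    Each item is assumed to carry an `ingest_time` Unix timestamp set
    when the data entered the pipeline. An empty backlog yields age 0.
    """
    now = time.time()
    oldest = min((item.ingest_time for item in pending_items), default=now)
    return now - oldest

def freshness_slo_met(pending_items) -> bool:
    """True if the pipeline's data freshness is within the SLO."""
    return oldest_unprocessed_age(pending_items) <= FRESHNESS_SLO_SECONDS
```

In practice a value like this would be exported to a monitoring system and alerted on against the SLO threshold, rather than checked inline.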
