Reactive Summit 2020: Ajit Koti, Tale of Stateful Stream to Stream Processing

Streaming engines like Apache Flink are redefining how we process data. Flink lets you extract, transform, and write data with an ease that matches batch processing frameworks. There are plenty of known, proven examples of converting a single batch job into a streaming job. Converting a stateful, end-to-end batch workflow into multiple stateful stream jobs, however, raises far more challenges.

Netflix processes payments for 180M+ members across 190 countries. Payment processing and transaction data are critical for measuring the operational health and performance of our payments platform. We decided to move the existing batch workflow entirely to streaming, and things got exciting when we wanted to introduce multiple streaming jobs with zero data loss and high accuracy. In this talk, we describe how we converted a conventional, complex stateful batch workflow into a multi-step stateful streaming workflow at Netflix using Flink. You’ll learn about:

1) Design and architecture involving multiple stateful streaming jobs (see the Flink sketch below)
2) Managing schema evolution using Avro for stateful real-time applications (see the Avro sketch below)
3) Sharing code between Flink and Spark for any fallback batch processing
4) Handling the cascading impact of events that arrive out of order (see the Flink sketch below)
5) Landing processed data in real time into multiple sinks such as Iceberg and Druid
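As a rough illustration of points 1 and 4, here is a minimal Flink DataStream sketch, not the actual Netflix job: a keyed process function keeps a per-member running total in ValueState, and a bounded-out-of-orderness watermark strategy tolerates events arriving late. The PaymentEvent type, field names, and the five-second lateness bound are all hypothetical.

```java
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

import java.time.Duration;

public class StatefulPaymentJob {

    // Hypothetical event type, used only for this sketch.
    public static class PaymentEvent {
        public String memberId;
        public double amount;
        public long eventTimeMs;

        public PaymentEvent() {}

        public PaymentEvent(String memberId, double amount, long eventTimeMs) {
            this.memberId = memberId;
            this.amount = amount;
            this.eventTimeMs = eventTimeMs;
        }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.fromElements(
                new PaymentEvent("m-1", 9.99, 1_000L),
                new PaymentEvent("m-1", 4.99, 500L)) // arrives out of order
            // Bounded-out-of-orderness watermarks tolerate events up to 5s late.
            .assignTimestampsAndWatermarks(
                WatermarkStrategy.<PaymentEvent>forBoundedOutOfOrderness(Duration.ofSeconds(5))
                    .withTimestampAssigner((e, ts) -> e.eventTimeMs))
            .keyBy(e -> e.memberId)
            .process(new RunningTotal())
            .print();

        env.execute("stateful-payment-totals");
    }

    // Keyed state keeps a per-member running total across events.
    static class RunningTotal extends KeyedProcessFunction<String, PaymentEvent, String> {
        private transient ValueState<Double> total;

        @Override
        public void open(Configuration parameters) {
            total = getRuntimeContext().getState(
                new ValueStateDescriptor<>("total", Double.class));
        }

        @Override
        public void processElement(PaymentEvent e, Context ctx, Collector<String> out)
                throws Exception {
            Double current = total.value();
            double updated = (current == null ? 0.0 : current) + e.amount;
            total.update(updated);
            out.collect(e.memberId + " -> " + updated);
        }
    }
}
```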
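For point 2, a small sketch of the Avro schema-resolution mechanism that makes stateful schema evolution workable: a record serialized with an old writer schema is decoded against a newer reader schema that adds a field with a default value. The Payment schema and its fields are made up for illustration and are not from the talk.

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;

import java.io.ByteArrayOutputStream;

public class AvroEvolutionDemo {

    // Writer schema: the version the upstream job serialized with.
    static final Schema WRITER = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"Payment\",\"fields\":[" +
        "{\"name\":\"memberId\",\"type\":\"string\"}," +
        "{\"name\":\"amount\",\"type\":\"double\"}]}");

    // Reader schema: adds a field with a default, so old records still decode.
    static final Schema READER = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"Payment\",\"fields\":[" +
        "{\"name\":\"memberId\",\"type\":\"string\"}," +
        "{\"name\":\"amount\",\"type\":\"double\"}," +
        "{\"name\":\"currency\",\"type\":\"string\",\"default\":\"USD\"}]}");

    public static void main(String[] args) throws Exception {
        // Serialize a record with the old (writer) schema.
        GenericRecord old = new GenericData.Record(WRITER);
        old.put("memberId", "m-123");
        old.put("amount", 9.99);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder enc = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<GenericRecord>(WRITER).write(old, enc);
        enc.flush();

        // Deserialize with the new (reader) schema; the added field
        // resolves to its declared default.
        BinaryDecoder dec = DecoderFactory.get().binaryDecoder(out.toByteArray(), null);
        GenericRecord evolved =
            new GenericDatumReader<GenericRecord>(WRITER, READER).read(null, dec);
        System.out.println(evolved); // currency resolves to "USD"
    }
}
```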