Timo Walther – Changelog Stream Processing with Apache Flink


We all know that the world is constantly changing. Data is produced continuously and should therefore be consumed continuously by enterprise systems as well. Message queues and logs such as Apache Kafka can be found in almost every architecture, while databases and other batch systems still provide the foundation. Change Data Capture (CDC) has become a popular way to capture committed changes from a database and propagate them to downstream consumers.

In this talk, we will introduce Apache Flink as a general-purpose data processor for various kinds of use cases on both finite and infinite streams. We demonstrate Flink's SQL engine as a changelog processor that ships with an ecosystem tailored to processing CDC data and maintaining materialized views. We will use Kafka as an upsert log, Debezium for connecting to databases, and enrich streams from various sources using different kinds of joins.
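The idea of a changelog that maintains a materialized view can be sketched in plain Python. This is not Flink code: it only mimics the four row kinds Flink uses in changelogs (+I insert, -U update-before, +U update-after, -D delete) and assumes an upsert log in which a `None` value acts as a delete (tombstone) for its key. The helpers `upsert_to_changelog` and `apply_changelog` are hypothetical and are meant as a simplified illustration, not as a description of how Flink's connectors are implemented.

```python
from typing import Dict, Iterable, Iterator, Optional, Tuple

# Row kinds, using the short strings Flink prints for changelog rows.
INSERT, UPDATE_BEFORE, UPDATE_AFTER, DELETE = "+I", "-U", "+U", "-D"

Change = Tuple[str, str, dict]


def upsert_to_changelog(log: Iterable[Tuple[str, Optional[dict]]]) -> Iterator[Change]:
    """Hypothetical helper: turn an upsert log of (key, value) pairs into a changelog.

    A value of None is treated as a tombstone that deletes the key.
    """
    current: Dict[str, dict] = {}  # last known value per key
    for key, value in log:
        if value is None:
            # Tombstone: retract the previous row, if there is one.
            if key in current:
                yield DELETE, key, current.pop(key)
        elif key in current:
            # Existing key: retract the old row, then emit the new one.
            yield UPDATE_BEFORE, key, current[key]
            yield UPDATE_AFTER, key, value
            current[key] = value
        else:
            # New key: plain insert.
            yield INSERT, key, value
            current[key] = value


def apply_changelog(changes: Iterable[Change]) -> Dict[str, dict]:
    """Hypothetical helper: materialize a view keyed by primary key from a changelog."""
    view: Dict[str, dict] = {}
    for kind, key, row in changes:
        if kind in (INSERT, UPDATE_AFTER):
            view[key] = row
        elif kind in (UPDATE_BEFORE, DELETE):
            view.pop(key, None)
        else:
            raise ValueError(f"unknown row kind: {kind}")
    return view


if __name__ == "__main__":
    log = [
        ("alice", {"city": "Berlin"}),
        ("bob", {"city": "Paris"}),
        ("alice", {"city": "Munich"}),  # update for an existing key
        ("bob", None),                  # tombstone deletes bob
    ]
    changes = list(upsert_to_changelog(log))
    for change in changes:
        print(change)
    print(apply_changelog(changes))
```

Separating the retraction (-U) from the new row (+U) lets a downstream operator, such as an aggregation, subtract the old contribution before adding the new one, which is what keeps a continuously updated view consistent.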

Finally, we illustrate how to combine Flink's Table API with the DataStream API to build event-driven applications that go beyond SQL.