A deep dive into Flink SQL - Jark Wu, Kurt Young

Over the last two major versions (1.9 & 1.10), the Apache Flink community spent a great deal of effort improving the architecture toward unified batch and streaming processing. One example is that Flink SQL gained the ability to support multiple SQL planners under the same API. This talk first discusses the motivation behind these changes, and then takes a deep dive into Flink SQL. The presentation shows the unified architecture for handling streaming and batch queries and explains how Flink translates queries into relational expressions, leverages Apache Calcite to optimize them, and generates efficient runtime code for execution. The talk also describes the lifetime of a query in detail: how the optimizer improves the plan based on relational-node patterns, how Flink leverages a binary data format for its basic data structures, and how certain operators work. This gives the audience a better understanding of Flink SQL internals.
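The unified pipeline described above means the same query text goes through parsing, translation into Calcite relational expressions, optimization, and code generation regardless of whether the input is bounded or unbounded. A minimal sketch (the table and column names are hypothetical, for illustration only):

```sql
-- A continuous aggregation. Under the unified architecture this query is
-- planned and code-generated the same way whether `clicks` is a bounded
-- (batch) table or an unbounded (streaming) table; only the chosen
-- physical operators differ based on the execution mode and statistics.
SELECT user_id, COUNT(*) AS click_cnt
FROM clicks
GROUP BY user_id;
```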

Speakers: Jark Wu, Kurt Young from Alibaba
Comments

Thanks for the presentation. Are there any plans to share the slides?

ahmadawad

Hi, only after reading data from the source do we know whether there are 1,000 rows or 1 million rows, right? Only then can we decide whether to use a hash-based join or a broadcast-based join. So how is the broadcast hash join chosen in the physical plan?

cdinesh

Or was it using such efficient deserialization only for Tuples and not for Rows?

FlavioPompermaier

Is there an optimization here similar to whole stage code generation as in Spark?

ahmadawad

Flink will dominate real-time compute and machine learning!

qiwei