Deep Dive into Stateful Stream Processing in Structured Streaming - Tathagata Das

"Stateful processing is one of the most challenging aspects of distributed, fault-tolerant stream processing. The DataFrame APIs in Structured Streaming make it very easy for developers to express their stateful logic, either implicitly (streaming aggregations) or explicitly (mapGroupsWithState). However, a number of moving parts under the hood make all the magic possible. In this talk, I am going to dive deeper into how stateful processing works in Structured Streaming. In particular, I am going to discuss the following:
- Different stateful operations in Structured Streaming
- How state data is stored in a distributed, fault-tolerant manner using State Stores
- How you can write custom State Stores for saving state to external storage systems

Session hashtag: #EUstr7"
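
To make the idea of explicit per-key state concrete, here is a minimal plain-Python sketch of the pattern the description refers to: a keyed state that is updated one micro-batch at a time. It does not use Spark or the mapGroupsWithState API; the names `StateStore` and `update_counts` are hypothetical and chosen only for illustration, and the in-memory dict stands in for a fault-tolerant State Store.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical stand-in for a state store: one running total per key.
StateStore = Dict[str, int]


def update_counts(state: StateStore,
                  batch: Iterable[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Fold one micro-batch into the per-key state.

    Returns the (key, new_total) rows for keys touched by this batch,
    sorted by key, similar in spirit to an "update" output mode.
    """
    touched: Dict[str, int] = {}
    for key, value in batch:
        state[key] = state.get(key, 0) + value
        touched[key] = state[key]
    return sorted(touched.items())


# State survives across batches; only touched keys are emitted each time.
state: StateStore = {}
batch1 = update_counts(state, [("a", 1), ("b", 2), ("a", 3)])
batch2 = update_counts(state, [("b", 5)])
```

In a real streaming engine the state would also be checkpointed so that it can be restored after a failure; this sketch keeps it in memory only.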

About: Databricks provides a unified data analytics platform, powered by Apache Spark™, that accelerates innovation by unifying data science, engineering and business.

Comments
Author

We performed streaming joins using Kafka Streams but hit a barrier with handling late data. Now eagerly waiting for Spark 2.3 ;)

TEgamingmadness
Author

What if no data is coming in for any of the groups and the watermark doesn't progress? How will events get timed out in that case if we use EventTimeTimeout?

Dyslexic_Neuron
Author

14:21 With regard to deduplication, why not just use Delta merge/upsert?

takreem.akhter
Author

What a weird way to make your speaker stop presenting at the end

Bowonfire
Author

At 5:53, the speaker says that maintaining state through checkpointing allows fault tolerance in both stateful and stateless streaming. As far as I understand, we don't maintain state in stateless streaming. How does stateless streaming become fault tolerant?

megharaina
Author

Guys, is there any tutorial on how I can stream crypto data from different sources?

rexche