filmov
tv
How Adobe uses Structured Streaming at Scale
Показать описание
Adobe’s Unified Profile System is the heart of its Experience Platform. It ingests TBs of data a day and is PBs large. As part of this massive growth we have faced multiple challenges in our Apache Spark deployment which is used from Ingestion to Processing. We want to share some of our learnings and hard earned lessons and as we reached this scale specifically with Structured Streaming.
Know thy Lag
While consuming off a Kafka topic which sees sporadic loads, its very important to monitor the Consumer lag. Also makes you respect what a beast backpressure is.
Reading Data In
Fan Out Pattern using minPartitions to Use Kafka Efficiently
Overload protection using maxOffsetsPerTrigger
More Apache Spark Settings used to optimize Throughput
MicroBatching Best Practices
Map() +ForEach() vs MapPartitons + forEachPartition
Adobe Spark Speculation and its Effects
Calculating Streaming Statistics
Windowing
Importance of the State Store
RocksDB FTW
Broadcast joins
Custom Aggegators
OffHeap Counters using Redis
Pipelining
Connect with us:
Know thy Lag
While consuming off a Kafka topic which sees sporadic loads, its very important to monitor the Consumer lag. Also makes you respect what a beast backpressure is.
Reading Data In
Fan Out Pattern using minPartitions to Use Kafka Efficiently
Overload protection using maxOffsetsPerTrigger
More Apache Spark Settings used to optimize Throughput
MicroBatching Best Practices
Map() +ForEach() vs MapPartitons + forEachPartition
Adobe Spark Speculation and its Effects
Calculating Streaming Statistics
Windowing
Importance of the State Store
RocksDB FTW
Broadcast joins
Custom Aggegators
OffHeap Counters using Redis
Pipelining
Connect with us: