Handle Late or Duplicated Data and Archive Events for On-Demand Replay | 5/5

Find out how you can use Apache Flink to tackle late or duplicated data and improve data quality with exactly-once processing. We’ll also dive into archiving raw events for on-demand replay or reprocessing with Amazon Data Firehose.
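The de-duplication idea behind exactly-once processing can be sketched outside Flink as keyed state that remembers the event IDs already seen; a minimal plain-Python sketch (field names here are illustrative, not taken from the video):

```python
# Minimal de-duplication sketch, mimicking the keyed state a Flink
# operator would hold (illustrative only; not the video's code).

def deduplicate(events):
    """Yield each event once, keyed on its 'event_id' field."""
    seen = set()  # in Flink this would be fault-tolerant keyed state
    for event in events:
        if event["event_id"] not in seen:
            seen.add(event["event_id"])
            yield event

# Example: the second delivery of event "a" is dropped.
events = [
    {"event_id": "a", "score": 10},
    {"event_id": "b", "score": 20},
    {"event_id": "a", "score": 10},  # duplicate delivery
]
unique = list(deduplicate(events))
```

In a real pipeline the `seen` set would live in checkpointed Flink state (and usually expire via TTL) so that results survive restarts without reprocessing duplicates.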

In this series, Anand Shah (Data Analytics and Streaming Specialist at AWS) will help you build a modern data streaming architecture for a real-time gaming leaderboard. This architecture includes data ingestion, real-time enrichment with database change data capture (CDC), data processing, as well as computing, storing and visualizing the results. You will also learn advanced streaming analytics techniques, such as the control channel method for A/B testing, updating features and parameters with zero downtime, and how to handle late arrival of data. Anand will also talk you through the process of data de-duplication, as well as how you can store historical data for replay on-demand. 🎉
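The late-arrival problem mentioned above can be illustrated with a toy watermark model: the watermark tracks the highest event time seen so far, and an event is treated as late when its timestamp falls behind the watermark by more than an allowed lateness. This is a plain-Python sketch of the concept, not the Flink API (the threshold and names are illustrative):

```python
# Toy watermark model for classifying late events (illustrative;
# real Flink uses WatermarkStrategy and window allowedLateness).

ALLOWED_LATENESS = 5  # seconds an event may lag the watermark

def split_late(events, allowed_lateness=ALLOWED_LATENESS):
    """Partition (timestamp, payload) events into accepted and late,
    advancing the watermark to the max event time seen so far."""
    watermark = float("-inf")
    accepted, late = [], []
    for ts, payload in events:
        watermark = max(watermark, ts)
        if ts >= watermark - allowed_lateness:
            accepted.append((ts, payload))
        else:
            late.append((ts, payload))  # would go to a side output
    return accepted, late

# The event at t=2 arrives after the watermark advanced to 10;
# since 2 < 10 - 5, it is flagged as late.
events = [(1, "join"), (10, "score"), (2, "late-join")]
accepted, late = split_late(events)
```

In Flink the late stream would typically be routed to a side output for correction or auditing rather than silently dropped.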

🌟 Get started with Amazon Managed Service for Apache Flink today to build and run your fully managed Apache Flink applications on AWS!



00:00 Intro
00:21 Impact of late data arrival
01:23 How to handle late data arrival
01:52 Impact of duplicate messages
02:52 How to de-duplicate data
03:30 Demo: CDK source code walkthrough and deploy
05:00 Demo: Handling late arrival of data
05:26 Demo: Challenge 5.1 - De-duplicate data
06:04 Demo: Set up Amazon Data Firehose for data archival
10:32 Demo: On-demand replay of archived data
11:29 Demo: Challenge 5.2 - Replay data
11:53 Conclusion

#LateDataArrival #ExactlyOnce #ArchivalAndReplay #ManagedServiceForApacheFlink