Handle Late or Duplicated Data and Archive Events for On-Demand Replay | 5/5

Find out how you can use Apache Flink to tackle late or duplicated data and improve data quality with exactly-once processing. We’ll also dive into archiving raw events for on-demand replay or reprocessing with Amazon Data Firehose.
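The de-duplication idea behind exactly-once processing can be sketched outside Flink as keyed state that remembers the event IDs already seen; a minimal plain-Python sketch (field names here are illustrative, not taken from the video):

```python
# Minimal de-duplication sketch, mimicking the keyed state a Flink
# operator would hold (illustrative only; not the video's code).

def deduplicate(events):
    """Yield each event once, keyed on its 'event_id' field."""
    seen = set()  # in Flink this would be fault-tolerant keyed state
    for event in events:
        if event["event_id"] not in seen:
            seen.add(event["event_id"])
            yield event

# Example: the second delivery of event "a" is dropped.
events = [
    {"event_id": "a", "score": 10},
    {"event_id": "b", "score": 20},
    {"event_id": "a", "score": 10},  # duplicate delivery
]
unique = list(deduplicate(events))
```

In a real pipeline the `seen` set would live in checkpointed Flink state (and usually expire via TTL) so that results survive restarts without reprocessing duplicates.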

In this series, Anand Shah (Data Analytics and Streaming Specialist at AWS) will help you build a modern data streaming architecture for a real-time gaming leaderboard. This architecture includes data ingestion, real-time enrichment with database change data capture (CDC), data processing, as well as computing, storing and visualizing the results. You will also learn advanced streaming analytics techniques, such as the control channel method for A/B testing, updating features and parameters with zero downtime, and how to handle late arrival of data. Anand will also talk you through the process of data de-duplication, as well as how you can store historical data for replay on-demand. 🎉
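The late-arrival problem mentioned above can be illustrated with a toy watermark model: the watermark tracks the highest event time seen so far, and an event is treated as late when its timestamp falls behind the watermark by more than an allowed lateness. This is a plain-Python sketch of the concept, not the Flink API (the threshold and names are illustrative):

```python
# Toy watermark model for classifying late events (illustrative;
# real Flink uses WatermarkStrategy and window allowedLateness).

ALLOWED_LATENESS = 5  # seconds an event may lag the watermark

def split_late(events, allowed_lateness=ALLOWED_LATENESS):
    """Partition (timestamp, payload) events into accepted and late,
    advancing the watermark to the max event time seen so far."""
    watermark = float("-inf")
    accepted, late = [], []
    for ts, payload in events:
        watermark = max(watermark, ts)
        if ts >= watermark - allowed_lateness:
            accepted.append((ts, payload))
        else:
            late.append((ts, payload))  # would go to a side output
    return accepted, late

# The event at t=2 arrives after the watermark advanced to 10;
# since 2 < 10 - 5, it is flagged as late.
events = [(1, "join"), (10, "score"), (2, "late-join")]
accepted, late = split_late(events)
```

In Flink the late stream would typically be routed to a side output for correction or auditing rather than silently dropped.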

🌟 Get started with Amazon Managed Service for Apache Flink today to build and run your fully managed Apache Flink applications on AWS!



00:00 Intro
00:21 Impact of late data arrival
01:23 How to handle late data arrival
01:52 Impact of duplicate messages
02:52 How to de-duplicate data
03:30 Demo: CDK source code walkthrough and deploy
05:00 Demo: Handling late arrival of data
05:26 Demo: Challenge 5.1 - De-duplicate data
06:04 Demo: Set up Amazon Data Firehose for data archival
10:32 Demo: On-demand replay of archived data
11:29 Demo: Challenge 5.2 - Replay data
11:53 Conclusion

#LateDataArrival #ExactlyOnce #ArchivalAndReplay #ManagedServiceForApacheFlink