Massive Scale Data Processing at Netflix using Flink - Snehal Nagmote & Pallavi Phadnis

Over 137 million members worldwide enjoy TV series and feature films across a wide variety of genres and languages on Netflix, which generates petabyte-scale user behavior data. At Netflix, our client logging platform collects and processes this data to power recommendations, personalization, and many other services that enhance the user experience. Built with Apache Flink, this platform processes hundreds of billions of events and a petabyte of data per day, at 2.5 million events/sec with sub-millisecond latency. The processing involves a series of data transformations, such as decryption and enrichment with customer, geo, and device information using microservice-based lookups.
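
As a quick sanity check on the figures above, the quoted event rate can be converted to a daily volume. This is only a back-of-the-envelope sketch: it assumes the 2.5 million events/sec rate is sustained all day (the abstract does not say whether it is a peak or an average) and reads "a petabyte" as 10^15 bytes.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

events_per_sec = 2_500_000  # rate quoted in the abstract
bytes_per_day = 10**15      # assumption: "a petabyte" taken as 1 PB (decimal)

# Daily event count if the quoted rate held for a full day.
events_per_day = events_per_sec * SECONDS_PER_DAY

# Implied average event size under the two assumptions above.
avg_event_bytes = bytes_per_day / events_per_day

print(f"{events_per_day:,} events/day")
print(f"{avg_event_bytes:.0f} bytes/event on average")
```

Under these assumptions the rate works out to roughly 216 billion events per day, which is consistent with the "hundreds of billions" figure, and to an average event size of a few kilobytes.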

The transformed and enriched data is then used by multiple data consumers for a variety of applications, such as improving the user experience with A/B tests, tracking application performance metrics, and tuning algorithms. This causes redundant reads of the dataset by multiple batch jobs and incurs heavy processing costs. To avoid this, we have developed a config-driven, centralized, managed platform on top of Apache Flink that reads this data once and routes it to multiple streams based on dynamic configuration. This has improved computation efficiency and reduced both costs and operational overhead.
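
The read-once, route-to-many idea can be illustrated with a minimal sketch in plain Python. It does not use the Flink API and is not the platform's actual implementation. The stream names, event fields, and the `ROUTES` table are hypothetical, chosen only to show how a configuration of rules can fan a single pass over the data out to several outputs.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List

Event = Dict[str, str]
Rule = Callable[[Event], bool]

# Hypothetical routing configuration: output stream name -> rule on an event.
# In a real system this would be loaded, and reloaded, from a config source.
ROUTES: Dict[str, Rule] = {
    "ab_tests": lambda e: e.get("type") == "impression",
    "app_performance": lambda e: e.get("type") in {"error", "latency"},
    "tv_devices": lambda e: e.get("device") == "tv",
}


def route(events: Iterable[Event], routes: Dict[str, Rule]) -> Dict[str, List[Event]]:
    """Read each event once and append it to every stream whose rule matches."""
    out: Dict[str, List[Event]] = defaultdict(list)
    for event in events:  # single pass over the source data
        for stream, matches in routes.items():
            if matches(event):
                out[stream].append(event)
    return dict(out)


# Hypothetical sample events.
EVENTS: List[Event] = [
    {"type": "impression", "device": "tv"},
    {"type": "error", "device": "phone"},
    {"type": "play", "device": "phone"},
]

ROUTED = route(EVENTS, ROUTES)
for stream, routed_events in ROUTED.items():
    print(stream, routed_events)
```

Because the rules live in data rather than in the job's code, adding a consumer means adding an entry to the configuration instead of deploying another job that re-reads the same dataset.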

Stream processing at scale, while keeping production systems scalable and cost-efficient, brings interesting challenges. In this talk, we will share how we leverage Apache Flink to achieve this, the challenges we faced, and what we learned while running one of the largest Flink applications at Netflix.

Flink Forward San Francisco 2019
#flinkforward

Flink Forward
