Designing Structured Streaming Pipelines—How to Architect Things Right - Tathagata Das Databricks

Structured Streaming has proven to be the best platform for building distributed stream processing applications. Its unified SQL/Dataset/DataFrame APIs and Spark's built-in functions make it easy for developers to express complex computations. However, expressing the business logic is only part of the larger problem of building end-to-end streaming pipelines that interact with a complex ecosystem of storage systems and workloads. It is important for the developer to truly understand the business problem that needs to be solved.

What are you trying to consume? Single source? Joining multiple streaming sources? Joining streaming with static data?
What are you trying to produce? What is the final output that the business wants? What type of queries does the business want to run on the final output?
When do you want it? When does the business want the data? What is the acceptable latency? Do you really want millisecond-level latency?
How much are you willing to pay for it? This is the ultimate question, and the answer largely determines how feasible it is to solve the questions above.

These are the questions that we ask every customer in order to help them design their pipeline. In this talk, I am going to go through the decision tree of designing the right architecture for solving your problem.
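As a rough illustration of how answers to these questions can drive a design choice, here is a minimal sketch in plain Python. The `PipelineRequirements` record, the `choose_trigger` function, and every threshold in it are hypothetical and are not taken from the talk. The returned dictionaries use keyword names accepted by PySpark's `DataStreamWriter.trigger()` (`processingTime`, `continuous`, `availableNow`); `availableNow` assumes Spark 3.3 or later.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class PipelineRequirements:
    """Hypothetical record of the answers to the four questions above."""
    sources: int                # how many streaming sources are consumed
    joins_static_data: bool     # whether the stream is joined with static data
    max_latency_seconds: float  # acceptable end-to-end latency
    always_on_cluster: bool     # whether the budget allows a cluster running 24/7


def choose_trigger(req: PipelineRequirements) -> Dict[str, Union[str, bool]]:
    """Map requirements to keyword arguments for DataStreamWriter.trigger().

    All thresholds here are illustrative assumptions, not recommendations.
    """
    if not req.always_on_cluster:
        # Lowest cost: process whatever data is available, then shut down.
        return {"availableNow": True}
    if req.max_latency_seconds < 1:
        # Continuous mode only supports map-like queries, so fall back to
        # back-to-back micro-batches when joins or several sources are involved.
        if req.sources == 1 and not req.joins_static_data:
            return {"continuous": "1 second"}
        return {"processingTime": "0 seconds"}
    # Micro-batches at roughly half the latency budget (assumed rule of thumb).
    interval = max(1, int(req.max_latency_seconds // 2))
    return {"processingTime": f"{interval} seconds"}
```

With PySpark installed, the result could be passed along as `df.writeStream.trigger(**choose_trigger(req))`. The point of the sketch is only that latency and cost answers narrow the options before any business logic is written.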

About: Databricks provides a unified data analytics platform, powered by Apache Spark™, that accelerates innovation by unifying data science, engineering and business.

Comments

Thank you very much for the education.

karthikeyanbalachandran

Could you please share the links to reference previous deep dive talks/sessions/demos?

karthikeyanbalachandran

basically use delta lake and all problems solved !!!!

abhinee
