Stream Processing: Choosing the Right Tool for the Job - Giselle van Dongen

Due to the increasing interest in real-time processing, many stream processing frameworks were developed. However, no clear guidelines have been established for choosing a framework for a specific use case. In this talk, two different scenarios are taken and the audience is guided through the thought process and questions that one should ask oneself when choosing the right tool. The stream processing frameworks that will be discussed are Spark Streaming, Structured Streaming, Flink and Kafka Streams.

The main questions are:
- How much data does it need to process? (throughput)
- Does it need to be fast? (latency)
- Who will build it? (supported languages, level of API, SQL capabilities, built-in windowing and joining functionalities, etc.)
- Is accurate ordering important? (event time vs. processing time)
- Is there a batch component? (integration of batch API)
- How do we want it to run? (deployment options: standalone, YARN, Mesos, ...)
- How much state do we have? (state store options)
- What if a message gets lost? (message delivery guarantees, checkpointing)

For each of these questions, we look at how each framework tackles this and what the main differences are. The content is based on the PhD research of Giselle van Dongen in benchmarking stream processing frameworks in several scenarios using latency, throughput and resource utilization.

About: Databricks provides a unified data analytics platform, powered by Apache Spark™, that accelerates innovation by unifying data science, engineering and business.

Comments

We can use Azure Data Explorer (ADX) for near-real-time calculations on telemetry data. Azure Databricks can also natively stream data from IoT Hubs directly into a Delta table on ADLS and display the input vs. processing rates of the data. I wanted to know: in which use cases is ADX significantly better than Databricks streaming ingestion (workload- and cost-wise)?

nidhisharma-rbnx