DP-203: 49 - Introduction to streaming, Event Hubs

Hey Data Engineers!

Tired of the same old batch processing? Let's dive into something more exciting—streaming and real-time data processing!

In the 49th episode of my free DP-203 course, I'm exploring streaming, sharing real-world examples, outlining a high-level architecture, and taking a deep dive into the ingestion phase using Azure Event Hubs.

Enjoy!

▬▬▬▬▬▬ IMPORTANT LINKS ▬▬▬▬▬▬

▬▬▬▬▬▬ MEMBERSHIP ▬▬▬▬▬▬
Join this channel to get access to perks:

▬▬▬▬▬▬ CHAPTERS ▬▬▬▬▬▬
00:00 Introduction
00:18 Streaming vs batch processing
06:08 Streaming in action
11:43 General architecture
16:19 Ingest part
19:04 Raspberry Pi simulator
21:34 Event Hubs
48:43 Summary
▬▬▬▬▬▬ COMMENTS ▬▬▬▬▬▬

DP-203: 49 - Notes:

Phases: Ingest -> Process -> Serve

Event Sources (Producers):
- Apps;
- IoT (sensors);
- Streaming sources;

Ingest options:
A) Event Hubs: one-way communication (from event producers into Azure), intended for non-IoT sources;
B) IoT Hub: bidirectional (two-way) communication, intended for IoT devices.

Azure Event Hubs is a cloud-native data-streaming service that can ingest millions of events per second.
Integrations:
- compatible with Apache Kafka;
- integrates nicely with Stream Analytics (to process the data);
- integrates nicely with Databricks (to process the data).
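Because of the Kafka compatibility, an existing Kafka client can be pointed at an Event Hubs namespace instead of a Kafka cluster. A minimal sketch of the client settings (confluent-kafka-style keys), assuming SAS authentication; the namespace name and connection string below are placeholders:

```python
# Sketch: pointing an existing Kafka client at Event Hubs.
# Namespace and connection string are placeholders, not real values.

def event_hubs_kafka_config(namespace: str, connection_string: str) -> dict:
    """Build Kafka client settings (confluent-kafka-style keys) for Event Hubs."""
    return {
        # Event Hubs exposes a Kafka-compatible endpoint on port 9093.
        "bootstrap.servers": f"{namespace}.servicebus.windows.net:9093",
        "security.protocol": "SASL_SSL",
        "sasl.mechanism": "PLAIN",
        # With SAS auth the username is literally "$ConnectionString"
        # and the password is the connection string itself.
        "sasl.username": "$ConnectionString",
        "sasl.password": connection_string,
    }

config = event_hubs_kafka_config(
    "my-namespace",
    "Endpoint=sb://my-namespace.servicebus.windows.net/;SharedAccessKeyName=send;SharedAccessKey=<key>",
)
```

The same topic/partition concepts carry over: an event hub maps to a Kafka topic, and Event Hubs partitions map to Kafka partitions.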

Event Hubs -> Create Namespace -> Pricing tiers
Where TU (Throughput Unit) allows:
- ingress: up to 1 MB/s or 1,000 events/s (whichever limit is hit first);
- egress: up to 2 MB/s or 4,000 events/s.
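The "whichever comes first" rule means TU sizing must satisfy both the byte limit and the event-count limit. A small sketch of that arithmetic (the function name is illustrative, not part of any SDK):

```python
import math

# TU sizing sketch from the ingress limits above: one TU covers up to
# 1 MB/s OR 1,000 events/s, whichever limit is hit first, so we need
# enough TUs to satisfy both constraints at once.

def required_tus(ingress_mb_per_s: float, events_per_s: float) -> int:
    by_bytes = math.ceil(ingress_mb_per_s / 1.0)   # 1 MB/s per TU
    by_events = math.ceil(events_per_s / 1000.0)   # 1,000 events/s per TU
    return max(by_bytes, by_events, 1)             # always at least 1 TU

print(required_tus(2.5, 1500))  # 3: the byte limit dominates here
```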

Auto-inflate: allows the number of TUs to increase automatically under load. It will not scale back down automatically.

After creating the Event Hubs namespace (a container for hubs), we can create individual event hubs.

Entities -> + Event Hub -> Name -> Partition count: 1 -> Capture on/off -> Create.
SAS policies: Settings -> Shared access policies -> + Add -> Manage / Send / Listen.
As a result we get keys, and connection strings that embed those keys.
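A Send-capable connection string is what a producer uses to push events. A minimal sketch with the azure-eventhub Python SDK (v5); the hub name, connection string, and the telemetry payload helper are illustrative placeholders, and the SDK import is kept inside the send function so the payload helper stays usable without the package installed:

```python
import json

def make_telemetry(device_id: str, temperature: float, humidity: float) -> str:
    """Hypothetical telemetry payload, in the spirit of what the
    Raspberry Pi simulator from the episode emits."""
    return json.dumps({
        "deviceId": device_id,
        "temperature": temperature,
        "humidity": humidity,
    })

def send_events(connection_string: str, eventhub_name: str, payloads: list) -> None:
    # Requires `pip install azure-eventhub`; imported lazily so the
    # payload helper above works without the SDK installed.
    from azure.eventhub import EventHubProducerClient, EventData

    producer = EventHubProducerClient.from_connection_string(
        conn_str=connection_string, eventhub_name=eventhub_name
    )
    with producer:
        batch = producer.create_batch()
        for p in payloads:
            batch.add(EventData(p))  # raises if the batch exceeds its size limit
        producer.send_batch(batch)

# Usage (placeholders; needs a real Send-capable connection string):
# send_events("Endpoint=sb://...", "my-hub", [make_telemetry("pi-1", 21.5, 0.4)])
```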

Microsoft Entra ID authentication is also possible, but more complicated to configure.

Consumer groups: $Default exists out of the box; we can create more.
It is the consumer's responsibility to save the data offset (what was already read); this is called checkpointing.

An event hub is not the same as a Storage Account queue: consumers can read the same data multiple times (events are not removed on read).

Capture: allows dumping events to a Data Lake (ADLS Gen2) in Avro, Parquet, or Delta format.

yevhen

Awesome work, more content about streaming is greatly appreciated :)
Thanks !!!

vlad_badiuc

Hi, it's great to start another section of the course! When are you going to upload solutions to the challenges? After completing the last episode of the course, or later?

TheMapleSight

I would like to know: if you were given the opportunity to automate a whole project (ingest, transform, serve) on any one technology, which platform would you use, Databricks or Azure, and why?

bhargavthakkar

Thank you for all your efforts on this. Very grateful for it.
Does this course, if followed properly, provide enough information to build a career as an Azure data engineer (not just earn the certificate)?
Thank you again

ZeeshanKhan-ffxl

So once we complete the series, what are you planning to teach next? Maybe some end-to-end projects, or other data engineering technologies you know?

muhammadzakiahmad

But how can we limit access to our event hub, given that anyone with the connection string of the shared access policy can send event data there?
I think access could be restricted through networking, or are there other methods?

SAJO

Hi Piotr, can you please make one on Event Grid? Much appreciated!
