Near real-time CDC using Datastream

In today's video I've done a deep dive on why and how to set up a Change Data Capture (CDC) solution using the Datastream service on Google Cloud.

The talk covers the architecture, a detailed configuration walkthrough, Terraform code examples, and a demo of how to configure everything required with a Cloud SQL database to capture change data and stream it into BigQuery in near real time.

00:27 - The old ways
01:37 - The past few years
02:54 - Change Data Capture (CDC)
05:47 - Datastream CDC
13:02 - Code walkthrough & demo
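For readers following along with the Terraform portion, below is a minimal sketch of what the resources tend to look like. This is not the code from the video: every name, region, IP and variable here is a placeholder, and it assumes a Cloud SQL for MySQL source that Datastream can already reach (for example via the reverse proxy discussed in the comments).

```hcl
variable "datastream_db_password" {
  description = "Password for the MySQL user Datastream connects as"
  type        = string
  sensitive   = true
}

# Source connection profile pointing at the reachable Cloud SQL MySQL instance.
resource "google_datastream_connection_profile" "mysql_source" {
  display_name          = "cloudsql-mysql-source"
  location              = "europe-west1"
  connection_profile_id = "cloudsql-mysql-source"

  mysql_profile {
    hostname = "10.0.0.3" # placeholder: Cloud SQL IP or reverse-proxy IP
    port     = 3306
    username = "datastream"
    password = var.datastream_db_password
  }
}

# Destination connection profile for BigQuery (no extra settings needed).
resource "google_datastream_connection_profile" "bigquery_destination" {
  display_name          = "bigquery-destination"
  location              = "europe-west1"
  connection_profile_id = "bigquery-destination"

  bigquery_profile {}
}

# The stream itself: which MySQL databases to capture and where to land them.
resource "google_datastream_stream" "mysql_to_bigquery" {
  stream_id     = "mysql-to-bigquery"
  location      = "europe-west1"
  display_name  = "mysql-to-bigquery"
  desired_state = "RUNNING"

  source_config {
    source_connection_profile = google_datastream_connection_profile.mysql_source.id

    mysql_source_config {
      include_objects {
        mysql_databases {
          database = "app_db" # placeholder database name
        }
      }
    }
  }

  destination_config {
    destination_connection_profile = google_datastream_connection_profile.bigquery_destination.id

    bigquery_destination_config {
      data_freshness = "900s" # how stale the BigQuery tables are allowed to get

      source_hierarchy_datasets {
        dataset_template {
          location = "europe-west1"
        }
      }
    }
  }

  # Snapshot the existing rows first, then keep streaming changes from the binlog.
  backfill_all {}
}
```

Lowering data_freshness brings the tables closer to real time at the cost of more frequent writes into BigQuery.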

Further reading
Comments

Hello Richard, it was a wonderful video, but somehow I couldn't set up the TCP proxy. How did you do it? Through the reverse proxy method or the Auth Proxy method? You seem to be the only person who has done this successfully so far. Could you please create a tutorial video for it?

kavirajansakthivel
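For context on the reverse proxy method asked about above: one common way to implement it is a small VM in the same VPC that simply forwards TCP 3306 to the Cloud SQL instance's private IP, which Datastream then reaches over its private connection. The sketch below is only illustrative and not necessarily how the video does it; the image, network and IP address are placeholders, and a production setup would run the forwarder under systemd or in a container rather than via nohup.

```hcl
# Illustrative reverse-proxy VM; not the exact setup from the video.
resource "google_compute_instance" "datastream_mysql_proxy" {
  name         = "datastream-mysql-proxy"
  machine_type = "e2-small"
  zone         = "europe-west1-b"

  boot_disk {
    initialize_params {
      image = "debian-cloud/debian-12"
    }
  }

  network_interface {
    network = "default" # placeholder: the VPC peered with Datastream's private connection
  }

  # Forward incoming MySQL traffic to the Cloud SQL private IP.
  # 10.20.0.5 is a placeholder; a firewall rule must also allow ingress on 3306
  # from the Datastream private connection's subnet.
  metadata_startup_script = <<-EOT
    #!/bin/bash
    apt-get update && apt-get install -y socat
    nohup socat TCP-LISTEN:3306,fork,reuseaddr TCP:10.20.0.5:3306 &
  EOT
}
```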

Hi Richard, thank you for the wonderful tutorial. I find myself in a situation where I need to revoke the SELECT permission in the MySQL database after the Datastream backfill has run and while the CDC is in progress. This is intended to alleviate some load on the master, I believe. Although Google's documentation states that we need SELECT permission on the database, I had a conversation with the Google team, and they want me to test this scenario. Have you encountered something similar? What are your thoughts?

manojkumarchinnaswamy

Hello Richard,
Great video. I have some questions though:
From your video, it appears that the data freshness is 56s, system latency is 5-10s, and total latency is 20-90s.
Are these the normal SLAs for Datastream? Wouldn't Debezium have much lower latency?

Also, the need for a TCP proxy kind of defeats the serverless model of Datastream.
As the number of transactions increases, do we need to scale the TCP proxy? How do we scale it?

ariefhalim

Hello Richard, first of all, thanks a lot for all the wonderful videos. I have a question: is there something like this for replicating data from BigQuery into a relational DB in near real time? If there is no out-of-the-box service for this, what would be the best way to build a custom solution? Thanks! Keep up the great work.

ItsMe-mhib

Hi, can we connect BI tools like Tableau or Looker directly to these authorized views? Is it advisable?

SaiDileepfantasy
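For context on the authorized views mentioned above: in BigQuery an authorized view is a view that is granted read access to a source dataset, so consumers (including BI tools) only need permissions on the dataset containing the view, not on the raw CDC tables. A minimal Terraform sketch with hypothetical dataset and table names:

```hcl
# Dataset holding the raw Datastream output tables (hypothetical names throughout).
resource "google_bigquery_dataset" "raw_cdc" {
  dataset_id = "raw_cdc"
  location   = "EU"
}

# Dataset exposed to BI tools such as Tableau or Looker.
resource "google_bigquery_dataset" "reporting" {
  dataset_id = "reporting"
  location   = "EU"
}

# A view over the raw CDC data, living in the reporting dataset.
resource "google_bigquery_table" "orders_view" {
  dataset_id = google_bigquery_dataset.reporting.dataset_id
  table_id   = "orders"

  view {
    query          = "SELECT * FROM `my-project.raw_cdc.orders`" # placeholder project/table
    use_legacy_sql = false
  }
}

# Authorize the view to read the raw dataset, so BI users only need access to
# the reporting dataset.
resource "google_bigquery_dataset_access" "authorize_orders_view" {
  dataset_id = google_bigquery_dataset.raw_cdc.dataset_id

  view {
    project_id = google_bigquery_table.orders_view.project
    dataset_id = google_bigquery_dataset.reporting.dataset_id
    table_id   = google_bigquery_table.orders_view.table_id
  }
}
```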

Hello, very interesting, thank you!
I do not understand why replication with historical data could generate issues.
Let's say the binary log gets corrupted from timestamp X onwards. To be sure that we have valid data, can we not just delete all the data after timestamp X-1?

mickaelchau