Efficient Data Ingestion with Glue Concurrency and Hudi Data Lake

Efficient Data Ingestion with Glue Concurrency: Using a Single Template for Multiple S3 Tables into a Transactional Hudi Data Lake
Managing a data lake with multiple tables can be challenging, especially when it comes to writing an ETL or Glue job for each table. That's why I'm excited to share my upcoming video on a templated approach to managing ETL jobs in a data lake. By creating a single job that can be reused across multiple tables, you can save time and reduce the amount of infrastructure code needed to manage your data lake. Join me to learn how you can templatize your code and run the same job for multiple tables, making your data lake easier to scale and manage.
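The code from the video is linked below rather than reproduced here, but the core idea can be sketched in a few lines: one parameterized PySpark script that every concurrent Glue run shares, with table-specific values passed in as job arguments. The parameter and field names below (table_name, source_s3_path, target_s3_path, id, updated_at) are illustrative assumptions, not the video's actual code.

```python
import sys
from awsglue.utils import getResolvedOptions
from pyspark.sql import SparkSession

# Each concurrent run of this single Glue job receives a different set of
# arguments, so one script serves every table in the data lake.
args = getResolvedOptions(
    sys.argv, ["table_name", "source_s3_path", "target_s3_path"]
)

# Hudi requires the Kryo serializer.
spark = (
    SparkSession.builder
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .getOrCreate()
)

# Read the raw table from S3 (Parquet is assumed here).
df = spark.read.parquet(args["source_s3_path"])

# Hudi write options; the record key and precombine field are assumptions
# and would normally come from the per-table configuration as well.
hudi_options = {
    "hoodie.table.name": args["table_name"],
    "hoodie.datasource.write.recordkey.field": "id",
    "hoodie.datasource.write.precombine.field": "updated_at",
    "hoodie.datasource.write.operation": "upsert",
}

# Upsert into the transactional Hudi table on S3.
df.write.format("hudi").options(**hudi_options).mode("append").save(
    args["target_s3_path"]
)
```

Launching this same job several times with different arguments (and a Glue maximum-concurrency setting above one) is what lets a single template ingest many S3 tables in parallel.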

Code and step-by-step guide

Article:
Comments

Love it. We did something similar in our org: we used DynamoDB for storing parameters, along with SQS and Step Functions.

ghostvideouploader
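For readers curious what the commenter's DynamoDB-backed setup might look like, here is a minimal boto3 sketch of reading per-table job parameters from DynamoDB before launching a Glue run; the table name etl_table_configs and its attributes are hypothetical, not the commenter's actual schema.

```python
import boto3

# Hypothetical DynamoDB table keyed by the source table's name, holding the
# same parameters the Glue job would otherwise receive as job arguments.
dynamodb = boto3.resource("dynamodb")
config_table = dynamodb.Table("etl_table_configs")

def get_table_config(table_name: str) -> dict:
    """Fetch the per-table ETL parameters stored in DynamoDB."""
    response = config_table.get_item(Key={"table_name": table_name})
    return response.get("Item", {})

# Example: a Step Functions state (or an SQS consumer) could look up the
# config for "orders" and pass it to glue.start_job_run as job arguments.
config = get_table_config("orders")
```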

Hi Soumil, your content has really helped me progress with Apache Hudi. Thanks for sharing it!
I wanted to ask: how can I leverage Apache Hudi CLI properties when I am using AWS Glue (not AWS EMR)?

the_mindful_mitra

Can you make a video showing how to ingest CDC from all the tables under one database (say MySQL) into a Hudi data lake in one job? It should support dynamically syncing newly added tables and backward-compatible schema evolution of existing tables.

陈帅-xj

Is it possible to do this for both the customer and orders tables as one table?

unagarjuna

Can you make a video where you merge, say, a MySQL table and a CSV file (with the same schema for both) into one joined file in S3?

AnandKumar-dcbf

Can you make a video on a similar problem where we need to get data from an on-premises database, using crawlers and a JDBC connection to on-prem, and then run the job?

gunnayyasetti