Efficient Data Ingestion with Glue Concurrency and Hudi Data Lake

Efficient Data Ingestion with Glue Concurrency: Using a Single Template for Multiple S3 Tables into a Transactional Hudi Data Lake
Managing a data lake with multiple tables can be challenging, especially when it comes to writing an ETL or Glue job for each table. That's why I'm excited to share my upcoming video on a templated approach to managing ETL jobs in a data lake. By creating a single job that can be reused across multiple tables, you can save time and reduce the amount of infrastructure code needed to manage your data lake. Join me to learn how you can templatize your code and run the same job for multiple tables, making your data lake easier to scale and manage.
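The code from the video is linked below rather than reproduced here, but the core idea can be sketched in a few lines: one parameterized PySpark script that every concurrent Glue run shares, with table-specific values passed in as job arguments. The parameter and field names below (table_name, source_s3_path, target_s3_path, id, updated_at) are illustrative assumptions, not the video's actual code.

```python
import sys
from awsglue.utils import getResolvedOptions
from pyspark.sql import SparkSession

# Each concurrent run of this single Glue job receives a different set of
# arguments, so one script serves every table in the data lake.
args = getResolvedOptions(
    sys.argv, ["table_name", "source_s3_path", "target_s3_path"]
)

# Hudi requires the Kryo serializer.
spark = (
    SparkSession.builder
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .getOrCreate()
)

# Read the raw table from S3 (Parquet is assumed here).
df = spark.read.parquet(args["source_s3_path"])

# Hudi write options; the record key and precombine field are assumptions
# and would normally come from the per-table configuration as well.
hudi_options = {
    "hoodie.table.name": args["table_name"],
    "hoodie.datasource.write.recordkey.field": "id",
    "hoodie.datasource.write.precombine.field": "updated_at",
    "hoodie.datasource.write.operation": "upsert",
}

# Upsert into the transactional Hudi table on S3.
df.write.format("hudi").options(**hudi_options).mode("append").save(
    args["target_s3_path"]
)
```

Launching this same job several times with different arguments (and a Glue maximum-concurrency setting above one) is what lets a single template ingest many S3 tables in parallel.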

Code and step-by-step guide

Article:
Comments

Love it. We did something similar in our org: we used DynamoDB for storing parameters, along with SQS and Step Functions.

ghostvideouploader
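For readers curious what the commenter's DynamoDB-backed setup might look like, here is a minimal boto3 sketch of reading per-table job parameters from DynamoDB before launching a Glue run; the table name etl_table_configs and its attributes are hypothetical, not the commenter's actual schema.

```python
import boto3

# Hypothetical DynamoDB table keyed by the source table's name, holding the
# same parameters the Glue job would otherwise receive as job arguments.
dynamodb = boto3.resource("dynamodb")
config_table = dynamodb.Table("etl_table_configs")

def get_table_config(table_name: str) -> dict:
    """Fetch the per-table ETL parameters stored in DynamoDB."""
    response = config_table.get_item(Key={"table_name": table_name})
    return response.get("Item", {})

# Example: a Step Functions state (or an SQS consumer) could look up the
# config for "orders" and pass it to glue.start_job_run as job arguments.
config = get_table_config("orders")
```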

Hi Soumil, your content has really helped me progress with Apache Hudi. Thanks for sharing it!
I wanted to ask: how can I leverage Apache Hudi CLI properties when I am using AWS Glue (not AWS EMR)?

the_mindful_mitra

Can you make a video showing how to ingest CDC from all the tables under one database (say MySQL) into a Hudi data lake in one job? It should support dynamically syncing newly added tables and backward-compatible schema evolution of existing tables.

陈帅-xj

Is it possible to do this for both the customer and orders tables as one table?

unagarjuna

Can you make a video where you merge, say, a MySQL table and a CSV file (with the same schema for both) into one joined file in S3?

AnandKumar-dcbf

Can you make a video on a similar problem where we need to get data from an on-premises database, using crawlers and a JDBC connection to on-prem, and then run the job?

gunnayyasetti