AWS Tutorials – Building Event Based AWS Glue ETL Pipeline


AWS Glue pipelines ingest data into a data platform or data lake and manage the data transformation lifecycle from the raw to the cleansed to the curated state. There are many ways to build such pipelines; in this video, you learn how to build an event-based ETL pipeline.
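A rough sketch of the event wiring this kind of pipeline builds on (all resource names and the Lambda ARN below are hypothetical, not taken from the video): an EventBridge rule matches the Glue job state-change event and routes it to a Lambda function that starts the next stage.

```python
# Hypothetical sketch: wire a Glue job completion event to a Lambda function.
# Rule name, job name, and Lambda ARN are placeholders, not from the video.
import json
import boto3

events = boto3.client("events")

events.put_rule(
    Name="glue-raw-job-succeeded",
    EventPattern=json.dumps({
        "source": ["aws.glue"],
        "detail-type": ["Glue Job State Change"],
        "detail": {
            "jobName": ["raw_to_cleansed_job"],   # placeholder job name
            "state": ["SUCCEEDED"],
        },
    }),
    State="ENABLED",
)

events.put_targets(
    Rule="glue-raw-job-succeeded",
    Targets=[{
        "Id": "next-stage-lambda",
        # Placeholder ARN; the function also needs a resource-based permission
        # allowing events.amazonaws.com to invoke it.
        "Arn": "arn:aws:lambda:us-east-1:123456789012:function:start-next-stage",
    }],
)
```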
Comments

You've explained the execution flow well, but you haven't explained how to create the Glue database, the Data Catalog tables, the DynamoDB table, the Lambda function, or the EventBridge rule. You created the backend beforehand and are just explaining the flow again. Please explain the creation part as well.

fnzswqc

Good to see this demo. Please also do a demo on incremental data uploads into an S3 bucket.

veerachegu

Hi Sir, could you clarify one query? I have this doubt from where you explain the data pipeline at 3:20: why are we using the Data Catalog here?

suneelkumar-knds

Hello. Thanks for the tutorial. I have a small clarification. Basically, every Glue job and Glue crawler by default writes an event to the default EventBridge bus, and then, based on rule filtering, we invoke the Lambda function. Correct? I ask because I don't see any code or configuration in the job or crawler to publish an event to EventBridge. Please confirm my understanding.

coldstone
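Regarding the question above: Glue does emit job and crawler state-change events to the default EventBridge bus without any publishing code in the script; the rule's filtering decides which of them reach the Lambda function. A minimal handler sketch, with the job names as hypothetical placeholders:

```python
# Sketch of the Lambda handler on the receiving end of a "Glue Job State Change"
# event. Field names follow the documented event shape; job names are placeholders.
import boto3

glue = boto3.client("glue")

def lambda_handler(event, context):
    detail = event.get("detail", {})
    job_name = detail.get("jobName")
    state = detail.get("state")      # e.g. SUCCEEDED, FAILED, TIMEOUT, STOPPED

    # Start the next stage only for the job/state this handler cares about.
    if job_name == "raw_to_cleansed_job" and state == "SUCCEEDED":
        glue.start_job_run(JobName="cleansed_to_curated_job")

    return {"job": job_name, "state": state}
```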

Great work. You got my sub, you deserved it. I highly appreciate your work.
Could you do a workshop exercise for setting up such a pipeline?
Can you also do a tutorial/workshop on setting up Glue job pipelines with CloudFormation?
Thanks and best regards

DanielWeikert

Hello. Thanks a lot for this video; it is really helpful. I have one question: to run your second Glue job, how will we know that all our files have been copied to S3?

poojakarthik

Thank you for making useful videos on AWS. I have learnt a lot by watching them. I have a use case where I need your input: a job writes multiple Parquet files (usually a single dataset split into multiple files due to Spark partitions) to an S3 bucket, and I want to send a single event to EventBridge once all files are written successfully. How do I implement this using S3 and EventBridge? Currently I see multiple events getting triggered.

ballusaikumar
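One hedged suggestion for the question above (not covered in the video): Spark typically writes a `_SUCCESS` marker object after the last partition file, so an S3-to-EventBridge rule filtered on that key suffix fires only once per dataset. A sketch, with the bucket and rule names as placeholders:

```python
# Hedged sketch: fire a single event per dataset by matching only Spark's
# "_SUCCESS" marker object. Assumes the bucket has "Send notifications to
# Amazon EventBridge" enabled; bucket and rule names are placeholders.
import json
import boto3

events = boto3.client("events")

events.put_rule(
    Name="dataset-write-complete",
    EventPattern=json.dumps({
        "source": ["aws.s3"],
        "detail-type": ["Object Created"],
        "detail": {
            "bucket": {"name": ["my-data-lake-bucket"]},
            "object": {"key": [{"suffix": "_SUCCESS"}]},   # Spark's success marker
        },
    }),
    State="ENABLED",
)
```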

Can someone please explain the below code, which is written in the Lambda script?

target =
targettype =

What should the expected output of the above lines be?

aniket

Thank you for the tutorials.
I have a question on deployment: after developing this pipeline (Glue, crawler, Lambda, and EventBridge) in the development environment, how do we move/deploy all of it to production?

spp

Thanks, very helpful tutorial. Please continue your good work. Sir, can you cover how to create a monitoring or observability dashboard for such a pipeline using CloudWatch Logs?

hirendra

Thank you very much for your excellent work with this channel. If I have multiple Glue jobs but want to publish to EventBridge for only some of them, how do I handle that in the event pattern? If I'm not wrong, with this event pattern every Glue job's completion will trigger the Lambda, correct? Can we use some tokens in the event pattern, e.g., match Glue job names starting with GJ_? Thanks in advance.

arunt
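On the question above: EventBridge event patterns support content filtering, including prefix matching, so a rule can match only jobs whose names start with a given prefix instead of every job. A sketch using the commenter's GJ_ example (the rule name is hypothetical):

```python
# Sketch: use EventBridge prefix filtering so only Glue jobs whose names start
# with "GJ_" trigger the Lambda target. Rule name is a placeholder.
import json
import boto3

events = boto3.client("events")

events.put_rule(
    Name="gj-jobs-succeeded",
    EventPattern=json.dumps({
        "source": ["aws.glue"],
        "detail-type": ["Glue Job State Change"],
        "detail": {
            "jobName": [{"prefix": "GJ_"}],   # matches GJ_* jobs only
            "state": ["SUCCEEDED"],
        },
    }),
    State="ENABLED",
)
```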

Can we use S3 instead of DynamoDB to store the Lambda execution data?

veerachegu

Nice video, but I would like to know if you have code that can be embedded in the Glue job script to prevent duplicate data if the job runs every hour.
I know a bookmark will help, but I'm asking whether you have code that can be included in the script section.

canye
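On the question above: the bookmark mechanism the commenter mentions does live partly in the script; a read needs a transformation_ctx and the job must call job.commit() (and bookmarks must be enabled on the job itself) for hourly reruns to skip already-processed data. A minimal sketch, with database and table names as examples:

```python
# Sketch of the in-script side of Glue job bookmarks. Database and table names
# are examples; bookmarks must also be enabled in the job's configuration.
import sys
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# transformation_ctx gives this read a bookmark, so an hourly rerun only
# picks up data that arrived since the last committed run.
frame = glue_context.create_dynamic_frame.from_catalog(
    database="raw_db",
    table_name="events",
    transformation_ctx="read_events",
)

# ... transforms and writes go here ...

job.commit()   # persists the bookmark state for the next run
```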

When I triggered a Glue workflow with Lambda to write CSV to another folder as Parquet, I received this error.
I did not find any help on Google. Any ideas?

DanielWeikert

The demo part is not good; things are not properly explained.
You are just reading, not showing how to create them.
Please focus on the practical part instead of theory.

abhijeetjain