AWS Tutorials - Methods of Building AWS Glue ETL Pipeline

AWS Glue pipelines ingest data into a data platform or data lake and manage the data transformation lifecycle from the raw to the cleansed to the curated state. There are many ways to build such pipelines. In this video, we talk about some of these methods and compare them for reusability, observability, and development effort.
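For context on what such a pipeline looks like in code, here is a minimal sketch of a Glue PySpark job that reads from a raw zone and writes a cleansed copy; the database, table, and bucket names are hypothetical placeholders, not values from the video.

```python
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

# Standard Glue job boilerplate: resolve arguments, build contexts.
args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read from the raw zone via the Glue Data Catalog
# ("raw_db" / "orders" are hypothetical names).
raw = glue_context.create_dynamic_frame.from_catalog(
    database="raw_db", table_name="orders"
)

# A trivial "cleansing" step: drop rows missing the key column.
cleansed = raw.toDF().dropna(subset=["order_id"])

# Write the cleansed data to the next zone as Parquet
# (the bucket path is a placeholder).
cleansed.write.mode("overwrite").parquet("s3://my-datalake/cleansed/orders/")

job.commit()
```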
Comments

Thank you for the pipeline video, very insightful.

santospcs

This is super good and helpful. Thank you!

rollinOnCode

Nice tutorial. Can you make a practical tutorial on an event-based pipeline?

mabash

Insightful tutorial. Can you make a practical video on an event-based pipeline that uses DynamoDB to store metadata and configurations, with a retry mechanism in case it fails?

adityag
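For readers asking about the event-based approach, here is a minimal sketch of one common pattern, not necessarily the method from the video: an S3-triggered Lambda that looks up pipeline configuration in DynamoDB and starts a Glue job. The table name and key schema are hypothetical, and simple retry needs can be delegated to the Glue job's MaxRetries setting.

```python
import boto3

dynamodb = boto3.resource("dynamodb")
glue = boto3.client("glue")

# Hypothetical DynamoDB table holding per-source pipeline configuration.
CONFIG_TABLE = "etl_pipeline_config"

def handler(event, context):
    """Triggered by an S3 put event; looks up the pipeline config
    for the source bucket and starts the matching Glue job."""
    bucket = event["Records"][0]["s3"]["bucket"]["name"]
    key = event["Records"][0]["s3"]["object"]["key"]

    # Fetch metadata/config for this source (key schema is hypothetical).
    config = dynamodb.Table(CONFIG_TABLE).get_item(Key={"source": bucket})["Item"]

    # Start the Glue job; Glue retries failed runs up to the job's
    # MaxRetries setting, which covers basic failure handling.
    glue.start_job_run(
        JobName=config["job_name"],
        Arguments={"--input_path": f"s3://{bucket}/{key}"},
    )
```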

Hello, good video. Does anyone know when to use Glue Workflows and when to use Step Functions?

DavidChoqueluqueRoman

Hello! Which of the three methods is the most cost-effective?

andresmerchan

Awesome tutorial! I have a question, if you don't mind: how should we deal with upserts/deletes in the landing/clean/curated zones? I know Databricks has a similar architecture with bronze/silver/gold, but it comes with Delta Lake. If our destination is Redshift, should we move data into Redshift (an RDBMS) at an earlier stage, i.e. before the curated zone? I also sent you an email; I hope you can help answer. Thanks heaps.

ryany
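On the upsert/delete question above: plain S3 zones have no native upsert, so one common pattern (not necessarily the video's recommendation) is to load into a Redshift staging table and merge with delete-then-insert via the Redshift connector's postactions option. The connection, table, and path names below are placeholders, and glue_context is the job's GlueContext.

```python
from awsglue.dynamicframe import DynamicFrame

# `curated_df` is assumed to be a Spark DataFrame built earlier in the job.
dyf = DynamicFrame.fromDF(curated_df, glue_context, "dyf")

# Classic Redshift merge: stage the new rows, then delete matches
# and insert the staged rows in a single transaction.
post_actions = """
    BEGIN;
    DELETE FROM public.orders
    USING public.orders_staging s
    WHERE public.orders.order_id = s.order_id;
    INSERT INTO public.orders SELECT * FROM public.orders_staging;
    DROP TABLE public.orders_staging;
    END;
"""

glue_context.write_dynamic_frame.from_jdbc_conf(
    frame=dyf,
    catalog_connection="redshift-connection",  # placeholder connection name
    connection_options={
        "database": "analytics",
        "dbtable": "public.orders_staging",
        "postactions": post_actions,
    },
    redshift_tmp_dir="s3://my-datalake/tmp/",  # placeholder path
)
```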

Could you please make a practical, workshop-style video on these approaches?

pachappagarimohanvamsi

Thank you for the pipeline video, very insightful. Quick question: to avoid hardcoding, can I also use DynamoDB to store environment parameters such as S3 paths, file names, and business dates for my ETL pipeline (say, one using Step Functions)? And what do you think is the best industry practice for storing parameters for an AWS ETL pipeline?

timmyzheng
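On storing parameters: DynamoDB does work for this, and AWS Systems Manager Parameter Store is another widely used option. Below is a minimal sketch of the DynamoDB variant; the table name, key schema, and attribute names are hypothetical.

```python
import boto3

def load_params(env: str) -> dict:
    """Fetch environment parameters (S3 paths, file names, business
    date) from a DynamoDB table keyed by environment name."""
    table = boto3.resource("dynamodb").Table("etl_parameters")
    return table.get_item(Key={"env": env})["Item"]

# Jobs and state machine tasks read the same item instead of
# hardcoding values.
params = load_params("prod")
input_path = params["input_path"]        # e.g. an S3 prefix
business_date = params["business_date"]  # e.g. "2024-01-31"
```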

As always, thank you for the video. The breakdown comparison is incredibly intuitive. I am curious about your view on which approach is best for handling pipeline replay (i.e., recovering from pipeline failure) and the CI/CD process (i.e., pipeline as code).

hsz

Hi all,

We are using AWS Glue + PySpark to perform ETL into a destination RDS PostgreSQL DB. The destination tables have primary- and foreign-key columns of the UUID data type, and we are failing to populate these UUID columns. How can we achieve this? Please suggest.

alokanand
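On the UUID question above: by default the PostgreSQL JDBC driver sends strings as text, which the server rejects for uuid columns; appending stringtype=unspecified to the JDBC URL lets the server cast the strings to uuid. A sketch with placeholder connection details, assuming `df` holds the UUIDs as strings:

```python
(
    df.write.format("jdbc")
    # stringtype=unspecified makes Postgres cast string values
    # into uuid (and other) column types on insert.
    .option("url", "jdbc:postgresql://myhost:5432/mydb?stringtype=unspecified")
    .option("driver", "org.postgresql.Driver")
    .option("dbtable", "public.orders")
    .option("user", "etl_user")  # placeholder credentials
    .option("password", "...")
    .mode("append")
    .save()
)
```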

Hello Sir, I follow all your videos; they are very useful in my project. Thank you very much. I have a quick question: is it possible to run multiple SQL statements in one AWS Glue Studio job? If yes, can you help me with it? (Use case: I want to truncate the target table in Snowflake before loading.)

radhasowjanya
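On running multiple SQL statements against Snowflake: the Snowflake Spark connector (which must be available to the Glue job) accepts a preactions option, a semicolon-separated list of SQL statements executed before the load, which covers the truncate use case. The connection details below are placeholders, and `df` is the DataFrame to load.

```python
# Placeholder Snowflake connection options; "preactions" runs
# before the data transfer, so the target is truncated first.
sf_options = {
    "sfURL": "myaccount.snowflakecomputing.com",
    "sfUser": "etl_user",
    "sfPassword": "...",
    "sfDatabase": "ANALYTICS",
    "sfSchema": "PUBLIC",
    "sfWarehouse": "ETL_WH",
    "preactions": "TRUNCATE TABLE IF EXISTS PUBLIC.ORDERS;",
}

(
    df.write.format("net.snowflake.spark.snowflake")
    .options(**sf_options)
    .option("dbtable", "ORDERS")
    .mode("append")
    .save()
)
```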

Can we run a machine learning algorithm in a Glue job using code?

SurendraUddagiri
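On machine learning in Glue: a Glue Spark job is a regular PySpark environment, so Spark MLlib can be used inside it with ordinary code. A tiny illustrative sketch with made-up data (inside a Glue job the session already exists as glue_context.spark_session):

```python
from pyspark.sql import SparkSession
from pyspark.ml.clustering import KMeans
from pyspark.ml.feature import VectorAssembler

# getOrCreate() picks up the existing session inside a Glue job.
spark = SparkSession.builder.getOrCreate()

# Made-up sample points for illustration only.
df = spark.createDataFrame(
    [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (8.5, 9.0)], ["x", "y"]
)

# MLlib expects a single vector column of features.
features = VectorAssembler(inputCols=["x", "y"], outputCol="features").transform(df)

# Fit a simple k-means model and show the cluster assignments.
model = KMeans(k=2, featuresCol="features").fit(features)
model.transform(features).show()
```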