AWS Tutorials - Data Quality Check in AWS Glue ETL Pipeline



Maintaining data quality is critical for a data platform. Bad data can break ETL jobs, crash dashboards and reports, and hurt the accuracy of machine learning models through bias and error. Learn how to configure data quality checks in an AWS Glue ETL pipeline.
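As context for the kind of check the video configures: AWS Glue Data Quality evaluates DQDL rules such as `RowCount > 0` or `Completeness "order_id" > 0.95` against a dataset (in a Glue job this is done with the `EvaluateDataQuality` transform). The sketch below is not the Glue API, just a minimal pure-Python illustration of what such a rule evaluation does; the column names, rules, and sample rows are hypothetical.

```python
# Hypothetical sketch of DQDL-style rule evaluation -- NOT the Glue API.
# Shows how rules like `RowCount > 0` and `Completeness "col" > threshold`
# reduce to simple boolean checks over the rows of a dataset.

def completeness(rows, column):
    """Fraction of rows where `column` is present and not None/empty."""
    if not rows:
        return 0.0
    ok = sum(1 for r in rows if r.get(column) not in (None, ""))
    return ok / len(rows)

def evaluate_rules(rows, rules):
    """Run each named check against the rows; return {rule_name: passed}."""
    return {name: check(rows) for name, check in rules.items()}

# Sample records standing in for an S3 dataset read by the Glue job.
orders = [
    {"order_id": "A1", "amount": 120.0},
    {"order_id": "A2", "amount": None},
    {"order_id": None, "amount": 75.5},
]

# Rules written in a DQDL-like spirit, implemented as plain callables.
rules = {
    "RowCount > 0": lambda rows: len(rows) > 0,
    'Completeness "order_id" > 0.5': lambda rows: completeness(rows, "order_id") > 0.5,
    'Completeness "amount" > 0.9': lambda rows: completeness(rows, "amount") > 0.9,
}

outcome = evaluate_rules(orders, rules)
# A pipeline would fail the job (or quarantine the data) if any rule fails.
passed = all(outcome.values())
```

In the real pipeline, the pass/fail outcome per rule is what downstream steps (e.g. a Step Functions choice state or a Lambda) inspect to decide whether the load proceeds.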
Comments

I really love your videos; they help me a lot in learning Glue. Amazing work, thanks a lot. It would be great if you could make a video about building with CDK and setting up IaC/CI-CD for a Glue pipeline that we can deploy to different environments. Looking forward to hearing from you soon.

yenbui

Great demo and example of this type of integration

prannoyroy

You are doing a wonderful job!! It's extremely informative (y)

arunasingh

That was extremely helpful, thank you!

MahmoudAtef

That's really useful content, thanks a lot.
Have you ever worked with PyDeequ, and could you make a video on it?

manojt

Hi, I've noticed that you use the catalogued S3 bucket as the target in the Glue job instead of the actual bucket. Are there any advantages to doing that?

nlopedebarrios

How can I take a ruleset defined in DynamoDB items and add it to a data quality job?

jhigslh

Can you please provide the script for the feature implementation above?

jhigslh

Thanks for such an informative session on this Glue pipeline.
Would it be possible to put the steps on your Aws-dojo website (minus code), as you did for previous videos? It's really helpful to check the steps we followed against yours in case we run into an error.

terrcan

Thank you so much! I am trying to build a data quality framework for all our ETL pipelines (batch and real-time). Can we store the rules for different ETLs in a data store (DynamoDB, S3, etc.) and then call those rules based on the pipeline? I had planned to use Deequ until I came across this video, which seems a much easier option than handling it in a library, as long as it provides most of the APIs Deequ does. Kindly advise.

jeety

Hi, great video. One question: how do you pass the params from StartProfileJob to CheckDQOutput, to read the job name, file name, etc. in the Lambda function? Thanks

williamlatorre

Can we use the same scenario with a Glue workflow instead of a Step Functions state machine?

veerachegu

It would be great if you could share the slides with us.

hamzakazmi