AWS Tutorials - When to use Custom CSV Glue Classifier?

AWS Glue uses classifiers to catalog data. Built-in classifiers are available for the XML, JSON, CSV, ORC, Parquet, and Avro formats. Sometimes, however, a built-in classifier cannot catalog the data because of a complex structure or hierarchy. In such cases, custom classifiers are configured and used with the crawler. In this tutorial, you learn how to use a custom CSV classifier for some specific use cases.
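As a rough illustration of what configuring a custom CSV classifier can look like outside the console, here is a minimal sketch using boto3's `create_classifier` call. The classifier name, column names, delimiter, and region are hypothetical placeholders, not values from the tutorial.

```python
def build_csv_classifier_request(name, header, delimiter=",", quote_symbol='"',
                                 contains_header="PRESENT"):
    """Build the keyword arguments for the Glue create_classifier call."""
    return {
        "CsvClassifier": {
            "Name": name,
            "Delimiter": delimiter,
            "QuoteSymbol": quote_symbol,
            # One of "UNKNOWN", "PRESENT", or "ABSENT".
            "ContainsHeader": contains_header,
            # Explicit column names for the crawler to use.
            "Header": list(header),
            "DisableValueTrimming": False,
            "AllowSingleColumn": False,
        }
    }


def create_csv_classifier(request, region_name="us-east-1"):
    """Send the request to AWS Glue (needs boto3 and valid credentials)."""
    import boto3  # imported lazily so the builder above runs without AWS access

    glue = boto3.client("glue", region_name=region_name)
    return glue.create_classifier(**request)


# Hypothetical pipe-delimited file with three columns.
request = build_csv_classifier_request(
    "orders-csv-classifier",
    ["order_id", "customer_id", "order_ts"],
    delimiter="|",
)
# create_csv_classifier(request)  # uncomment to call AWS
```

The classifier only takes effect once it is attached to a crawler, for example by listing its name in the `Classifiers` parameter of `create_crawler` or `update_crawler`.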
Comments

I'm looking for the most code-light way (a short Python Lambda function is OK and assumed) to set up a process so that when a CSV file is dropped into my S3 bucket/incoming folder, the file is automatically validated using a DQ Ruleset that I would build manually in the console beforehand. For any given Lambda call (I assume triggered by a file dropped into our S3 bucket), I'd like the Lambda, if possible, to instruct the DQ Ruleset to run but not wait for it to finish (Step Functions?). I want to output a log file of which rows/columns failed to my S3 bucket/reports folder (using some kind of trigger that fires when a DQ Ruleset finishes execution?). Again, it is important that the process be fully automated, because hundreds of files with hundreds of thousands of rows will be dropped into our S3 bucket/incoming folder every day by a different automated process. I realize I may be asking a lot, so please feel free to share only the best high-level path: which AWS services to use, in which order. Thank you!

scotter
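One possible starting point for the flow described above is a Lambda handler that starts a Glue Data Quality run and returns immediately, using boto3's `start_data_quality_ruleset_evaluation_run`. This is only a sketch under several assumptions: the ruleset, database, table, role ARN, and report prefix are hypothetical names, and the incoming folder is assumed to be cataloged as a Glue table already. The call evaluates the ruleset against that catalog table rather than against a single file, so per-file or row-level reporting would likely need extra work.

```python
import urllib.parse

# Hypothetical names; replace with your own resources.
RULESET_NAME = "incoming_csv_ruleset"
DATABASE_NAME = "incoming_db"
TABLE_NAME = "incoming_csv"
ROLE_ARN = "arn:aws:iam::123456789012:role/GlueDataQualityRole"
REPORTS_PREFIX = "s3://my-bucket/reports/"


def build_run_request(event):
    """Turn an S3 put event into arguments for the DQ evaluation run."""
    s3_info = event["Records"][0]["s3"]
    # Object keys arrive URL-encoded in S3 event notifications.
    key = urllib.parse.unquote_plus(s3_info["object"]["key"])
    request = {
        "DataSource": {
            "GlueTable": {"DatabaseName": DATABASE_NAME, "TableName": TABLE_NAME}
        },
        "Role": ROLE_ARN,
        "RulesetNames": [RULESET_NAME],
        "AdditionalRunOptions": {
            "CloudWatchMetricsEnabled": True,
            "ResultsS3Prefix": REPORTS_PREFIX,
        },
    }
    return request, key


def lambda_handler(event, context):
    """Start the run and return without waiting for it to finish."""
    import boto3  # available by default in the Lambda Python runtime

    request, key = build_run_request(event)
    run = boto3.client("glue").start_data_quality_ruleset_evaluation_run(**request)
    return {"file": key, "run_id": run["RunId"]}
```

Because the start call is asynchronous, the Lambda does not block on the evaluation. Reacting to completion would be a separate piece, for example an event rule or a scheduled check that looks up the run by the returned `run_id`.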

I followed the steps in the video, but I am still facing the same issue. What should I do now?

the_cmmn_man

Is the timestamp data type automatically changed to string?
This is happening to me.

HammadKhan-mu