How to Do Incremental Data Loading and Data Validation with PySpark and Spark! Spark Basics!

In this video, I'll show you how to perform an incremental data loading job with PySpark, and then validate that the loaded data has the correct shape and size!
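For readers who want to see the idea in code, here is a minimal sketch of an incremental load followed by a shape/size check in PySpark. The file paths, the table layout, and the use of a `last_updated` timestamp column as the watermark are assumptions for illustration, not details taken from the video.

```python
# Minimal incremental-load sketch (assumed paths, schema, and watermark column).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("incremental-load").getOrCreate()

# 1. Read the existing target and find the high-water mark
#    (assumes the target is non-empty and has a 'last_updated' column).
target = spark.read.parquet("/data/target/orders")
target_count = target.count()
high_water = target.agg(F.max("last_updated")).collect()[0][0]

# 2. Read the source and keep only rows newer than the watermark.
source = spark.read.parquet("/data/source/orders")
new_rows = source.filter(F.col("last_updated") > F.lit(high_water))
new_count = new_rows.count()

# 3. Append only the new rows to the target.
new_rows.write.mode("append").parquet("/data/target/orders")

# 4. Basic validation: the row count grew by exactly the number of new
#    rows, and the schema (the data's "shape") is unchanged.
reloaded = spark.read.parquet("/data/target/orders")
assert reloaded.count() == target_count + new_count, "unexpected row count after load"
assert reloaded.schema == target.schema, "schema changed during load"
```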

Comments

This is awesome info!
Thanks for the video!

not_saboor

I would recommend using a hash key to make sure a rerun doesn't insert dupes into the table.

thebookshelfreviewer-kjmx
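A rough sketch of the hash-key idea from the comment above: derive a deterministic hash per row and insert only rows whose hash is not already in the target, so a rerun is idempotent. The column and path names are made up for illustration, and the target is assumed to already store a `row_hash` column.

```python
# Hash-key dedup sketch (assumed columns and paths).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("hash-key-dedup").getOrCreate()

key_cols = ["id", "last_updated"]  # columns assumed to uniquely identify a row version

# Compute a deterministic hash for each incoming row.
incoming = (
    spark.read.parquet("/data/source/orders")
    .withColumn("row_hash", F.sha2(F.concat_ws("||", *key_cols), 256))
)

# Hashes already present in the target (assumes the target stores row_hash).
existing = spark.read.parquet("/data/target/orders").select("row_hash")

# Anti-join: keep only incoming rows whose hash is not yet in the target,
# so running the job twice does not insert duplicates.
to_insert = incoming.join(existing, on="row_hash", how="left_anti")
to_insert.write.mode("append").parquet("/data/target/orders")
```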

So in the Snowflake table, there will be rows with the same id but with different 'last_updated' values?

So I still need to run another query to return the most recent row for every id?

Levy
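One common way to do what the question above describes, shown here as a hedged sketch rather than the approach used in the video: if the table keeps one row per (id, last_updated) change, the latest version of each id can be selected with a window function. Table and column names are assumptions.

```python
# Latest-row-per-id sketch using a window function (assumed table layout).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.appName("latest-per-id").getOrCreate()

df = spark.read.parquet("/data/target/orders")

# Rank rows within each id by recency, then keep only the newest one.
w = Window.partitionBy("id").orderBy(F.col("last_updated").desc())
latest = (
    df.withColumn("rn", F.row_number().over(w))
      .filter(F.col("rn") == 1)
      .drop("rn")
)
latest.show()
```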