How to Do Incremental Data Loading and Data Validation with PySpark and Spark! Spark Basics!

In this video, I'll show you how to perform an incremental data loading job with PySpark, and then validate that the loaded data has the correct shape and size!
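For readers who want to see the idea in code, here is a minimal sketch of an incremental load followed by a shape/size check in PySpark. The file paths, the table layout, and the use of a `last_updated` timestamp column as the watermark are assumptions for illustration, not details taken from the video.

```python
# Minimal incremental-load sketch (assumed paths, schema, and watermark column).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("incremental-load").getOrCreate()

# 1. Read the existing target and find the high-water mark
#    (assumes the target is non-empty and has a 'last_updated' column).
target = spark.read.parquet("/data/target/orders")
target_count = target.count()
high_water = target.agg(F.max("last_updated")).collect()[0][0]

# 2. Read the source and keep only rows newer than the watermark.
source = spark.read.parquet("/data/source/orders")
new_rows = source.filter(F.col("last_updated") > F.lit(high_water))
new_count = new_rows.count()

# 3. Append only the new rows to the target.
new_rows.write.mode("append").parquet("/data/target/orders")

# 4. Basic validation: the row count grew by exactly the number of new
#    rows, and the schema (the data's "shape") is unchanged.
reloaded = spark.read.parquet("/data/target/orders")
assert reloaded.count() == target_count + new_count, "unexpected row count after load"
assert reloaded.schema == target.schema, "schema changed during load"
```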

Comments

This is awesome info!
Thanks for the video!

not_saboor

I would recommend using a hash key to make sure a rerun doesn't insert dupes into the table.

thebookshelfreviewer-kjmx
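A rough sketch of the hash-key idea from the comment above: derive a deterministic hash per row and insert only rows whose hash is not already in the target, so a rerun is idempotent. The column and path names are made up for illustration, and the target is assumed to already store a `row_hash` column.

```python
# Hash-key dedup sketch (assumed columns and paths).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("hash-key-dedup").getOrCreate()

key_cols = ["id", "last_updated"]  # columns assumed to uniquely identify a row version

# Compute a deterministic hash for each incoming row.
incoming = (
    spark.read.parquet("/data/source/orders")
    .withColumn("row_hash", F.sha2(F.concat_ws("||", *key_cols), 256))
)

# Hashes already present in the target (assumes the target stores row_hash).
existing = spark.read.parquet("/data/target/orders").select("row_hash")

# Anti-join: keep only incoming rows whose hash is not yet in the target,
# so running the job twice does not insert duplicates.
to_insert = incoming.join(existing, on="row_hash", how="left_anti")
to_insert.write.mode("append").parquet("/data/target/orders")
```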

So in the Snowflake table, there will be rows with the same id but with different 'last_updated' values?

So I still need to run another query to return the most recent row for every id?

Levy
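One common way to do what the question above describes, shown here as a hedged sketch rather than the approach used in the video: if the table keeps one row per (id, last_updated) change, the latest version of each id can be selected with a window function. Table and column names are assumptions.

```python
# Latest-row-per-id sketch using a window function (assumed table layout).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.appName("latest-per-id").getOrCreate()

df = spark.read.parquet("/data/target/orders")

# Rank rows within each id by recency, then keep only the newest one.
w = Window.partitionBy("id").orderBy(F.col("last_updated").desc())
latest = (
    df.withColumn("rn", F.row_number().over(w))
      .filter(F.col("rn") == 1)
      .drop("rn")
)
latest.show()
```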