Step-by-Step Guide to Incrementally Pulling Data from JDBC with Python and PySpark

Attention data professionals! 🚨 Are you tired of waiting for hours to extract large datasets? ⏰ Our upcoming video has got you covered! 🎥 Join us for a step-by-step guide to incrementally pulling data from JDBC sources using Python and PySpark. 💻 In the video, we'll demonstrate one of the coolest techniques for incrementally pulling data from tables with an Auto Increment Primary Key. You'll learn how to extract only the data you need, saving you time and headaches. Don't miss out on this valuable resource for streamlining your data extraction process! 🔥 Drop a comment below and let us know what other data extraction topics you're interested in learning about! 💬 Stay tuned for the video release. 😉
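The core idea can be sketched in a few lines: remember the highest auto-increment primary key you have already extracted (the "watermark"), and on each run pull only rows with a larger key. This minimal sketch uses Python's built-in sqlite3 as a stand-in for the JDBC source; with PySpark you would push the same `WHERE id > ?` predicate into the `query` option of `spark.read.format("jdbc")`. The table and column names here are hypothetical, not taken from the video.

```python
import sqlite3

# In-memory SQLite database standing in for the JDBC source.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY AUTOINCREMENT, item TEXT)"
)
conn.executemany("INSERT INTO orders (item) VALUES (?)", [("a",), ("b",), ("c",)])

def pull_increment(conn, last_max_id):
    """Fetch only rows added since the last extract, keyed on the auto-increment PK."""
    rows = conn.execute(
        "SELECT id, item FROM orders WHERE id > ? ORDER BY id", (last_max_id,)
    ).fetchall()
    # Advance the watermark to the highest id seen so far.
    new_max = rows[-1][0] if rows else last_max_id
    return rows, new_max

rows, watermark = pull_increment(conn, 0)           # first run: full extract
conn.execute("INSERT INTO orders (item) VALUES ('d')")
delta, watermark = pull_increment(conn, watermark)  # next run: only the new row
```

In a real pipeline the watermark would be persisted (e.g. in a control table or checkpoint file) between runs rather than held in a local variable.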

Article with step-by-step details

Code can be found
Comments

Really nice, and thank you for your time and effort. I do have a question, though. What if I update an already existing record and include it in the incremental or delta load? Obviously, we need to take care of CDC when we work with delta loads. Any ideas or suggestions from your end? Just curious, bro.

karunakaranr

In my data integration projects, the delta files always come with both updates and new records. That's why I am asking: it's a real scenario I encounter during batch processing. (I was using MERGE SQL statements to either update or insert conditionally.)
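For the update-plus-insert case the commenter describes, a MERGE-style upsert is the usual answer; in Spark this is typically done with Delta Lake's `MERGE INTO`. As a runnable sketch of the same pattern, the snippet below uses SQLite's `INSERT ... ON CONFLICT` upsert clause as a stand-in, with a hypothetical target table and delta batch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, item TEXT)")
conn.executemany("INSERT INTO target VALUES (?, ?)", [(1, "a"), (2, "b")])

# Incoming delta batch: id 2 is an update, id 3 is a new record.
delta = [(2, "b-updated"), (3, "c")]

# SQLite's upsert clause stands in for a warehouse MERGE statement;
# Delta Lake expresses the same logic as `MERGE INTO target USING delta ...`
# with WHEN MATCHED THEN UPDATE / WHEN NOT MATCHED THEN INSERT branches.
conn.executemany(
    """INSERT INTO target (id, item) VALUES (?, ?)
       ON CONFLICT(id) DO UPDATE SET item = excluded.item""",
    delta,
)

rows = conn.execute("SELECT id, item FROM target ORDER BY id").fetchall()
# rows → [(1, 'a'), (2, 'b-updated'), (3, 'c')]
```

Note that a plain max-id watermark only catches new rows; to pick up updates as well, the incremental filter usually keys on a `last_modified` timestamp column instead of the primary key alone.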

karunakaranr

Awesome!! Would this work on something like Redshift or DynamoDB?

henryomarm