61. Databricks | Pyspark | Delta Lake : Slowly Changing Dimension (SCD Type2)

Azure Databricks Learning:
==================

How do you handle a Slowly Changing Dimension Type 2 (SCD Type 2) requirement in Databricks using PySpark?

This video covers the end-to-end development steps of SCD Type 2 using PySpark in a Databricks environment.
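The video implements SCD Type 2 with a Delta Lake MERGE; as a rough illustration of what that merge does, here is a minimal pure-Python simulation of the logic. Field names such as `pk`, `attrs`, `is_current`, `start_date` and `end_date` are placeholders, not the video's actual schema:

```python
def scd2_upsert(target, source, load_date):
    """SCD Type 2 upsert: expire changed current rows and insert new
    versions, keeping full history. Pure-Python sketch of the work a
    Delta Lake MERGE would perform."""
    current = {r["pk"]: r for r in target if r["is_current"]}
    # keys whose attributes changed, plus brand-new keys
    changed = {r["pk"] for r in source
               if r["pk"] not in current
               or current[r["pk"]]["attrs"] != r["attrs"]}
    out = []
    for row in target:
        if row["is_current"] and row["pk"] in changed:
            # expire the old version instead of overwriting it
            out.append({**row, "is_current": False, "end_date": load_date})
        else:
            out.append(dict(row))
    for r in source:
        if r["pk"] in changed:
            # open a fresh current version
            out.append({"pk": r["pk"], "attrs": r["attrs"], "is_current": True,
                        "start_date": load_date, "end_date": None})
    return out
```

Unchanged source rows fall outside `changed`, so the target keeps exactly one current version per key plus its expired history.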

#DatabricksSCDType2 #SCDType2 #SparkSCDType2 #PySparkSCDType2 #SlowlyChangingDimension2 #DatabricksSlowlyChangingDimension2 #DatabricksPerformanceOptimization #DatabricksScenarioBasedInterviewQuestion #SparkScenarioBasedInterviewQuestion #DatabricksReadCsvInterviewQuestion #SparkJobs #NumberofSparkJobs #DatabricksSparkJobs #DatabricksRealtime #SparkRealTime #DatabricksInterviewQuestion #DatabricksInterview #SparkInterviewQuestion #SparkInterview #PysparkInterviewQuestion #PysparkInterview #BigdataInterviewQuestion #BigDataInterview #PysparkPerformanceTuning #PysparkPerformanceOptimization #PysparkPerformance #PysparkOptimization #PysparkTuning #DatabricksTutorial #AzureDatabricks #Databricks #Pyspark #Spark #AzureADF #LearnPyspark #LearnDatabricks #notebook #Databricksforbeginners
Comments

Wonderful 🙌
I have got a similar use case at work
Will be using this approach
Thanks!

vaibhavvalandikar

Hi Raja,
Nice explanation. Not just this topic; you covered each and every topic in great detail.
Could you please share the notebook for this one?

chappasiva

Well Explained Raja ! Appreciate your hard work Bhai !!!!

ranjansrivastava

Thank you for your video, nice explanation. I found a gap in the solution, though.
In SCD Type 2, a source record can stop arriving at any point for various reasons. When that happens, the corresponding record in the target should be marked as "deleted" or "not active". With a left outer join we cannot identify which target records need the "deleted" status; it should be a full outer join, with the code altered accordingly so the target reflects the correct data.
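For what it's worth, the extra branch that a full outer join enables could look like this in pure Python (a sketch only; the `status` field and other names are hypothetical, not from the video):

```python
def flag_deletions(target, source_keys, load_date):
    """Mark current target rows whose key no longer arrives from the
    source as deleted. A left outer join from source to target never
    sees these rows; a full outer join exposes them."""
    out = []
    for row in target:
        if row["is_current"] and row["pk"] not in source_keys:
            # key vanished from the feed: expire and flag the row
            out.append({**row, "is_current": False,
                        "end_date": load_date, "status": "deleted"})
        else:
            out.append(dict(row))
    return out
```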

nareshvemula

I checked so many documents and so many articles. None of them explained the concept; they mostly copied what Databricks wrote.
This is the only video that explains it well, and Step 5 is the most important part: the merge_key there is the trick.

arryanderson

This is Epic, Sir... This is very Ultimate... Thank you, Sir...

gurumoorthysivakolunthu

Hi sir, thank you for these amazing videos.
I am doing the same thing, but I also need to track deletion of records. What should I change in the current notebook so that I can track deleted records?

vaibhavb

Hi Raja, instead of joins, can we write two merge conditions? In SQL we write stored procedures that simply insert all the rows and update when matched.
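A two-pass variant along those lines is possible; here is a hedged pure-Python sketch (not the video's approach, and the field names are placeholders). Note that, unlike a single MERGE statement, two separate passes are not atomic, which matters if other writers touch the table between them:

```python
def two_pass_scd2(target, source, load_date):
    """Pass 1 plays the role of 'WHEN MATCHED ... THEN UPDATE' (expire
    changed current rows); pass 2 plays 'WHEN NOT MATCHED ... THEN
    INSERT' (open new current versions)."""
    src = {r["pk"]: r for r in source}
    out = []
    # pass 1: expire current rows whose attributes changed
    for row in target:
        if (row["is_current"] and row["pk"] in src
                and row["attrs"] != src[row["pk"]]["attrs"]):
            out.append({**row, "is_current": False, "end_date": load_date})
        else:
            out.append(dict(row))
    # pass 2: insert a fresh current version wherever none remains
    have_current = {r["pk"] for r in out if r["is_current"]}
    for r in source:
        if r["pk"] not in have_current:
            out.append({"pk": r["pk"], "attrs": r["attrs"], "is_current": True,
                        "start_date": load_date, "end_date": None})
    return out
```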

naganavya

Great explanation, thank you.
Could you also make a video on SCD Type 1?

petermarquesliveLIVE

Hi Raja, Great tutorial! Can you add a video on Change Data Capture (CDC) for deletes?

purnimasharma

Hi, could you recommend a book on SCD using PySpark? I would like to delve deeper. I've done projects based on the Azure documentation, and this presentation of yours got me very excited.
Thanks for sharing your knowledge. I'm writing from Brazil and I already admire you.
Thanks.

oiwelder

Can you give a documentation link for slowly changing dimensions?

aishwaryap.s.v.s

Hi, do we have a Type 3 SCD video? "Introduce new columns for updated values."

premsaikarampudi

Hi Raja,
Is it possible to share this notebook, please?

ardavanmoin

Do we need to create a separate database in Delta Lake for dimension tables? Usually we create a database named "RDS" and put all the reporting tables in it, such as the dimension and fact tables dim_product, dim_customer, dim_date and Sales_Fact. Is there any standard we can follow?

Umerkhange

Hi Mr. Raja, I found your tutorial videos very interesting. Would you be able to share the dbc files for the tutorial? It would be helpful for refreshing my memory and prepping for future interviews. I would be happy to subscribe to a paid online course for the dbc files as well, if there is one.

Nrn-biwu

Hi, when using the merge key, does this work only for tables with two primary keys, or can we apply it to a table with a single primary key as the merge key?

indhu

Hi,
When we run multiple times, updating the same record each time, the history data keeps growing: after running 4 times with the same update we get 3 inactive records and 1 active record. Can you help with this? In other processes the history does not grow.
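That symptom usually means the change check is missing or compares something that differs on every run. Guarding the expire/insert with an attribute comparison (often a hash) makes the load idempotent, so replaying the same batch is a no-op. A pure-Python sketch of that guard, with placeholder field names:

```python
import hashlib

def attr_hash(attrs):
    """Stable fingerprint of the tracked attributes. Comparing the
    columns directly works equally well; a hash just keeps the merge
    condition short when there are many columns."""
    return hashlib.md5(repr(sorted(attrs.items())).encode()).hexdigest()

def scd2_apply(target, source, load_date):
    """Expire/insert only when the fingerprint differs, so replaying
    the same source batch leaves the table unchanged."""
    current = {r["pk"]: r for r in target if r["is_current"]}
    changed = {r["pk"] for r in source
               if r["pk"] not in current
               or attr_hash(current[r["pk"]]["attrs"]) != attr_hash(r["attrs"])}
    out = [({**row, "is_current": False, "end_date": load_date}
            if row["is_current"] and row["pk"] in changed else dict(row))
           for row in target]
    out += [{"pk": r["pk"], "attrs": r["attrs"], "is_current": True,
             "start_date": load_date, "end_date": None}
            for r in source if r["pk"] in changed]
    return out
```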

sujithkumar

Hi Sir,
As you mentioned, SCD Type 2 will create duplicate primary keys. What is the solution for this problem, sir? Thank you!
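The usual answer (not covered in the video itself) is a surrogate key: each version row gets its own unique key, while the repeating business key plus the current flag identify the live row. A hypothetical sketch; in Databricks a Delta identity column or `monotonically_increasing_id()` would typically play this role:

```python
from itertools import count

_next_sk = count(1)  # stand-in for an identity column / sequence

def with_surrogate_key(version_row):
    """Attach a unique surrogate key 'sk' to a version row. The business
    key 'pk' repeats across versions and is no longer the primary key."""
    return {**version_row, "sk": next(_next_sk)}
```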

gurumoorthysivakolunthu

Hi sir, I am doing a merge operation which works the first time, but when I run that merge statement multiple times it gives an ambiguity error. Can you please suggest a solution for this?

niharikakota