76. Databricks | Pyspark: Interview Question | Scenario Based | Max Over() Gets Max Value of Duplicate Data

Azure Databricks Learning: Interview Question - Max Over()
==================================================

Real-time, scenario-based question: how do you get only the maximum value of each column among duplicate records?

Using the max over window function, the maximum value within each partition of records can be retrieved, so max over can be applied to handle this scenario. All steps are explained in detail in this video.

#DatabricksWindow, #PysparkWindow, #SparkWindow, #pysparkMaxOver, #DatabricksScenarioBased, #PysparkDropDuplicates, #DatabricksRealtime, #SparkRealTime, #DatabricksInterviewQuestion, #DatabricksInterview, #SparkInterviewQuestion, #SparkInterview, #PysparkInterviewQuestion, #PysparkInterview, #BigdataInterviewQuestion, #BigDataInterview, #PysparkPerformanceTuning, #PysparkPerformanceOptimization, #PysparkPerformance, #PysparkOptimization, #PysparkTuning, #DatabricksTutorial, #AzureDatabricks, #Databricks, #Pyspark, #Spark, #AzureADF, #LearnPyspark, #LearnDatabricks, #notebook, #Databricksforbeginners
Comments

Thanks Raja. Please add more scenario-based questions; they would be really helpful.

venkatasai

I used groupBy, then aggregated the max of all the values I wanted. Is my method faster, or is the window-function method?

shrek

It would be great if you could provide a code repository for all these questions.

amanpathak

Is there any way to rank the product with the highest price at the top using Python? Currently it ranks the lowest price first.

df = df.select("Product_id", "Product_Name", "Price", "DiscountPercent")
# The window spec was truncated in the original snippet; ordering by
# descending Price puts the highest-priced product at rank 1
windowsSpec = Window.orderBy(F.desc("Price"))
df = df.withColumn("Rank", F.rank().over(windowsSpec))

Umerkhange