92. Databricks | Pyspark | Interview Question | Performance Optimization: Select vs WithColumn

Azure Databricks Learning: Interview Question | Performance Optimization: Select vs WithColumn
================================================================================

What is the difference between the PySpark functions select and withColumn?

Both select and withColumn can be used to add new columns to an existing DataFrame, but select outperforms withColumn. The reason behind this difference is explained in this video.

To get a thorough understanding of this concept, please watch this video.

#DatabricksSelect, #DatabricksWithColumn, #PysparkSelectvsWithColumn, #PysparkSelect, #PysparkWithColumn, #SparkSelect, #SparkWithColumn, #PysparkTips, #DatabricksRealtime, #SparkRealTime, #DatabricksInterviewQuestion, #DatabricksInterview, #SparkInterviewQuestion, #SparkInterview, #PysparkInterviewQuestion, #PysparkInterview, #BigdataInterviewQuestion, #BigDataInterview, #PysparkPerformanceTuning, #PysparkPerformanceOptimization, #PysparkPerformance, #PysparkOptimization, #PysparkTuning, #DatabricksTutorial, #AzureDatabricks, #Databricks, #Pyspark, #Spark, #AzureADF, #LearnPyspark, #LearnDataBricks, #notebook, #Databricksforbeginners
Comments
Author

This one is a bit confusing, but I will look into it again. Thanks...N

sumanthb
Author

Hi, I have a question: how do I read and write multiple CSV files in Delta format, and how do I display them in Delta format?

sivab
Author

Bro, what is the difference in the method of execution between select and withColumn that leads to the performance difference?

saysayeed
Author

How do I combine the columns of 2 DataFrames when there is no matching ID? (The first DF has 3 columns and the second DF has 2 columns; I need to put these 5 columns into a new DF, and there is no matching column.)

vjayaprathap
Author

Hi, can we implement data lineage with OpenLineage in Databricks? If we can, could you please make a demo of it?

NaveenKumar-kbfm
Author

rdd=sc.parallelize([1, 2, 4, 5])
rdd.collect()
Out[12]: [1, 2, 4, 5]

rdd=rdd.map(lambda x:x*10)
rdd.collect()

Out[13]: [10, 20, 40, 50]

Here, I was able to change the existing RDD, right? So how can we say RDDs/DataFrames are immutable?

Can you please explain🙂?

aravind