Most Important PySpark Question from a Deutsche Bank Interview | PySpark Join |

customer_data = [(1, 5), (2, 6), (3, 5), (3, 6), (1, 6)]
customer_schema = "customer_id int, product_key int"

product_data = [(5,), (6,)]
product_schema = "product_key int"
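
The task implied by the comments below is the classic one: find the customers who bought every product in the product table. A minimal runnable sketch of the setup and the countDistinct approach, assuming a standard SparkSession (the solution shape is inferred from the discussion, not copied verbatim from the video):

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, countDistinct

spark = SparkSession.builder.getOrCreate()

customer_df = spark.createDataFrame(customer_data, customer_schema)
product_df = spark.createDataFrame(product_data, product_schema)

# A customer qualifies when their distinct product count matches the
# total number of products in the product table
total_products = product_df.count()
result_df = (customer_df
    .groupBy("customer_id")
    .agg(countDistinct("product_key").alias("product_count"))
    .filter(col("product_count") == total_products)
    .select("customer_id"))
result_df.show()  # expected: customer_id 1 and 3

As the thread below points out, this only holds when every product_key in customer_df also exists in product_df; otherwise a join against product_df is needed first.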

Databricks-PySpark RealTime Scenarios Interview Question Series

Project Link:

#pysparkinterview #pysparkforbeginners
Comments

Was searching for the true purpose of countDistinct in a similar case... thanks Sagar

_Sujoy_Das

What if we have 5, 6, 7, 8 in the product table and the customer table is the same as mentioned? I don't think countDistinct will work in that case.

df = customer_df.withColumn('flag', when(col('product_key').isin(product_df.select('product_key').rdd.flatMap(lambda x: x).collect()), 1).otherwise(0)).distinct()

This should work in most cases. Thanks for bringing such questions to us, Sagar.

chetanphalak
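
For reference, a self-contained version of the approach above, with the imports it needs (a sketch; collect() pulls the product keys to the driver, which is only reasonable for a small product table):

from pyspark.sql.functions import col, when

# Collect the valid product keys to the driver
valid_keys = product_df.select('product_key').rdd.flatMap(lambda x: x).collect()

# Flag each purchase row: 1 if its product exists in the product table, else 0
flagged_df = (customer_df
    .withColumn('flag', when(col('product_key').isin(valid_keys), 1).otherwise(0))
    .distinct())

Note that this flags individual purchase rows; an aggregation over the flag per customer_id would still be needed to answer "bought all products".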

I remember there is a concept of WHERE EXISTS with a correlated subquery, which works like a "for all" expression. I will try solving it that way as well.

ashishagarwal
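
Spark SQL does support correlated EXISTS subqueries in WHERE clauses, but the same "for all" requirement (relational division) can also be expressed without any correlated subquery as a double anti-join in the DataFrame API; a sketch of that equivalent (the names are mine, not from the comment):

# Every (customer, product) pair that would exist if the customer bought all products
customers = customer_df.select("customer_id").distinct()
expected_pairs = customers.crossJoin(product_df)

# Pairs the customer is missing: expected pairs with no matching purchase
missing = expected_pairs.join(customer_df, ["customer_id", "product_key"], "left_anti")

# Customers with no missing product bought them all
bought_all = customers.join(missing, "customer_id", "left_anti")
bought_all.show()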

Hey Sagar, can you give a discount to buy your course?

devamurugansankaran

Bro, where do you collect such questions from? Please share the resources to practice more such questions.

surajpatil

Hey @sagar,
Thank you for posting such questions.
Can you please recheck whether this works for all related scenarios, or only for the particular DataFrames you are using in this example?

I tried it and found that a simple inner join can get us the required result without the countDistinct function, and also that the solution shared above does not work for all scenarios. For example, I tried tweaking your DataFrames with different values like 7, 8, 9, 10, 11 as product keys in customer_df while keeping product_key in product_df as 5, 6, and the logic fails there.

I might be missing something. Please do correct me in case I am ignoring anything important.

My solution, which works for all possible scenarios:
Final_df = customer_df.join(product_df, on='product_key',

saurabh
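
The comment's last line is cut off, so here is one plausible completion of the join-first idea, written as an assumption rather than the commenter's exact code: join on product_key so only valid products are counted, then compare the distinct count per customer.

from pyspark.sql.functions import col, countDistinct

final_df = (customer_df
    .join(product_df, on='product_key', how='inner')   # keeps only valid product keys
    .groupBy('customer_id')
    .agg(countDistinct('product_key').alias('cnt'))
    .filter(col('cnt') == product_df.count())
    .select('customer_id'))
final_df.show()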

Hi Sir, my solution:

df =
df.filter(col('count') >= product_df.count()).show()

rawat
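
The first line of this solution appears truncated; given the filter that follows, a plausible reconstruction (my assumption: a groupBy with countDistinct aliased as 'count', not necessarily the commenter's original code) is:

from pyspark.sql.functions import col, countDistinct

# Reconstructed aggregation (assumption): distinct products bought per customer
df = customer_df.groupBy('customer_id').agg(countDistinct('product_key').alias('count'))
df.filter(col('count') >= product_df.count()).show()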