LnT pyspark interview questions and answers | highest and lowest salary employee name | #pyspark

LnT pyspark interview questions and answers
highest and lowest salary employee name
how to find highest salary in pyspark
how to find lowest salary in pyspark
window function in pyspark

Create DataFrame Code :
=====================
emp_data = [
('Siva',1,30000),
('Ravi',2,40000),
('Prasad',1,50000),
('Arun',1,30000),
('Sai',2,20000)
]

emp_schema = "emp_name string , dep_id int , salary long"

# Creating the dataframe from the data and schema above
df = spark.createDataFrame(data=emp_data, schema=emp_schema)
display(df)
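
Window Function Solution Sketch :
=================================
A minimal sketch, assuming the df created above; the video may walk through slightly different code. The rows are numbered per department in both salary directions, so rank 1 in each direction gives the lowest- and highest-paid employee.

from pyspark.sql.functions import col, row_number
from pyspark.sql.window import Window

w_asc  = Window.partitionBy("dep_id").orderBy(col("salary").asc())    # rn 1 = lowest salary in the dept
w_desc = Window.partitionBy("dep_id").orderBy(col("salary").desc())   # rn 1 = highest salary in the dept

(df.withColumn("rn_asc",  row_number().over(w_asc))
   .withColumn("rn_desc", row_number().over(w_desc))
   .filter("rn_asc = 1 OR rn_desc = 1")
   .selectExpr("dep_id", "emp_name", "salary",
               "CASE WHEN rn_desc = 1 THEN 'highest' ELSE 'lowest' END AS salary_type")
   .show())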

Top interview questions and answers in PySpark:

#pyspark #interview #lnt #deloitte #zs #fang #pyspark #sql #interview #dataengineers #dataanalytics #datascience #StrataScratch #Facebook #data #dataengineeringinterview #codechallenge #datascientist #pyspark #CodingInterview
#dsafordataguy #dewithdhairy #DEwithDhairy #dhiarjgupta #leetcode #topinterviewquestion
Comments

Wherever I said rank() in the explanation, it's actually row_number() 😅.

We can also solve this using rank() and dense_rank(); try it and post your version in the comments! 🤟
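
A minimal sketch of the dense_rank() variant (assuming the df from the description; with dense_rank(), salary ties share a rank, so every employee at the extreme salary is returned):

from pyspark.sql.functions import col, dense_rank
from pyspark.sql.window import Window

low  = dense_rank().over(Window.partitionBy("dep_id").orderBy(col("salary").asc()))   # 1 = lowest salary
high = dense_rank().over(Window.partitionBy("dep_id").orderBy(col("salary").desc()))  # 1 = highest salary

df.withColumn("low_rank", low).withColumn("high_rank", high).filter("low_rank = 1 OR high_rank = 1").show()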

DEwithDhairy

I have a suggestion: you can actually skip the second step, and there is a more optimized solution.
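
One possible shape such an optimization could take (a sketch only; the comment does not spell out its approach) is a single aggregation over a (salary, emp_name) struct, which avoids the window step entirely:

from pyspark.sql import functions as F

# max/min of a struct compares by its first field (salary), so the matching
# emp_name rides along; one groupBy replaces window + second aggregation.
(df.groupBy("dep_id")
   .agg(F.max(F.struct("salary", "emp_name"))["emp_name"].alias("highest_paid"),
        F.min(F.struct("salary", "emp_name"))["emp_name"].alias("lowest_paid"))
   .show())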

luc_i_fer

Is it necessary to solve the question using PySpark functions? Can't we solve it using SQL by creating a view with createOrReplaceTempView? If Spark SQL is supported, why is it necessary to remember those Spark functions? Can you please tell me? I like the approach you take to solving questions, especially the SQL ones. Please solve this question in SQL as well.
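
For reference, a minimal Spark SQL sketch of the same problem (assuming the df from the description and a temp view named emp; not taken from the video):

df.createOrReplaceTempView("emp")

spark.sql("""
    SELECT dep_id,
           MAX(CASE WHEN rn_desc = 1 THEN emp_name END) AS highest_paid,
           MAX(CASE WHEN rn_asc  = 1 THEN emp_name END) AS lowest_paid
    FROM (SELECT *,
                 ROW_NUMBER() OVER (PARTITION BY dep_id ORDER BY salary DESC) AS rn_desc,
                 ROW_NUMBER() OVER (PARTITION BY dep_id ORDER BY salary ASC)  AS rn_asc
          FROM emp) t
    GROUP BY dep_id
""").show()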

chandanpatra

my solution:
from pyspark.sql.functions import col, count, max, when, row_number
from pyspark.sql.window import Window

df1 = df.select("*", row_number().over(Window.partitionBy("dep_id").orderBy("salary")).alias("rn"),
                count("*").over(Window.partitionBy("dep_id")).alias("cnt"))
# rn = 1 is the lowest salary (ascending order); rn = cnt is the highest
df1.groupBy("dep_id").agg(max(when(col("rn") == 1, col("emp_name"))).alias("lowest_paid"),
                          max(when(col("rn") == col("cnt"), col("emp_name"))).alias("highest_paid")).show()

satyasaivarunhanumanthu