PwC question and answer in PySpark | Top interview questions and answers in PySpark | #interview

In this video we solve an interesting problem that was asked in a PwC interview, using PySpark.

"pwc question and answer" in PySpark
"PySpark question and answer"

Create DataFrame Code :
====================
# log_date strings appear to be in dd-MM-yyyy format
_data = [
    (101, '02-01-2024', 'N'),
    (101, '03-01-2024', 'Y'),
    (101, '04-01-2024', 'N'),
    (101, '07-01-2024', 'Y'),
    (102, '01-01-2024', 'N'),
    (102, '02-01-2024', 'Y'),
    (102, '03-01-2024', 'Y'),
    (102, '04-01-2024', 'N'),
    (102, '05-01-2024', 'Y'),
    (102, '06-01-2024', 'Y'),
    (102, '07-01-2024', 'Y'),
    (103, '01-01-2024', 'N'),
    (103, '04-01-2024', 'N'),
    (103, '05-01-2024', 'Y'),
    (103, '06-01-2024', 'Y'),
    (103, '07-01-2024', 'N'),
]
_schema = ["emp_id", "log_date", "flag"]

# creating the dataframe (assumes an active SparkSession named `spark`)
df = spark.createDataFrame(data=_data, schema=_schema)

#pwc #pyspark #sql #interview #dataengineers #dataanalytics #datascience #StrataScratch #Facebook #data #dataengineeringinterview #codechallenge #datascientist #pyspark #CodingInterview
#dsafordataguy #dewithdhairy #DEwithDhairy #dhiarjgupta #leetcode #topinterviewquestion
Comments

Note: I had run the code before recording the video, which is why we did not get any error while using Window and row_number() 😂.

DEwithDhairy

Excellent, hope your channel grows fast because most people will not do both Python and SQL ❤❤

gazart

Why didn't you use the lag function instead? It would be straightforward.

sonalisuchismitakap
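
For readers wondering what a lag-based approach might look like, here is a minimal sketch in plain Python rather than PySpark. The description above does not state the exact task, so this assumes the goal is to find runs of consecutive calendar days flagged 'Y' for each employee. The function name `runs_with_lag` and its output shape are hypothetical and not taken from the video. Comparing each date with the previous one for the same employee plays the role that `lag` over a window partitioned by emp_id would play in PySpark.

```python
from datetime import datetime


def runs_with_lag(rows, fmt="%d-%m-%Y"):
    """Return (emp_id, start_date, end_date, days) for each run of
    consecutive 'Y' dates, using a lag-style comparison.

    Assumes at most one row per employee per date.
    """
    # Keep only 'Y' rows, parse the date strings, and sort by employee and date.
    parsed = sorted(
        (emp_id, datetime.strptime(log_date, fmt).date())
        for emp_id, log_date, flag in rows
        if flag == "Y"
    )
    runs = []
    prev_emp, prev_date = None, None
    for emp_id, d in parsed:
        # A new run starts when the employee changes or the gap from the
        # previous date (the "lag" value) is not exactly one day.
        if emp_id != prev_emp or (d - prev_date).days != 1:
            runs.append([emp_id, d, d, 1])
        else:
            runs[-1][2] = d      # extend the end date of the current run
            runs[-1][3] += 1     # count one more day
        prev_emp, prev_date = emp_id, d
    return [tuple(r) for r in runs]


# Rows for emp_id 102, copied from the description above.
emp_102 = [
    (102, "01-01-2024", "N"),
    (102, "02-01-2024", "Y"),
    (102, "03-01-2024", "Y"),
    (102, "04-01-2024", "N"),
    (102, "05-01-2024", "Y"),
    (102, "06-01-2024", "Y"),
    (102, "07-01-2024", "Y"),
]
result = runs_with_lag(emp_102)
for run in result:
    print(run)
```

For emp_id 102 this yields two runs: 2 to 3 January (2 days) and 5 to 7 January (3 days).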

Hi Dhiraj, great solution as always. Just want to add one test: what if there is a month change in between? Say emp_id 101 logs in on 31st Jan, 1st Feb and 2nd Feb. In that case the day - rn logic won't work. Is there any way to generate consecutive dates between the min and max dates, partitioned by emp_id, so that we can subtract that date column from this date and get the output? Thanks

jjayeshpawar
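
One possible answer to the month-change question, sketched in plain Python rather than PySpark: subtract the row number from the full date instead of from the day of the month. The difference then stays constant across a month boundary, and no generated calendar of dates is needed. This assumes the task is to find runs of consecutive days flagged 'Y' (the description does not state it) and at most one row per employee per date. The function name `consecutive_runs` and the sample rows are hypothetical.

```python
from datetime import datetime, timedelta
from itertools import groupby


def consecutive_runs(rows, fmt="%d-%m-%Y"):
    """Return (emp_id, start_date, end_date, days) for each run of
    consecutive 'Y' dates, using the date-minus-row-number trick.
    """
    # Keep only 'Y' rows, parse the date strings, and sort by employee and date.
    parsed = sorted(
        (emp_id, datetime.strptime(log_date, fmt).date())
        for emp_id, log_date, flag in rows
        if flag == "Y"
    )
    runs = []
    for emp_id, group in groupby(parsed, key=lambda r: r[0]):
        dates = [d for _, d in group]
        # date - row_number is the same for every date in a consecutive run,
        # because whole dates are subtracted, not just the day of the month.
        keyed = [
            (d - timedelta(days=rn), d)
            for rn, d in enumerate(dates, start=1)
        ]
        for _, island in groupby(keyed, key=lambda k: k[0]):
            days = [d for _, d in island]
            runs.append((emp_id, days[0], days[-1], len(days)))
    return runs


# Hypothetical rows reproducing the month change described in the comment.
sample = [
    (101, "31-01-2024", "Y"),
    (101, "01-02-2024", "Y"),
    (101, "02-02-2024", "Y"),
]
runs = consecutive_runs(sample)
print(runs)
```

All three dates map to the same anchor date (30 January), so they fall into a single run of 3 days even though the month changes.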