Question 11: PWC Interview Questions part 2 | data engineers | #pyspark #bigdata #pwc #interview

In this video I discuss an interview question asked in a PWC interview for data engineers.

Q : Suppose you have a dataset with information about employee projects, and you want to find the most recent project and the total number of projects for each employee.

employee_projects_data = [
(1, 'Project1', '2022-01-10'),
(1, 'Project2', '2022-02-15'),
(1, 'Project3', '2022-03-20'),
(2, 'Project1', '2022-01-05'),
(2, 'Project2', '2022-02-10'),
(2, 'Project3', '2022-03-15'),
(2, 'Project4', '2022-04-20')
]

schema = "employee_id int, project_name string, project_date string"

Solution is in PySpark
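
The video's own PySpark code is not reproduced in this description. As a rough check of what any solution should return for the data above, here is a minimal plain-Python sketch of the same logic: group rows by employee, count them, and keep the row with the latest date. The function name summarize_projects and the output layout are assumptions made for illustration, not the author's code.

```python
from collections import defaultdict
from datetime import date

employee_projects_data = [
    (1, 'Project1', '2022-01-10'),
    (1, 'Project2', '2022-02-15'),
    (1, 'Project3', '2022-03-20'),
    (2, 'Project1', '2022-01-05'),
    (2, 'Project2', '2022-02-10'),
    (2, 'Project3', '2022-03-15'),
    (2, 'Project4', '2022-04-20'),
]


def summarize_projects(rows):
    """Return {employee_id: (latest_project_name, latest_date, total_projects)}.

    Hypothetical helper: it mirrors a group-by on employee_id followed by
    a max over the date and a count of rows.
    """
    grouped = defaultdict(list)
    for employee_id, project_name, project_date in rows:
        # project_date arrives as a string, so parse it before comparing.
        grouped[employee_id].append((date.fromisoformat(project_date), project_name))

    summary = {}
    for employee_id, projects in grouped.items():
        # max() compares the parsed date first, so input order does not matter.
        latest_date, latest_name = max(projects)
        summary[employee_id] = (latest_name, latest_date.isoformat(), len(projects))
    return summary


for employee_id, result in sorted(summarize_projects(employee_projects_data).items()):
    print(employee_id, *result)
```

In PySpark the same idea would typically be a groupBy("employee_id") with count and max aggregations. Because the schema declares project_date as a string, converting it with to_date first is the safer choice, although ISO yyyy-MM-dd strings also happen to compare in chronological order.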

Check out this video and let me know if you have any doubts; we can connect on

Do subscribe to @pysparkpulse for more such questions.

#pyspark #spark #bigdata #bigdataengineer #dataengineering #dataengineer #deloitte #pwc #mnc
Comments

We can also order by date and then filter the latest date.

bolisettisaisatwik

What if the date data is not in order? Then we have to use rank and then filter, right?

siddharthchoudhary
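
The order-then-filter idea raised in the comments can be sketched in plain Python as follows. The rows are the question's data, deliberately shuffled (an assumption for illustration), and latest_by_rank is a hypothetical helper name. The ranking is computed inside each employee's group after sorting by date, so the original row order has no effect on the result.

```python
from itertools import groupby
from operator import itemgetter

# Same rows as in the question, shuffled on purpose (illustrative assumption).
shuffled_rows = [
    (2, "Project3", "2022-03-15"),
    (1, "Project2", "2022-02-15"),
    (2, "Project1", "2022-01-05"),
    (1, "Project3", "2022-03-20"),
    (2, "Project4", "2022-04-20"),
    (1, "Project1", "2022-01-10"),
    (2, "Project2", "2022-02-10"),
]


def latest_by_rank(rows):
    """Rank each employee's projects by date (newest first) and keep rank 1."""
    # ISO "YYYY-MM-DD" strings sort chronologically, so comparing strings is safe here.
    by_date_desc = sorted(rows, key=itemgetter(2), reverse=True)
    # sorted() is stable, so the newest-first order survives within each employee.
    partitioned = sorted(by_date_desc, key=itemgetter(0))

    result = {}
    for employee_id, group in groupby(partitioned, key=itemgetter(0)):
        ranked = list(enumerate(group, start=1))  # rank 1 = most recent project
        top = [row for rank, row in ranked if rank == 1][0]
        result[employee_id] = (top[1], top[2], len(ranked))
    return result


print(latest_by_rank(shuffled_rows))
```

In PySpark this pattern would typically use row_number() over Window.partitionBy("employee_id") ordered by project_date descending, followed by a filter on rank 1. A plain max aggregation over the date gives the same latest date without any ordering step.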