Day 3 | Consecutive Days | PySpark Scenario-Based Interview Questions and Answers


Create DataFrame Code:
======================

data = [
(1, '2024-03-01'),
(1, '2024-03-02'),
(1, '2024-03-03'),
(1, '2024-03-04'),
(1, '2024-03-06'),
(1, '2024-03-10'),
(1, '2024-03-11'),
(1, '2024-03-12'),
(1, '2024-03-13'),
(1, '2024-03-14'),
(1, '2024-03-20'),
(1, '2024-03-25'),
(1, '2024-03-26'),
(1, '2024-03-27'),
(1, '2024-03-28'),
(1, '2024-03-29'),
(1, '2024-03-30'),
(2, '2024-03-01'),
(2, '2024-03-02'),
(2, '2024-03-03'),
(2, '2024-03-04'),
(3, '2024-03-01'),
(3, '2024-03-02'),
(3, '2024-03-03'),
(3, '2024-03-04'),
(3, '2024-03-04'),
(3, '2024-03-04'),
(3, '2024-03-05'),
(4, '2024-03-01'),
(4, '2024-03-02'),
(4, '2024-03-03'),
(4, '2024-03-04'),
(4, '2024-03-04')
]

schema = "user_id int , login_date string"

#interview #spark #pyspark
Comments

One good thing about this series is that you share the DataFrames directly, which makes it quite easy to start coding right away. Quite helpful

punpompur

Amazing brother 💥 I learned a lot from your channel. Keep rocking 💯

karthickraja

Excellent series sir, please don't stop it

pradumanyadav

Does it matter if we use row_number or dense_rank? I tried the code with row_number and the output matched the expected result

punpompur
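
On the row_number vs dense_rank question just above: the two agree as long as each user has at most one row per login_date, but the sample data contains duplicate dates (users 3 and 4), so deduplicating first is the safer habit. Below is a minimal sketch of the usual gaps-and-islands approach for the longest consecutive-login streak, reusing the df built earlier; column names such as rn, grp and streak_len are illustrative assumptions, not taken from the video:

from pyspark.sql import Window
from pyspark.sql import functions as F

w = Window.partitionBy("user_id").orderBy("login_date")

streaks = (
    df.dropDuplicates(["user_id", "login_date"])   # duplicates are where row_number and dense_rank diverge
      .withColumn("rn", F.row_number().over(w))
      # Consecutive dates minus their row number collapse to the same anchor date.
      .withColumn("grp", F.expr("date_sub(login_date, rn)"))
      .groupBy("user_id", "grp")
      .agg(
          F.count("*").alias("streak_len"),
          F.min("login_date").alias("start_date"),
          F.max("login_date").alias("end_date"),
      )
)

streaks.groupBy("user_id") \
       .agg(F.max("streak_len").alias("max_consecutive_days")) \
       .show()

On deduplicated data row_number and dense_rank produce identical ranks (there are no ties), which is presumably why both variants matched the expected output.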