Data Analyst Spotify Case Study | SQL Interview Questions

preview_player
Показать описание
In this video we will discuss a Spotify case study. This case study has 5 questions and with each question difficulty level will go up.

Video Content:
00:00 Understand the problem statement
01:52 Daily active users
03:31 Weekly active users
04:39 same day install and purchase
09:10 country wise paid users
15:11 Last question and assignment

Most Asked Join Based Interview Question:

Top 10 SQL interview Questions:

Interview Question based on FULL OUTER JOIN:

Playlist to master SQL :

Rand, Dense_Rank and Row_Number:

Script to create and insert data:
CREATE table activity
(
user_id varchar(20),
event_name varchar(20),
event_date date,
country varchar(20)
);
delete from activity;
insert into activity values (1,'app-installed','2022-01-01','India')
,(1,'app-purchase','2022-01-02','India')
,(2,'app-installed','2022-01-01','USA')
,(3,'app-installed','2022-01-01','USA')
,(3,'app-purchase','2022-01-03','USA')
,(4,'app-installed','2022-01-03','India')
,(4,'app-purchase','2022-01-03','India')
,(5,'app-installed','2022-01-03','SL')
,(5,'app-purchase','2022-01-03','SL')
,(6,'app-installed','2022-01-04','Pakistan')
,(6,'app-purchase','2022-01-04','Pakistan');

#sql #dataanalyst #spotify #casestudy
Рекомендации по теме
Комментарии
Автор

Great Job Bro...keep it up..My Approach for Last Question.

with prev_data as
(select *,
lag(event_date, 1) over(partition by user_id order by event_date) as prev_event_date,
lag(event_name, 1) over(partition by user_id order by event_date) as prev_event_name
from activity)

select event_date,
count(case when event_name='app-purchase' and and datediff(day, prev_event_date, event_date)=1 then user_id else null end) as user_cnt
from prev_data
group by event_date

sateeshkumard
Автор

Great Video ...Last question using self join

select event_date, count(user_id1) as count from (
select a.*,
b.event_date as event_date1,
b.user_id as user_id1
from activity a left join activity b
on a.user_id=b.user_id and DATEDIFF(day, b.event_date, a.event_date)=1)a
group by event_date

DeepanshuPal-fu
Автор

Hi Sir,
I 'm gaining Confident after solving problems posted in your channel..Thank you for all your videos..Keep posting and help us learn SQL.
Hereby sharing the end query Q5.
with prev_data as (
select *,
lag(event_name, 1) over (partition by user_id order by event_date) as prev_event_name,
lag(event_date, 1) over (partition by user_id order by event_date) as prev_event_date
from activity)

select event_date,
sum(case when event_name='app-purchase' and and datediff(day, prev_event_date, event_date) =1
then user_id else 0 end) as cnt_users
from prev_data
group by event_date;

swethathiruppathy
Автор

Nice questions sir
Last question solution
with cte as (select user_id, event_name, event_date, lag(event_date) over(partition by user_id order by event_date) as pur_date, lag(event_name)
over(partition by user_id order by event_date) as pur_event from activity)

select event_date, count( distinct case when event_name="app-purchase" and pur_event="app-installed" and datediff(event_date, pur_date)=1 then user_id else null end)
as cnt_users from cte group by event_date;

hairavyadav
Автор

Great video sir ..last question i made using right join

with cte as(
select *, lag(event_date, 1) over(partition by user_id order by event_date) as new_date
from activity), cte1 as (
select * from cte a where datediff(day, a.new_date, event_date)=1)
select a.event_date, count(distinct c.user_id) as no_of_users
from cte1 c
right join activity a on a.event_date=c.event_date
group by a.event_date

chenchuladileep
Автор

Thanks for wonderful questions
My approach for last question:
Select Z.event_date, Count(c.user_id) as Users_Count from activity as Z left join(

Select b.event_date, B.user_id from activity as A
join activity as B on
A.user_id=B.user_id
where
(A.event_name='app-installed' and B.event_name='app-purchase')
and B.event_date=DATEADD(Day, 1, A.event_date))C
on Z.event_date=C.event_date
group by Z.event_date

mrshubhamthakur
Автор

your videos are so short and good. We thank you for your help. I will study of your examples, I learnt a lot from them. Thanks

dfkgjdflkg
Автор

Nice questions.. My approach for last question
select
a1.event_date, COUNT(a2.user_id) as cnt_users
from activity a1
left join activity a2
on a1.user_id = a2.user_id and DATEDIFF(DAY, a2.event_date, a1.event_date) = 1
group by a1.event_date

simplyVeer
Автор

My approach for the last question:

with history as
(select *,
lag(event_name, 1) over(partition by user_id order by event_date) as prev_event,
lag(event_date, 1) over(partition by user_id order by event_date) as prev_date
from activity)
select event_date,
count(distinct case when event_name = 'app-purchase' and prev_event = 'app-installed' and DATEDIFF(event_date, prev_date) = 1 then user_id else null end) as user_count
from history
group by 1;

ChittaTarunSaiBalaGopal
Автор

Fantastic Ankit. Thank you so much for this video. I am currently preparing for my DA interview and this is proving to be very helpful :)

divyanshisharma
Автор

Great video sir, my try for 3rd question
with cte as (
select *, lead(event_date, 1) over (partition by user_id order by event_date desc) as lead_date
from activity), cte2 as (
select * from cte where lead_date is not null and event_date=lead_date)
select a.event_date, count(distinct c.user_id) as no_of_users
from cte2 c
right join activity a on a.event_date=c.event_date
group by a.event_date

chenchuladileep
Автор

For part4:

with cte as
(
select user_id, min(event_date) as mindate, max(event_date) as maxdate, case when
DATEDIFF(day, min(event_date), max(event_date)) = 1 then 1 else 0 end as new
from activity
group by user_id
)
select distinct maxdate, new as 'nextdaybye' from cte

arthurmorgan
Автор

Hi Ankit, thanks for the video.
my solution for the last part in postgreSQL:

WITH CTE AS (SELECT EVENT_DATE,
CASE WHEN EVENT_NAME = 'APP-PURCHASE'
AND LAG(EVENT_NAME) OVER(PARTITION BY USER_ID ORDER BY EVENT_DATE) = 'APP-INSTALLED'
AND LAG(EVENT_DATE) OVER(PARTITION BY USER_ID ORDER BY EVENT_DATE)+1 = EVENT_DATE THEN 1 ELSE 0 END AS CNT_USERS
FROM ACTIVITY)

SELECT EVENT_DATE, CNT_USERS
FROM CTE
GROUP BY 1, 2
ORDER BY 1

prajwalns
Автор

Question 5 Soln
with cte as
(
select user_id, event_name, event_date,
lag(event_date, 1) over(partition by user_id order by event_date) as prev_date,
lag(event_name, 1) over(partition by user_id order by event_date) as prev_event
from activity1
)

select event_date,
sum(case when event_date=prev_date+1 and cte.event_name='app-purchase' and
then 1
else 0 end ) as cnt
from cte
group by event_date

ashpoghosh
Автор

Great learning, here is my solution for 5th or last one using JOIN
with set1 as
(
select user_id, dateadd(day, 1, event_date) as edt, 'app-purchase' as en
from activity where event_name='app-installed'
)
select t.event_date,
sum(iif(jt1.user_id is null, 0, 1)) as userCnt
from activity as t
left join set1 as jt1 on jt1.user_id=t.user_id and jt1.edt=t.event_date and jt1.en=t.event_name
group by t.event_date

LearnYouAndMe
Автор

Great job brother....few days ago..one of your video was on my feed....it was good later I went and saw most of your videos..they are really great...u come up with unique content...most does only SQL tutorial..u come up with different use cases...keep it up brother
..keep going...u are really doing great help

dileep
Автор

Thank You for sharing this case study!

I completed the 5th question as suggested using PostgreSQL:

select event_date,
count(case when event_name = 'app-purchase'
and prev_name = 'app-installed'
and extract(day from event_date::timestamp - prev_date::timestamp) = 1
then event_date else null end ) as cnt_users
from ( select *,
lag(event_name, 1) over(partition by user_id order by event_date) as prev_name,
lag(event_date, 1) over(partition by user_id order by event_date) as prev_date
from activity ) temp
group by event_date
order by event_date

Please check and let me know if it looks good. Also, if possible please do share Data Engineer Interview Python Questions. Will truly appreciate it!

ayushipatra
Автор

Thanks for the SQL Problem Discussion, Ankit!

aditya-k
Автор

Thanks Ankit for such insightful content .
/* Question 5 :- Among all users who installed the app on given day, how many did app purchased very next day */

Select count(a2.user_id) as tot_cnt
, case when a1.event_date = dateadd(day, -1, a2.event_date) then a2.event_date else a1.event_date end as eventdate
from activity a1 left outer join activity a2
on a1.user_id=a2.user_id and a1.event_name='app-installed' and a2.event_name='app-purchase'
and a1.event_date = dateadd(day, -1, a2.event_date)
group by case when a1.event_date = dateadd(day, -1, a2.event_date) then a2.event_date else a1.event_date end

---- approach 2, with single table

with cte1 as
(select *
, lag(event_name,1) over ( partition by user_id order by event_date) as prev_event_name
, lag(event_date,1) over ( partition by user_id order by event_date) as prev_event_date
from activity)
select sum(case when event_date = dateadd(day, 1, prev_event_date) then 1 else 0 end) as total_cnt, event_date
from cte1
group by event_date

shekharagarwal
Автор

for oracle 5 th answer

select a.event_date,
sum(case when a.event_name='app-purchase' and b.event_name='app-installed' then 1 else 0 end) as cv
from activity a left join activity b
on a.user_id=b.user_id and a.event_date=b.event_date+1
group by a.event_date order by a.event_date;

opiviv
visit shbcf.ru