Customer Retention and Churn Analysis (Part 1/2) | SQL Interview Question Product Based Companies

preview_player
Показать описание
Customer retention refers to the ability of a company or product to retain its customers over some specified period. High customer retention means customers of the product or business tend to return to, continue to buy or in some other way not defect to another product or business, or to non-use entirely.
Company programs to retain customers: Zomato Pro , Cashbacks, Reward Programs etc.
Once these programs in place we need to build metrics to check if programs are working or not. That is where we will write SQL to drive customer retention count.
If you preparing for SQL interviews for product based companies check this playlist :

Here is the script to insert data :
create table transactions(
order_id int,
cust_id int,
order_date date,
amount int
);
delete from transactions;
insert into transactions values
(1,1,'2020-01-15',150)
,(2,1,'2020-02-10',150)
,(3,2,'2020-01-16',150)
,(4,2,'2020-02-25',150)
,(5,3,'2020-01-10',150)
,(6,3,'2020-02-20',150)
,(7,4,'2020-01-20',150)
,(8,5,'2020-02-20',150)
;

#sql #interview #customer #retention
Рекомендации по теме
Комментарии
Автор

wow, your logic is mind blowing. The way you dissect the problem statement and step by step approach to make everyone understand why you took a stand on any particular approach is very constructive and logical. Keep rocking @AnkitBansal. I can't say more, that I LOVE YOUR content.

sivasrimakurthi
Автор

hello Ankit, just for my clarification from this video to ask you one question that let say one customer ordered december 21 and january 22 and how do we calculate by datediff month function in that case we never get 1

santanuroy
Автор

Ankit Bhai,
i have created below query -

select month, COUNT (case when month - prev_month = 1 then 1 end ) retention from
(
select cust_id, month(order_date) month, lag(month) over (partition by cust_id order by month ) prev_month from transactions
)
group by month;

This query works even if there are any duplicate data in the table.

OR below query is also giving same result.

select month, sum (diff) from (
select cust_id, month(order_date) month, lag(month) over (partition by cust_id order by month ) prev_month, case when month - prev_month = 1 then 1 else 0 end diff from transactions
)
group by month

in case we need to check for distinct cust_id, then use below one -

select month, count (distinct diff) from (
select cust_id, month(order_date) month, lag(month) over (partition by cust_id order by month ) prev_month, case when month - prev_month = 1 then cust_id end diff from transactions
)
group by month

prashantmhatre
Автор

my approach:
with cte as(
select *, DATEDIFF(month, lag(order_date) over(partition by cust_id order by order_Date), order_date) as diff from transactions)
select month(order_Date) as month, sum(case when diff = 1 then 1 else 0 end) as cnt from cte group by MONTH(order_date)

rahulmehla
Автор

Hi @Ankit,
select date_part('month', order_date), sum(case when date_diff=1 then 1 else 0 end) as returned_user from (select *, ((date_part('year', order_date) - date_part('year', previous_purchase))*12)+(date_part('month', order_date) - date_part('month', previous_purchase)) as date_diff from (select *, lag(order_date) over (partition by cust_id order by order_date ) as previous_purchase from transactions) as a ) as g group by date_part('month', order_date);

this is another implementation

shahakunal
Автор

Hi Ankit

I did like this
with cte as ( select *, first_value(order_date) over(partition by cust_id) as first_order_date,
last_value(order_date) over(partition by cust_id) as last_order_date
from transactions ),
cte2 as ( select month(order_date) as months, sum(case when (order_date < first_order_date) then 1 else 0 end ) as l1,
sum(case when (order_date > first_order_date) then 1 else 0 end ) as l2
from cte
group by month(order_date) )
select c2.months, (c2.l1+c2.l2) as cx from cte2 c2
join cte2 c3 on c2.months = c3.months

shivammishra-mkjp
Автор

Hi Sir
with temp1 as(
select *, lag(order_date, 1, order_date) over (partition by cust_id order by order_date) as lastdate from transactions
)
select month(order_date), sum(case when then 1 else 0 end) as custtt from temp1 group by month(order_date)

architgarg
Автор

Sir Your videos are top notch. The way you approach the problem is very great. When i try to start few problems i get stuck in 1st step itself, after watching few minutes of the video im able to do it then. How to develop the skill to approach the problem sir. Give me some tips please

chenchuladileep
Автор

What if we have to calculate MOM for next 5-6 months

RishabhAggarwal-uv
Автор

general solu by analytical method:
with cte as (select *, lag(order_date) over(partition by cust_id order by order_date) as prev_order from transactions )
select month(order_date) mnth, count(prev_order) tot_cust from cte group by month(order_date)

biswanathprasad
Автор

with CTE as (
Select *,
DATEDIFF(month, lag(order_date) OVER(Partition By cust_id order by order_date), order_date) as diff
from transactions1)
select MONTH(order_date) as month,
sum(case when DIFF=1 then 1 else 0 end) as cus_count
from CTE
group by MONTH(order_date)

addhyasumitra
Автор

Ankit bhai please let me know if am right here:

with cte as (select cust_id, order_date, lag(order_date) over(partition by cust_id order by order_date) as lag_date
from transactions)

select month(order_date), sum(case when lag_date is not null then 1 else 0 end) as no_retention_count
from cte group by month(order_date)

harishkanta
Автор

with cte as(
select *
, ROW_NUMBER() over (partition by cust_id order by order_date) as rn
from transactions)
select MONTH(order_date) as month_date,
count(case when rn=2 then cust_id else null end) as last_mont_cust
from cte
group by MONTH(order_date)

sagarsaini
Автор

my approach:-
select month(order_date) as mon, count(distinct case when then cust_id end) as new_cust,
count(distinct case when then cust_id end ) as repeat_c
from (
select *, min(order_date) over(partition by cust_id) as min_t from transactions
) a
group by 1

prashantdhamunia
Автор

Hi Ankit, this video was hard to follow as I think no of likes to views ratio shows that.

My query to find % of retained customers:
select
month(t1.order_date),
(count(case when month(t1.order_date) - month(t2.order_date) = 1 then t2.cust_id else null end) * 100 /
count(distinct t1.cust_id)) as one_month_retention
from
transactions t1
join
transactions t2 on t1.cust_id = t2.cust_id
group by 1
order by 1

mohammadabdullahansari
Автор

with cte as (
select *, lag(order_date) over (partition by cust_id order by order_date) as prev_month
from transactions
)
select extract(month from order_date), sum(case when extract(month from order_date)-extract(month from prev_month) = 1 then 1 else 0 end) as count1
from cte
group by extract(month from order_date)

sheebakhan
Автор

i entered multiple orders of a customer in the same month and run the below query...it gives the same rseult

select month(order_date) as months,
sum(case when datediff(month, prev, order_date) >=1 then 1 else 0 end) as cnt from
(select cust_id, order_date, lag(order_date) over (partition by cust_id order by cust_id) as prev from transactions) x
group by month(order_date)

redwannabil
Автор

Hi Ankit Sir,

my question to you is: if the same customer returns back in the month of April after Feb so the output should be

01 0
02 3
04 0

and the same customer has an entry in the month of May then the output would look like:

01 0
02 3
04 0
05 1

Is this correct?


so my query is below :

with cust_check as (
select cust_id, next_order_date, previous_order_date, case when previous_order_date = 0 then 0
when previous_order_date != 0 then period_diff(date_format(next_order_date, "%Y%m"), date_format(previous_order_date, "%Y%m"))
end as month_diff from
(
select cust_id, order_date as next_order_date, lag(order_date, 1, 0)
over(partition by cust_id order by cust_id) as previous_order_date from
(select * from transactions order by cust_id, order_date) as a
) as a
),
final as (
select a.cust_id, year(a.next_order_date) as year_of_date, month(a.next_order_date) as month_of_date,
a.previous_order_date, a.month_diff
from cust_check as a inner join
(select cust_id, count(cust_id) as c from cust_check group by cust_id having count(cust_id) > 1) as b
on a.cust_id = b.cust_id and a.month_diff in (0, 1)
)
select year_of_date, month_of_date, sum(month_diff) as no_of_returning from final
group by year_of_date, month_of_date;

LiveWithDebasmita
Автор

Hi Ankit,

Just a suggestion, Please also make videos for the same questions by solving using pandas.

tusharsharma
Автор

Hi Ankit, Is this correct?

with cte as(
select q.*, lag(month) over(partition by cust_id order by month) as lagged from
(
select distinct cust_id,
strftime('%m', order_date) AS "Month"
from transactions t
)q
), retained as(
select month, count(*) from cte where month-lagged=1
group by month
), no as(

select distinct strftime('%m', order_date) AS "Month", 0 from
transactions t
where strftime('%m', order_date) not in (select distinct month from retained)
)
select * from retained
union
select * from no

sreejitchakraborty