Facebook Data Scientist Mock Interview - Measure User Churn + SQL Problem

preview_player
Показать описание

====== ✅ Details ======

🤔 "How would you measure churn on Instagram?"

Here's a mock interview based on Facebook's technical phone screen with a product-sense and SQL question. The interviewer was a data scientist at Google and PayPal, and the candidate is preparing for product data science roles at FAANG companies.

👍 Make sure to subscribe, like and share!

====== ⏱️ Timestamps ======

0:00 Intro
01:20 How would you measure user churn on Instagram?
08:05 Choose one primary metric
02:07 Should user sign-in be considered as churn?
20:40 Product-sense solution
30:39 SQL solution

====== 📚 Other Useful Contents ======

1. Principles and Frameworks of Product Metrics | YouTube Case Study

1. How to Crack the Data Scientist Case Interview

2. How to Crack the Amazon Data Scientist Interview

====== Connect ======

Рекомендации по теме
Комментарии
Автор

this is how I was approaching the SQL statement:

with cte as (select distinct user_id, timestamp, activity, current_date - DATE(max(timestamp)) as days_since_activity
from usage

where activity != 'sign-in')

select count(*) from cte where days_since_activity > 7

saad
Автор

can you do a min(date) and max(date) partition by ... find the difference between the 2 dates where > 7 and that will be all the "churned users" ?

lebronGilgeous-Kim
Автор

For the SQL Query part

Can't we just find the MAX(timestamp) per user and see whether CURRENT_DATE - MAX(timestamp) >7 for a user . If it is, then the user is churned ?

brittosabu
Автор

Hi! I am sorry to say but the second measure does not really make sense to me. # of unique users is billions. Anything will be very very very small. How can you distinguish the drop between super duper small numbers?

klintmane
Автор

I tried a similar approach for the SQl but introducing CASE statements:

With master as
(
Select
count(distinct user_id) as total_users,
count(distinct case when ( activity in (‘read’, ‘comment’, ‘like’, ‘post’) and
(timestamp between dateadd(day, -7, current date) and current date))
then user_id end) as active_user
From usage
)
Select (total_user - active_user ) as churned
From master

anmol
Автор

For the sql question, can we first get a temporary table and then only count user id that is not in the table where they have any engagement within the last 7 days, in that case, we would get a comprehensive list of users.

allison-hdfg
Автор

Just a question about SQL part. If a person is active in NOW() - 8, and then remains inactive. In the logic shown in the video, this user will be in the total active user list while not in recent 7 days ([NOW()-6, NOW()] ) active user list. Hence it would be counted as a churn. But, my question is that, the problem is to find user churn of TODAY. Shouldn't that user be counted as the user churn of yesterday instead of today? Since based on the observation of yesterday, that user is already a churn. I am confused that to calculate the user churn of TODAY, should we use the user list active in [NOW()-7, NOW()-1] minus user list active in [NOW()-6, NOW()]?

waynezhang
Автор

Great video, very insightful! Dan, would really appreciate it if you could reply to my comment: for the product sense question, you mentioned that churn is defined as being totally inactive within 7 days rather than having no engagement for 7 days. However, in the sql problem, churn is defined exactly as what kevin outlined as the second definition in the product sense problem. I find this contradictory and confusing and would really appreciate your help!

zzpdanny
Автор

cant we use the lag window function on date partitioned by user_id and then compare that result vs today's date to figure out user churn?

gtang
Автор

*I wish I could become a Product Analyst at Google.*

aravindkramesh
Автор

My SQL solution for this question:


with active_users as (select date(timestamp),
distinct user_id
from usage
where activity in ('read', 'comment', 'like', 'post')
and date(timestamp) BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY) AND CURRENT_DATE())

select count(distinct user_id)
from usage
where user_id NOT IN (select user_id from active_users)
and date(timestamp) BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY) AND CURRENT_DATE())

yuhan
Автор

Hi, why I can't I just count users where filter not in (read, write, post, like) ?

namandoshi
Автор

You forgot to subtract new users who just started using instagram in the past week, so the user churn from your query is under-counted

suhascrazy
Автор

SELECT COUNT(DISTINCT user_id) AS churned_users
FROM usage
WHERE
activity NOT IN ('sign-in') AND
timestamp <= NOW() - INTERVAL 7 DAY;

JobQuestUnveiled
Автор

do you have a video with the "optimal" answers for all questions

coolghoul
Автор

Very professional and informative. Can you do a python solution for this?

apsaravidhya
Автор

Good video. Thanks for taking the time to make this

ev
Автор

If I am the interviewer, I probably will give a 'not hiring' to this candidate... The sql question is not very hard

lingxu
Автор

Thank you so much for this informative and helpful video. Can you check my solution for the SQL question?

select
count(t1.user_id) as churn
from
(select
distinct user_id
from `Usage`
where DATE(timestamp) <= DATE(NOW()) - 1
AND DATE(timestamp) >= DATE(NOW()) - 7
) as t1

LEFT ANTI JOIN

(select
distinct user_id
from `Usage`
where DATE(timestamp) <= DATE(NOW())
AND DATE(timestamp) >= DATE(NOW()) - 6
) as t2

ON t1.user_id = t2.user_id

dangtran
Автор

datainterview.com v/s stratascratch ?? anoyone?

adityagaikwad
welcome to shbcf.ru