Hard SQL Interview Question From FACEBOOK | Data Science Coding Interviews (Popularity Percentage)

preview_player
Показать описание
This SQL data science interview question was asked by Facebook. I’ll cover both the question and answer and give a detailed explanation of the approach. I walkthrough each step of my answer, assumptions, approach, and explain every line of code I write. This is literally how I would answer every data science interview question and prepare for every data science interview at FAANG companies and others.

This question is marked as hard from Facebook. The question involves manipulating your datasets so that you first are calculating total number of users in the table using a UNION of the two columns. You’ll then calculate the number of friends a user has by also using another UNION. These two queries become subqueries and you’ll be using a SQL JOIN ON 1=1. These concepts are what makes the question hard. Once you have the two SQL subqueries, you can implement the percentage formula.. This question covers concepts that are commonly found in data science interviews at Facebook and Google.

______________________________________________________________________

______________________________________________________________________
Timestamps:

Intro: (0:00)
Interview Question: (0:11)
Exploring The Datasets: (0:44)
Developing The Framework For The Solution: (1:09)
Coding The Solution (Total Number Users On Platform): (4:21)
Coding The Solution (Total Friends): (7:03)
The Trick!: (8:00)
Coding The Solution (Percentage): (9:50)
Trick 2! JOINING 1=1: (10:40)
______________________________________________________________________
About The Platform:

I'm using StrataScratch, a platform that allows you to practice real data science interview questions. There are over 1000+ interview questions that cover coding (SQL and python), statistics, probability, product sense, and business cases.

I created this platform because I wanted to build a resource to specifically help prepare data scientists for their technical interviews and to generally improve their analytical skills. Over my career as a data scientist, I never was able to find a dedicated platform for data science interview prep. LeetCode and HackerRank were the closest but these platforms specifically serve the computer developer community so their questions focus more on algorithms that working with data.

______________________________________________________________________
Contact:

If you have any questions, comments, or feedback, please leave them here!
______________________________________________________________________
Рекомендации по теме
Комментарии
Автор

That's insightful. Using a CTE helped me shorten the query. Also when using UNION, you get rid of duplicates with the individual tables as well. So using distinct is not required.

with total as (
select user1, user2 from facebook_friends
union
select user2, user1 from facebook_friends)

select distinct user1, cast(count(user2) over (partition by user1) as float)*100/(select count(distinct user1) from total) from total
order by user1;

tanvimehta
Автор

Thank you so much Nate. Find it very easy to understand what’s the solution you’re taking. As a beginner in SQL this really helps me understand more clear.

rashvinganesh
Автор

Great video and content! I was able to use count(user2) OVER (Partition by user1) after doing the union to get to the friends of the users.
Was able to confirm by submitting on the platform too :)

avanichheda
Автор

Great solution, without your explanation, it is hard to figure it out. Thank you.

caiyu
Автор

This is one of the best union combined with cte problem I have come across
My solution:-
with t1 as (select * from facebook_friends
union all
select user2, user1 from facebook_friends),
t2 as (select user1, count(user2) no_friends from t1
group by user1
order by user1),
t3 as (select user1, no_friends, count(*) over() total from t2)
select user1, (no_friends/total)*100.0 pop_per from t3

dwaipayansaha
Автор

Tricky question but in reality all the users even in the user2 column should be present in the user1 column too. This is the basis of a user table. This question is all about what if all the users are not present in user1 column, then your code is the ultimate solution.
Hatsoff to your thinking which considers the worst case i didn't expect.

meghasyam
Автор

I found this platform is very helpful because it allows me to run code and let me see result step by step. In this way, I can better understand what I'm doing and where I got wrong.

StanleySI
Автор

Awesome dear you did a great explanation kindly make more videos like this 😀

gagansingh
Автор

Thank you for the insights. I also tried doing it in similar manner, with a shorter query

select user1, Round((Friends/max(users_id) over())*100, 3) as PP
from
(select *,
count(user1) as Friends,
row_number() over(order by user1) as users_id from
(select *
from facebook_friends
UNION ALL
select user2, user1 from facebook_friends) as A
group by user1
order by user1) as B;

finvestomate
Автор

Thank you, Nate.
Really good approach.

joaopedroreissilva
Автор

Hey Nate! Thanks for the video - they are really insightful, and helps a lot when I do the questions! Quick question - is there a reason to join the tuu value? What I did was just to put it as the denominator in the select statement, which I thought should save some time since you don't have to join. Thanks!

thepogchamp
Автор

Great walkthrough! But still think the number of friend of each user should not be computed using union because A friends with B does not mean B friends with A. Those who only show up in the second column should be considered having no friends.

danielxing
Автор

if you use "union" you don't need distinct before user1 and user2. union will do that for you

ismafoot
Автор

This is a "hard" question? This is SUPER easy!

Hotobu
Автор

Hi Nate, thank you !
This solution below seems to work also but I used WITH instead of subqueries, is it an issue ? Thank you for your feedback !

WITH total_users as (
select distinct user as all_users from facebook_friends
union
select distinct friend from facebook_friends

),

consolidated_table as(
select all_users, count(user) as number_user, (select count(*) from total_users) as total_u
from total_users
cross join facebook_friends
where all_users = user
OR all_users = friend
group by 1
)

select all_users, (cast(number_user as float)/cast(total_u as float))*100 as percent from consolidated_table;

ghislaineamrani
Автор

Hi Nate,

To calculate number of friends each user has, can we do this ?

SELECT user1, COUNT(user2)
FROM facebookfriends
GROUP BY user1

so this would group all the friends of each user in user1 since right column are the friends right ?

radhikashroff
Автор

Thank you so much for videos and stratscratch, i had general query, as in the above video unless we see tha data and if we only go with the schema mentioned we tend to believe that user 1 column has all users already, so in a real facebook or top company interviews do we actually have the data available for us or do we actually have to ask the interviewer on the limitations on data set .

rithikgaur
Автор

Hey Nate ! Can we use self join and not union for this problem ?

chandraprakash
Автор

Thanks Nate for keeping us educated ! Just a logical Question I had, Since you are saying user1 is the user and user2 is their friend list, I hope even the friend should be a user on Platform, So can we cant consider select distinct user1 from facebook_friends as total users, Correct me if I am missing something .

sandeepvenkatasairam
Автор

I am not sure if we need to use DISTINCT if we are using union already

SudhirKumar-rlwt