Solving an Airbnb Data Science Coding Interview Question | SQL Interview [Fav Host Nationality]

This SQL data science interview question was asked by Airbnb. I'll cover both the question and the answer and give a detailed explanation of the approach. I walk through each step of my answer, my assumptions, and my approach, and explain every line of code I write. This is how I would answer every data science interview question and prepare for every data science interview at FAANG companies and others.

The question covers many concepts that are frequently tested in data science interviews. This SQL interview question requires 3 JOINs and 2 subqueries. There is a trick where you have to identify the highest review score using the SQL max() function. Lastly, this coding interview question tests your knowledge of how to de-duplicate data. These concepts are commonly found in data science interviews at Facebook and Google.
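As a rough illustration of the concepts listed above (a max() subquery, JOINs, and de-duplication), here is a minimal sketch that runs against an in-memory SQLite database. It is not the solution from the video. The table and column names (airbnb_reviews, airbnb_hosts, from_user, to_user, from_type, review_score, host_id, nationality) follow the queries posted in the comments, and the sample rows are invented purely for illustration.

```python
import sqlite3

# Hypothetical sample data; the schema is inferred from the commenters' queries.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE airbnb_reviews (
    from_user INT, to_user INT, from_type TEXT, to_type TEXT, review_score INT
);
CREATE TABLE airbnb_hosts (host_id INT, nationality TEXT);
INSERT INTO airbnb_reviews VALUES
    (1, 10, 'guest', 'host', 9),
    (1, 11, 'guest', 'host', 7),
    (2, 10, 'guest', 'host', 8),
    (2, 12, 'guest', 'host', 8),
    (10, 1, 'host', 'guest', 10);
INSERT INTO airbnb_hosts VALUES
    (10, 'USA'), (10, 'USA'),  -- duplicated host row, to show why DISTINCT matters
    (11, 'Brazil'), (12, 'Mali');
""")

query = """
SELECT DISTINCT r.from_user, h.nationality
FROM airbnb_reviews r
JOIN (
    -- highest score each guest has given
    SELECT from_user, MAX(review_score) AS max_score
    FROM airbnb_reviews
    WHERE from_type = 'guest'
    GROUP BY from_user
) m ON m.from_user = r.from_user AND m.max_score = r.review_score
JOIN airbnb_hosts h ON h.host_id = r.to_user
WHERE r.from_type = 'guest'
ORDER BY r.from_user, h.nationality
"""
rows = conn.execute(query).fetchall()
print(rows)
```

In this toy data, guest 2 gave the same top score to two hosts, so both nationalities are kept, while the duplicated host row collapses to a single result because of DISTINCT.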

______________________________________________________________________

Timestamps:

Intro: (0:00)
Interview Question: (0:10)
Explore Dataset: (0:35)
Writing Out Approach: (1:57)
Coding The Solution: (4:40)
2nd Solution!: (10:41)
______________________________________________________________________
About The Platform:

I'm using StrataScratch, a platform that allows you to practice real data science interview questions. There are over 1,000 interview questions that cover coding (SQL and Python), statistics, probability, product sense, and business cases.

I created this platform because I wanted to build a resource specifically to help data scientists prepare for their technical interviews and improve their analytical skills in general. Over my career as a data scientist, I was never able to find a dedicated platform for data science interview prep. LeetCode and HackerRank were the closest, but these platforms primarily serve the software developer community, so their questions focus more on algorithms than on working with data.

______________________________________________________________________
Contact:

If you have any questions, comments, or feedback, please leave them here!
______________________________________________________________________
Comments
Author

This is yet another mind-blowing example. Keep up the good work.

umakanta
Author

Thank you Nate, you have a great platform

1) Using a correlated subquery:

SELECT DISTINCT a.from_user, h.nationality
FROM airbnb_reviews a
JOIN airbnb_hosts h ON a.to_user = h.host_id
WHERE a.from_type = 'guest'
AND a.review_score = (SELECT MAX(b.review_score)
FROM airbnb_reviews b
WHERE a.from_user = b.from_user
AND b.from_type = 'guest')

2) Using a window function:

SELECT DISTINCT a.from_user, b.nationality
FROM
(SELECT from_user, to_user, review_score,
DENSE_RANK() OVER(PARTITION BY from_user ORDER BY review_score DESC) rk
FROM airbnb_reviews
WHERE from_type = 'guest') a
JOIN airbnb_hosts b ON a.to_user = b.host_id
WHERE rk = 1

miradizrakhmatov
Author

Your question selection is quite good! Keep going! I have a suggestion, though: to keep things interesting, you could give a similar question as a take-home exercise and answer it in the next video. What do you say? Good work! 👏👏

saraali-wsuv
Author

Why not do it with a window function, max(rating) over (partition by guest_id),
and then add a condition where max_rating = rating?
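The suggestion above could look roughly like the sketch below. It assumes the column names used in the other comments (review_score and from_user rather than rating and guest_id), invented sample rows, and a SQLite build with window function support (3.25 or later).

```python
import sqlite3

# Hypothetical sample data; the schema is inferred from the other comments.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE airbnb_reviews (
    from_user INT, to_user INT, from_type TEXT, review_score INT
);
CREATE TABLE airbnb_hosts (host_id INT, nationality TEXT);
INSERT INTO airbnb_reviews VALUES
    (1, 10, 'guest', 9),
    (1, 11, 'guest', 7),
    (2, 11, 'guest', 8),
    (2, 12, 'guest', 8);
INSERT INTO airbnb_hosts VALUES (10, 'USA'), (11, 'Brazil'), (12, 'Mali');
""")

query = """
SELECT DISTINCT s.from_user, h.nationality
FROM (
    SELECT from_user, to_user, review_score,
           -- attach each guest's top score to every one of their reviews
           MAX(review_score) OVER (PARTITION BY from_user) AS max_score
    FROM airbnb_reviews
    WHERE from_type = 'guest'
) s
JOIN airbnb_hosts h ON h.host_id = s.to_user
WHERE s.review_score = s.max_score
ORDER BY s.from_user, h.nationality
"""
max_window_rows = conn.execute(query).fetchall()
print(max_window_rows)
```

Compared with a GROUP BY subquery joined back to the reviews, the window version reads the reviews table once and keeps every tied top-scored host.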

mohammadshahfaishal
Author

Here's my solution using a CTE:

with cte1
as
(
select *,
dense_rank() over(partition by from_user order by review_score desc) host_rank
from airbnb_reviews
where from_type = 'guest' and
to_type = 'host'
)
select distinct from_user, h.nationality fav_host_nationality
from cte1 c,
airbnb_hosts h
where c.host_rank = 1 and c.to_user = h.host_id
order by from_user

shobhamourya
Author

Great question! Any alternative methods to solve this?

vigneshiyer
Author

Hi,
I do not have the premium feature to check whether my output is correct. Can anyone tell me whether my code is correct or not? Here's my code:
with summary_data as
(
select
ar.from_user,
ah.host_id,
ar.review_score,
ah.nationality,
row_number() over (partition by ar.from_user order by ar.review_score desc) as rnk
from
airbnb_reviews ar inner join airbnb_hosts ah
on ar.to_user = ah.host_id
where ar.from_type ='guest'
)
select
distinct from_user, nationality
from summary_data where rnk=1 ;
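One point worth checking in a query like this is how ties are handled. The sketch below uses invented rows and the column names from this thread, and assumes SQLite 3.25 or later. It compares ROW_NUMBER() with DENSE_RANK() when a guest gives the same top score to two hosts.

```python
import sqlite3

# Hypothetical data: guest 2 gives the same top score (8) to hosts 10 and 12.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE airbnb_reviews (
    from_user INT, to_user INT, from_type TEXT, review_score INT
);
INSERT INTO airbnb_reviews VALUES
    (2, 10, 'guest', 8),
    (2, 12, 'guest', 8),
    (2, 11, 'guest', 5);
""")

def top_hosts(rank_fn):
    """Return the hosts ranked first per guest using the given ranking function."""
    query = f"""
    SELECT to_user FROM (
        SELECT to_user,
               {rank_fn}() OVER (
                   PARTITION BY from_user ORDER BY review_score DESC
               ) AS rnk
        FROM airbnb_reviews
        WHERE from_type = 'guest'
    )
    WHERE rnk = 1
    ORDER BY to_user
    """
    return [row[0] for row in conn.execute(query)]

row_number_hosts = top_hosts("ROW_NUMBER")
dense_rank_hosts = top_hosts("DENSE_RANK")
print(row_number_hosts, dense_rank_hosts)
```

ROW_NUMBER() assigns 1 to only one of the tied rows, and which one is not guaranteed, so one tied host is dropped. DENSE_RANK() assigns 1 to both. If tied hosts should all count as favorites, DENSE_RANK() or RANK() is the safer choice.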

swayankashanu
Author

with a as (select from_user, review_score, h.nat as natt, to_user,
dense_rank() over (partition by from_user order by review_score desc) as rank_ed
from airbnb_reviews
inner join
(select distinct(host_id) as id, nationality as nat from airbnb_hosts) as h
on h.id = to_user
where from_type = 'guest' and to_type = 'host')
select distinct(natt), from_user, rank_ed, review_score from a
where rank_ed = 1
order by 2

hassamkafeel
Author

I observed one anomaly in the data: in some rows of the airbnb_reviews table the host and guest user IDs are the same, which suggests that a host acted as a guest and gave themselves the highest rating. We should remove these rows with a filter requiring that from_user and to_user are not equal. Please comment.
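If such rows really are self-reviews, the filter could be expressed as in the sketch below. Whether matching IDs actually identify the same person depends on how the dataset assigns guest and host IDs, which is an assumption here. The sample rows and the helper name favorite_nationalities are hypothetical, and window functions require SQLite 3.25 or later.

```python
import sqlite3

# Hypothetical data: guest 3 "reviews" host 3 with the top score.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE airbnb_reviews (
    from_user INT, to_user INT, from_type TEXT, review_score INT
);
CREATE TABLE airbnb_hosts (host_id INT, nationality TEXT);
INSERT INTO airbnb_reviews VALUES
    (1, 10, 'guest', 9),
    (3, 3,  'guest', 10),
    (3, 11, 'guest', 6);
INSERT INTO airbnb_hosts VALUES (3, 'Chile'), (10, 'USA'), (11, 'Brazil');
""")

def favorite_nationalities(conn, exclude_self_reviews):
    """Favorite host nationality per guest, optionally dropping from_user = to_user rows."""
    extra = "AND from_user <> to_user" if exclude_self_reviews else ""
    query = f"""
    SELECT DISTINCT s.from_user, h.nationality
    FROM (
        SELECT from_user, to_user,
               DENSE_RANK() OVER (
                   PARTITION BY from_user ORDER BY review_score DESC
               ) AS rk
        FROM airbnb_reviews
        WHERE from_type = 'guest' {extra}
    ) s
    JOIN airbnb_hosts h ON h.host_id = s.to_user
    WHERE s.rk = 1
    ORDER BY s.from_user, h.nationality
    """
    return conn.execute(query).fetchall()

print(favorite_nationalities(conn, exclude_self_reviews=False))
print(favorite_nationalities(conn, exclude_self_reviews=True))
```

The filter has to sit inside the ranked subquery so that the ranking is computed only over the remaining reviews. Applying it after ranking would leave guest 3 with no result at all.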

emadhussain