Solving a LinkedIn Data Science SQL Interview Question

preview_player
Показать описание
Hi everyone, today I go over how to tackle a medium difficulty interview question asked on a data science interview at LinkedIn!

More from Jay:

Relevant Links
Рекомендации по теме
Комментарии
Автор

instead of average, could you use MAX?

ryezizzle
Автор

I am trying to imitate how you explain your thinking, I think it is very important in a live interview! Great video jay.

icvsmiglani
Автор

I think the query where your rolling up for getting final output is incorrect. You should be using count of userid in outer query not sum as aggregate.

Instead of using avg. Using max could be a better option.

GadgetsOfHitanshYadav
Автор

My solution -

select
sum(case when n_job_id = n_job_posted then 1 else 0 end) as post_one_time
, sum(case when n_job_id < n_job_posted then 1 else 0 end) as post_multiple_times

from
(select
user_id
, count(distinct job_id) as n_job_id
, count(*) as n_job_posted

from job_postings
group by 1) a

Leon_
Автор

Is the whole problem#2 part missing? What about the "within 180 days of posting the same job again" constraint? I think you'll need a self join to do that.

nanfengbb
Автор

Window Function of Rank on number of times posted partitioned by user_id, jobId and order by desc of number of times

Filter for 1 thereby getting the max

case when this max > 1 then more than once else less than once


we can just use a max as well at the same hierarchy level

adibhatlavivekteja
Автор

It would be great to hear your thoughts on this query. It might be bit short but the credit for crux of the code goes to Jay.

With cte (
Select user_id, job_id, count(distinct date_posted) as job_posted from job_posting
Group by 1, 2)

Select sum(case when max(job_posted) over () = 1 then 1 end) as user_once,
sum(case when max(job_posted) over() != 1 then 1 end) as user_multiple from cte
Group by user_id;

vandanalunia
Автор

I think this should work:

select
count(distinct case when num_post >1 then 1 end) as num_post_mult
, count(distinct user_id) - num_post_mult as num_post_1
from(
select
user_id
, job_id
, count(distinct date_posted) as num_post
from job_postings)

oliveiralgm
Автор

Nice. Thanks for these videos. I wouldn’t have thought of using an average. Are you going to do a video on the 2nd question?

neversaynever
Автор

I think this should work for the 1st question -

with cte(user_id, freq) as
(select user_id, case when sum(ct) = count(job_id) then 'Y' else 'N' end as freq
from
(select user_id, job_id, count(id) as ct
from job_postings
group by user_id, job_id)
group by user_id)
select count(case when freq = 'Y' then 1 else null end) as only_once,
count(case when freq = 'N' then 1 else null end) as more_than once
from cte;

shruthinalpetadimurthy
Автор

Hey Jay, here's my solution, please do let me know what you think🙈 I'm a bit unsure about the group by in the main query

with t as
(select user_id, job_id, count(date_posted) num_posted
from job_postings
group by 1, 2)

select user_once,  
user_multiple
from t
group by user_id


If the above code doesn't work, the following should do, hopefully

with t as
(select user_id, job_id, count(date_posted) num_posted
from job_postings
group by 1, 2),
tt as
(select count(job_id) num_jobs, sum(num_posted) sum_num_posted
from t
group by user_id)

select sum(num_jobs = sum_num_posted) user_once,
sum(num_jobs != sum_num_posted) user_multiple
from tt

jacquelineroronoad
Автор

with kpi as (
select user_id, count(distinct job_id) total_job, count(distinct date_posted) total_posts
from job_posts
group by 1)

select
sum(case when total_posts > total_job then 1 else 0 end) users_multiple,
sum(case when total_posts = total_job then 1 else 0 end) users_once
from kpi

shirbarel
Автор

if each time user posts a job that creates a row, and assume that's keyed with unique ID with ID column, why don't you just count number of ids (which is essentially user-job_id-date_posted pair combination), and aggregate at user level. So it can be as simple as:

with job_posting_user_agg as (
SELECT
user_id,
count(id) as num_postings
FROM
job_posting
GROUP BY 1
)
SELECT
count_if(num_jobs = 1) as single_posters,
count_if(num_jobs > 1) as multiple_posters
FROM
job_posting_user_agg

garyboy
Автор

This seems like a good use case for a self join

TheElementFive
Автор

The answer to problem# 2:

with posting_cnt as
(select user_id, job_id, count(date_posted) as num_posting
from job_posting j1 inner join job_posting j2
on j1.user_id=j2.user_id and j1.job_id=j2.job_id
and j1.date_posted<=dateadd(dd, 180, j2.job_posted)
and
group by user_id, job _id
having count(date_posted)>1)

select (count(distinct pc.user_id)/(select count( distinct user_id) from job_posting))*100 as '% of users posted again in 180 days',
(1-(count(distinct pc.user_id)/(select count( distinct user_id) from job_posting))*100 as '% of users posted once',
from posting_cnt pc

jaysahu
Автор

Thank you for your help Jay. How would this query work?

WITH user_job AS
(
SELECT user_id, job_id, COUNT(DISTINCT date_posted) AS #_posted
FROM job_postings
GROUP BY user_id, job_id
)
SELECT SUM(IF(MAX(#_posted) = 1, 1, 0 ) AS #_once, SUM(IF(MAX(#_posted) > 1, 1, 0) AS more_than_once
FROM user_job
GROUP BY user_id

kartiikss
Автор

Hi Jay, thanks so much for your video, this is so helpful! Is it ok to go through the same thought process (draw the output, and then dummy table) during the interview, or are you supposed to write the code straight away?

mariakamynina
Автор

Hey, keep up the good work! The trick of avg is super smart!

sophial
Автор

with jobs as (
select user_id, job_id, count(*) as num_posts
from job_postings
group by 1, 2
)
select
count(distinct case when num_posts > 1 then user_id end) as posted_at_least_one_job_multiple_times,
(select count(distinct user_id) from job_postings) - count(distinct case when num_posts > 1 then user_id end) as posted_jobs_once
from jobs

brothermalcolm
Автор

I guess this should work:
I did not consider date coz that was not in specification, if a job id is posted multiple times, it can be on same day or any other day.
Would appreciate your response if I missed anything :

Select
sum(case when totals= 1 then 1 else 0) as posted_once,
sum(case when totals >1 then 1 else 0) as Posted_multiples from
select userid, jobid, count(*) as totals from jobposting group by userid, jobid

charusamaddar
visit shbcf.ru