TikTok ByteDance SQL Interview Question for Data Scientists and Data Analysts (StrataScratch 2145)

preview_player
Показать описание
Solution and walkthrough of a real SQL interview question for Data Scientist and Data Analyst technical coding interviews. This question was asked by TikTok/ByteDance and is called "Date of Highest User Activity".

0:00 Intro
0:44 Problem Walkthrough
11:12 Problem Discussion
13:04 Outro

Playlists:

Рекомендации по теме
Комментарии
Автор

This is a good solution Frederik, except I think it's best to use DENSE_RANK() rather than RANK() function. Why? Because the n_users of 5 has 1 row and n_users of 4 has 3 rows, each of which is "tied" with rank 2. But the next n_users in descending order has 3 n_users that will each have rank of 5 because the virtual rank of #3 and #4 both are assigned rank #2. So rank #3 and #4 are skipped.

This is not an issue in this particular question because the question only wants the top 2 ranked row sets (r <= 2) But if it asked you to return the top 3 ranked row sets (r <= 3), using the RANK() function would miss the rows ranked 3, since those ranks were skipped; the next rank is actually 5.

So the point is using RANK() here is dangerous because you would get the wrong results. The correct window function is to use the DENSE_RANK(), which does NOT skip rank numbers.

RedShipsofSpainAgain
Автор

thanks for sharing. in my experience, you cannot combine aggregate function (eg count) with window function (eg rank). what version of sql are you using?

yannelfersi
Автор

Why not simply order by active users and select the top 2 rows? Rank is an overkill for this question no?

aadilzikre
Автор

SELECT trim(to_char(date_visited, 'Day')), COUNT(DISTINCT user_id) AS n_users, date_visited
FROM user_streaks
WHERE date_visited BETWEEN '2022-08-01' AND '2022-08-07'
GROUP BY date_visited
ORDER BY date_visited DESC
LIMIT 2 is this solution correct @Frederik Müller Please let me know. Thanks

saifullahhaidar