Solving a Complex SQL Interview Problem | Find the Most Modified File Extension

In this video, let's solve an intermediate/complex SQL interview query.
The problem was shared during a recent SQL interview. We are given a table with file extensions and need to find the most modified file extension using SQL.

Download the dataset from my blog below:

Timestamps:
00:00 Intro
00:14 Understanding the problem statement
07:29 Breaking down the SQL problem
10:15 Writing the SQL Query to solve the problem
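
As a rough illustration of the problem statement (not the query from the video), the sketch below loads a few made-up rows into an in-memory SQLite database and finds, for each date, the extension(s) modified most often. The table layout `files(id, date_modified, file_name)` follows the column names used in the comments; the sample rows, the function names `build_db` and `most_modified`, and the choice of SQLite are all assumptions made for this example. Ties are joined in Python to avoid relying on engine-specific ordered string aggregation.

```python
import sqlite3

# Hypothetical sample rows (id, date_modified, file_name); not the video's dataset.
SAMPLE_ROWS = [
    (1, "2024-01-01", "a.py"),
    (2, "2024-01-01", "b.py"),
    (3, "2024-01-01", "c.csv"),
    (4, "2024-01-02", "d.csv"),
    (5, "2024-01-02", "e.pdf"),
    (6, "2024-01-03", "f.pdf"),
]

# Step 1: count files per (date, extension).
# Step 2: keep only the rows whose count equals that date's maximum.
QUERY = """
WITH counts AS (
    SELECT date_modified,
           substr(file_name, instr(file_name, '.') + 1) AS file_ext,
           COUNT(*) AS cnt
    FROM files
    GROUP BY date_modified, file_ext
)
SELECT c1.date_modified, c1.file_ext, c1.cnt
FROM counts c1
WHERE c1.cnt = (SELECT MAX(c2.cnt)
                FROM counts c2
                WHERE c2.date_modified = c1.date_modified)
ORDER BY c1.date_modified, c1.file_ext DESC
"""


def build_db():
    """Create an in-memory database holding the hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE files (id INTEGER, date_modified TEXT, file_name TEXT)"
    )
    conn.executemany("INSERT INTO files VALUES (?, ?, ?)", SAMPLE_ROWS)
    return conn


def most_modified(conn):
    """Return one (date, extensions, count) tuple per date, ties joined by ', '."""
    grouped = {}
    for day, ext, cnt in conn.execute(QUERY):
        grouped.setdefault(day, ([], cnt))[0].append(ext)
    return [(day, ", ".join(exts), cnt) for day, (exts, cnt) in grouped.items()]


if __name__ == "__main__":
    for row in most_modified(build_db()):
        print(row)
    # ('2024-01-01', 'py', 2)
    # ('2024-01-02', 'pdf, csv', 1)
    # ('2024-01-03', 'pdf', 1)
```

On 2024-01-02 both extensions appear once, so both are reported, in descending alphabetical order.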

Comments

Hi Taufiq,
The explanation is clear and the best.
I did it using multiple CTEs and the analytic function count:
with cte as(
Select id, date_modified, file_name, substring(file_name, CHARINDEX('.', file_name)+1, 4) as file_ext from files
)
, cte2 as(
Select distinct date_modified, file_ext, count(file_ext) over (partition by date_modified, file_ext order by date_modified) as cnt
from cte)

select date_modified, STRING_AGG(file_ext, ', ') within group (order by file_ext desc) as extension, cnt from
cte2 c1
where cnt = (select MAX(cnt) from cte2 c2 where c2.date_modified = c1.date_modified)
group by date_modified, cnt

TheVaibhavdang

Break it and crack it!! This is the success mantra I have learned here. Thanks a lot, Thoufiq

bruzo

Even the complex concepts look so easy... Your explanation is just awesome, Taufeeq 🔥!!

nehalahmad

SQL looks very simple with your teaching. Thanks a lot and keep inspiring Sir!

nagaprasadreddy

Thanks a lot, Taufiq. Below is my solution. I will look into your solution now and compare. Thanks again for sharing this.

with cte as (Select ID, Date_Modified, File_Name, SUBSTRING(file_name, INSTR(file_name, '.')+1) as extension,
count(*) over (partition by Date_Modified, SUBSTRING(file_name, INSTR(file_name, '.')+1) order by Date_Modified) as cn
from files)

select Date_Modified, GROUP_CONCAT(distinct extension order by extension desc) as extension, cn from cte where (Date_Modified, cn) in
(
Select Date_Modified, max(cn) from cte group by Date_Modified)
group by Date_Modified, cn
order by Date_Modified asc

muhammadfazlani

That was awesome.
Didn't blink an eye throughout the entire video.

kashmirshadows

I used the analytical functions count and rank, then your string concatenate function. Fun stuff!

with
a as
(
select date_modified, right(file_name, len(file_name) - CHARINDEX('.', file_name) ) as file_ext,
count ( right(file_name, len(file_name) - CHARINDEX('.', file_name) ) )
over (partition by right(file_name, len(file_name) - CHARINDEX('.', file_name) ), date_modified ) as cnt
from files
),

b as
(
select distinct
date_modified, file_ext, cnt
, rank () over (partition by date_modified order by cnt desc) as row_num /*highest instance count row 1*/
from a
)


select date_modified,
STRING_AGG(file_ext, ', ') within group ( order by file_ext desc) as file_ext,
cnt
from b
where row_num = 1
group by date_modified, cnt
order by 1

arturoramirez

The perfect solution for this kind of puzzle. I appreciate it.

javokhirilkhamboyev

For "string_agg(file_ext, ', ' order by file_ext desc) as extension", I did "group_concat(file_ext order by date_modified separator ", ")" for MySQL.

sumeim

Hi Taufiq, your explanation is just awesome. Please make a video on query optimization.

abhishekgowda

To find the most used extension on a given day, I would have used QUALIFY cnt = max(cnt) over (partition by date). This saves quite a few steps.

blockwhisperers
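
The window-function idea in the comment above can be sketched roughly as follows. QUALIFY is only available in some engines, and SQLite (used here purely so the example runs) is not one of them, so this sketch computes `max(cnt) over (partition by date_modified)` in a CTE and filters in the outer query instead. The sample rows and the names `load_sample` and `top_extensions` are hypothetical, and the window function needs SQLite 3.25 or newer.

```python
import sqlite3

# Hypothetical sample rows (id, date_modified, file_name); not the video's dataset.
SAMPLE_ROWS = [
    (1, "2024-01-01", "a.py"),
    (2, "2024-01-01", "b.py"),
    (3, "2024-01-01", "c.csv"),
    (4, "2024-01-02", "d.csv"),
    (5, "2024-01-02", "e.pdf"),
    (6, "2024-01-03", "f.pdf"),
]

# Emulates "QUALIFY cnt = max(cnt) over (partition by date)":
# the window maximum is computed next to each count, then filtered one level up.
QUERY = """
WITH counts AS (
    SELECT date_modified,
           substr(file_name, instr(file_name, '.') + 1) AS file_ext,
           COUNT(*) AS cnt
    FROM files
    GROUP BY date_modified, file_ext
),
with_max AS (
    SELECT date_modified, file_ext, cnt,
           MAX(cnt) OVER (PARTITION BY date_modified) AS max_cnt
    FROM counts
)
SELECT date_modified, file_ext, cnt
FROM with_max
WHERE cnt = max_cnt
ORDER BY date_modified, file_ext DESC
"""


def load_sample():
    """Create an in-memory database holding the hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE files (id INTEGER, date_modified TEXT, file_name TEXT)"
    )
    conn.executemany("INSERT INTO files VALUES (?, ?, ?)", SAMPLE_ROWS)
    return conn


def top_extensions(conn):
    """Return (date, extension, count) rows for each date's most frequent extensions."""
    return conn.execute(QUERY).fetchall()


if __name__ == "__main__":
    for row in top_extensions(load_sample()):
        print(row)
```

In an engine that does support QUALIFY, the `with_max` CTE and the outer WHERE would collapse into a single clause, which is the saving the comment refers to.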

Amazing video, thank you so much. Because of you I can easily solve medium-level questions on stratascratch, data lemur, etc., and hard questions too.

shivsharma

Hi Taufiq,
Explanation is the best and clear.
I have a request: please do videos on external tables, table partitioning, and collections.

suryakanth

Amazing video, you're a great teacher. I was confused about why you put “cte c2”.

Cheap_Mycology

Hi Taufeeq, thanks for posting. Here is my solution:
with extensions as (
SELECT
id,
date_modified,
file_name,
right(file_name, (length(file_name)-(locate('.', file_name)))) as extension_types
FROM files
), rnkings as (
SELECT
id,
date_modified,
extension_types,
row_number()over(partition by date_modified, extension_types order by extension_types) as rnks
from extensions order by date_modified
), maximum_rankings as (
SELECT
date_modified,
max(rnks) maxi_rnks
from rnkings
GROUP by date_modified
), to_be_final_result as (
SELECT
mr.date_modified,
mr.maxi_rnks,
r.extension_types
from maximum_rankings mr
join rnkings r
on mr.date_modified = r.date_modified
and mr.maxi_rnks = r.rnks
)

SELECT
date_modified,
max(maxi_rnks) as count,
group_concat(extension_types) as extension
from to_be_final_result
GROUP by date_modified

tahakhalid

It Is a Very Amazing Video Brother... Thank you brother

vinayakpam

Thank you so much for sharing SQL knowledge
Great 💯

Madhusudan_Sarkate

with cte as
(select date_modified
, right(file_name, length(file_name)-position('.' in file_name)) type
, count(1) cnt
from files
group by 1, 2),
cte2 as
(select *,
dense_rank() over(partition by date_modified order by cnt desc
range between unbounded preceding and unbounded following ) rnk
from cte)
select date_modified, string_agg(type, ',' order by type desc) type, cnt
from cte2
where rnk = 1
group by 1, 3

rohit_vora

Your content is very useful, thanks for it. Please do on performance &

devkilari

Sir, can we use the rank() function instead of the nested subquery in the second step? It may require an additional CTE to complete the query. Which query will perform better, CTE or nested subquery? Thanks.

hellomilind
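
On the rank() question above, a small sketch suggests the two formulations are interchangeable in terms of results: one keeps rows whose count equals a correlated MAX, the other keeps rows where `rank()` is 1. The code runs both against the same made-up rows in SQLite (3.25+ for window functions) and checks that they agree. The sample data and the names `make_db`, `with_subquery`, and `with_rank` are hypothetical. Which form is faster depends on the engine and the data, so comparing execution plans on the real database is the only reliable answer.

```python
import sqlite3

# Hypothetical sample rows (id, date_modified, file_name); not the video's dataset.
SAMPLE_ROWS = [
    (1, "2024-01-01", "a.py"),
    (2, "2024-01-01", "b.py"),
    (3, "2024-01-01", "c.csv"),
    (4, "2024-01-02", "d.csv"),
    (5, "2024-01-02", "e.pdf"),
    (6, "2024-01-03", "f.pdf"),
]

# Shared first step: count files per (date, extension).
COUNTS_CTE = """
WITH counts AS (
    SELECT date_modified,
           substr(file_name, instr(file_name, '.') + 1) AS file_ext,
           COUNT(*) AS cnt
    FROM files
    GROUP BY date_modified, file_ext
)
"""

# Variant A: correlated subquery picks each date's maximum count.
SUBQUERY_SQL = COUNTS_CTE + """
SELECT c1.date_modified, c1.file_ext, c1.cnt
FROM counts c1
WHERE c1.cnt = (SELECT MAX(c2.cnt)
                FROM counts c2
                WHERE c2.date_modified = c1.date_modified)
ORDER BY c1.date_modified, c1.file_ext
"""

# Variant B: an extra CTE ranks counts per date; tied counts share rank 1.
RANK_SQL = COUNTS_CTE + """
, ranked AS (
    SELECT date_modified, file_ext, cnt,
           RANK() OVER (PARTITION BY date_modified ORDER BY cnt DESC) AS rnk
    FROM counts
)
SELECT date_modified, file_ext, cnt
FROM ranked
WHERE rnk = 1
ORDER BY date_modified, file_ext
"""


def make_db():
    """Create an in-memory database holding the hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE files (id INTEGER, date_modified TEXT, file_name TEXT)"
    )
    conn.executemany("INSERT INTO files VALUES (?, ?, ?)", SAMPLE_ROWS)
    return conn


def with_subquery(conn):
    return conn.execute(SUBQUERY_SQL).fetchall()


def with_rank(conn):
    return conn.execute(RANK_SQL).fetchall()


if __name__ == "__main__":
    db = make_db()
    print(with_subquery(db) == with_rank(db))  # True
```

`rank()` (or `dense_rank()`) is needed rather than `row_number()` here, because tied extensions must all receive rank 1.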