SQL Query Interview Questions - How to delete duplicates from a table? #sqlinterviewquestions

preview_player
Показать описание
This video series discusses some most commonly asked scenario based SQL Query Interview questions.

In this video, we discuss the SQL query to delete/remove duplicates from a database table.

The sample dataset and SQL statements are available here -

Must Do Data Analytics Certifications -

Google Data Analytics Professional Certificate

Google Advanced Data Analytics Professional Certificate

Best SQL and Data Analytics Books -

Please do not forget to like, subscribe and share.

For enrolling and enquiries, please contact us at
Рекомендации по теме
Комментарии
Автор

The practice dataset and SQL statements for this video tutorial are available here -

LearnatKnowstar
Автор

One important thing to ask is "what causes duplicate records in the database?" In large applications where data comes from multiple sources, "de-duping" before insert is a real problem. It's good to know how to deleting duplicates, but in some cases that's not a viable option. For example, if the table has 100 million rows, deleting duplicates could be quite expensive and shouldn't be run during business hours.

When I ask SQL questions in interviews, I'm not looking for "how do I use group by for de-duping." I want to see the candidate thinking about the larger problem and taking time to understand business needs. Cleaning up dupes after the fact doesn't scale and candidates that ask "what is causing dupes and what's the impact" are the developers I want. A developer that only knows how to group by, but never bothers to ask "why is this happening and what is the root problem" aren't people I will hire.

woolfel
Автор

Best is to use row_number() and partition by to create sequence column on duplicates - applicable to all kind of duplicates ( identical rows especially)

whitecrowuk
Автор

Thanks for making this interview questions series.. cleared my major doubts

surajpatil
Автор

The question is how to delete duplicates from table, not how to display duplicates or how to display unique - two different things. The correct answer is DELETE FROM WHERE max() … , even though this is inefficient.

It’s about how to ingest data into table without duplicates.

danieljust
Автор

Video is visible but explanation is superb

praldandagi
Автор

Have you been asked a SQL query interview question that you couldn't answer?

Let us know in the comments below and we will answer those in our upcoming videos!

LearnatKnowstar
Автор

Output sequence of below query should be -
select firstname, lastname, count(*) from employee gropy by fistnmae, lastname-

output-
firstname lastname count(*)
Adam ownes 2
Mark wills 1
natasha lee 2
ruley jones 1

shitalgavasane
Автор

please upload dataset as well to follow along.

niteshsingh
Автор

How deleting the records from CTE is deleting the rows from main table?

TechnoSparkBigData
Автор

I appreciate your work mam and videos are good explanatory

javeedakramshaik
Автор

So deleting data with in cte will delete data in table too, how? Does this happens in derived table and views also?

pankajbhatt
Автор

Hi, I have a query please reply to what is wrong in it,
my employ table contains,
id, name, sal, email

with dup_emp as (select *, dense_rank() over (partition by email order by id desc) as dens_rnk
from employ e)

delete from dup_emp where dens_rnk >1

now this code is showing this error,

SQL Error [42P01]: ERROR: relation "dup_emp" does not exist
Position: 127


i am selecting everything, including the cte, and then executing the query

ishitvasingh
Автор

From rank function it's better .. thanks

skdonsingh
Автор

Deleting from CTE deletes data from the source How?

kayk
Автор

Really appreciate your effort..
If possible please add table script as well, it will helpful for beginner's.
Thank you!

VinodSharma-zs
Автор

Hello there,
Can you please do a video on how to add a new large dataset (million of rows) to an exsiting table without deleting the data in the table. Provided that column names and data types are the same in old table and new added data. Thanks

ouramazingnature
Автор

For finding the duplicate values we can use 'having count >1', it wont be possible to use a delete funtion here and the having clause as in a subquery?

akhildas
Автор

as per my understanding if we have >2 duplicate records in a table then rank() and denserank() will not not work here in this case we have to use row_number() only!!!

ersuresh
Автор

When I try to duplicate your example on mysql, i get the error Error Code: 1288. The target table employye_cte of the DELETE is not updatable . This is the query I am trying to run

with employye_cte as
(select firstname, lastname, employeenumber, row_number() over (partition by lastname order by employeenumber) as rownumber
from employees1)
delete from employye_cte where rownumber = '2'
What am I missing?
Thanks

geraldsegun
welcome to shbcf.ru