How to Delete Duplicates in Production Environment | SQL Delete Duplicate Rows

preview_player
Показать описание
In this video we will discuss multiple scenarios of duplicate data. We will see how we can delete duplicates with backup.

Playlist for complex SQL questions:
Рекомендации по теме
Комментарии
Автор

Hi Ankit,
Your videos are really helpful. I am regularly watching your videos and learning SQL.
A small request from my side. Could you please mention CREATE TABLE and INSERT VALUES statements in description so that its easy for the viewers to practice.

sowmiyav
Автор

My approach to delete in one based on id and time
delete t1 from Transactions as t1, Transactions as t2 where t1.order_id=t2.order_id and

hairavyadav
Автор

Thanks for the great content!. Please add the create/insert statements so that it is easy to do hands on

sonunitjsr
Автор

For the pair of columns to work, we can concat both and use the result.

PriyaYadav-jhwj
Автор

Hi Ankit,

delete from transactions
union
delete from transactions

Why can't I use above query which will also remove duplicates as Union will remove duplicates and arrange the result set

Please suggest.

akshaykodekal
Автор

CREATE TABLE Transactions (
order_id INT,
order_date DATE,
product_name VARCHAR(10),
order_amount INT,
create_time DATETIME
);


INSERT INTO Transactions (order_id, order_date, product_name, order_amount, create_time)
VALUES
(1, '2022-03-03', 'P1', 150, '2022-03-03 15:34:51.067'),
(1, '2022-03-03', 'P1', 150, '2022-03-03 15:35:12.400'),
(1, '2022-03-03', 'P2', 200, '2022-03-03 15:35:49.610'),
(2, '2022-03-03', 'P2', 200, '2022-03-03 15:36:45.933'),
(2, '2022-03-03', 'P2', 200, '2022-03-03 15:38:09.810'),
(3, '2022-03-03', 'P3', 300, '2022-03-03 15:38:28.523');

sumitahirwar
Автор

I have a scenario problem statement to share, How to share and need your support on that problem?

sagarthapliyal
Автор

We use the row_number and also dense_rank approach in our spark code 😊

aneksingh
Автор

I'm new to SQL, so is there any way we can do it without backing up or the tables which don't have timestamps? It is really confusing as a beginner

ayankumarbajpai
Автор

In cases of pure duplicate we can use just group by over all the columns in the table .

subhojitchatterjee
Автор

Thanks... Great explanation... Subscribed to your channel..

BengaluruGuy
Автор

can we have the dataset to apply with you

sherifelgazar
Автор

Sir, What if we have same and different created time?

sagarthapliyal
Автор

Msg 4104, Level 16, State 1, Line 77
The multi-part identifier "T.ORDER_ID" could not be bound.
Msg 4104, Level 16, State 1, Line 77
The multi-part identifier "T.CREATE_TIME" could not be bound.


DELETE T
FROM TRANSACTIONS
INNER JOIN
(
SELECT
ORDER_ID
, MIN(CREATE_TIME) AS [CREATE_TIME]
FROM
TRANSACTIONS
GROUP BY
ORDER_ID
HAVING
COUNT(1) >1
) A
ON T.ORDER_ID=A.ORDER_ID AND T.CREATE_TIME=A.CREATE_TIME


For the Query i am getting the above error

rajkumarrajan
Автор

Hey Ankit, Can you make videos on Indexing, normalization, views and their use cases. You can also create your telegram channel so that more people will join and can discuss sql queries.

abhikworld
Автор

Hello Sir ! Thank you once again for other useful video!
Many Thanks!
I have a big request(maybe someone reed my comment and help me) please-i don t have a LeetCode subscription and i have left to solve only 2 question regarding SQL-2153-The number of passengers in each bus || and the last question 2175-The Change in Global Rankings
Please can you help me only with the data from this 2 question and the requests-i don t need the solution.
I can not find on internet the data from this 2 question.
I will let here my solution-maybe is useful for someone at the question2142-The number of passengers in each bus | -
with bus as (

select bus_id, arrival_time,

row_number()over(order by arrival_time) as tip

from buses

)

select b.bus_id, count(p.pasager_id) as how_many

from bus b left join passengers p on p.arrival_id between b.tip and b.arrival_time

group by b.bus_id

order by 1
All the best!

florincopaci
Автор

@ankit bansal
sir what if my data is something like this and i cannot use cte because i am not using sql servere
id name salary dept_id
-- ----
E1 XYZ 80000 1
E1 XYZ 80000 1
E1 XYZ 80000 1
E2 PQR 82000 5

smitnema