Find and remove duplicate rows by two columns

Question

I read all the relevant duplicated questions/answers and I found this to be the most relevant answer:

INSERT IGNORE INTO temp(MAILING_ID,REPORT_ID) 
SELECT DISTINCT MAILING_ID,REPORT_ID FROM table_1
;

The problem is that I want to remove duplicates by col1 and col2, but I also want the insert to include all the other fields of table_1.

I tried to add all the relevant columns this way:

INSERT IGNORE INTO temp(M_ID,MAILING_ID,REPORT_ID,
MAILING_NAME,VISIBILITY,EXPORTED) SELECT DISTINCT  
M_ID,MAILING_ID,REPORT_ID,MAILING_NAME,VISIBILITY,
EXPORTED FROM table_1
;


M_ID(int,primary),MAILING_ID(int),REPORT_ID(int),
MAILING_NAME(varchar),VISIBILITY(varchar),EXPORTED(int)

But it inserted all rows into temp, including the duplicates.
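Why this happens: DISTINCT compares the entire selected row, and because M_ID is the primary key, every row of table_1 is already distinct, so nothing is filtered. INSERT IGNORE only skips rows that collide with a PRIMARY KEY or UNIQUE index on temp. A minimal sketch of one way to copy a single full row per (MAILING_ID, REPORT_ID) pair, assuming MySQL and assuming the row with the lowest M_ID is the one to keep: pick that M_ID per pair in a derived table, then join back to fetch the other columns. The sample rows are made up for illustration.

```sql
-- Hypothetical sample data using the column names from the question.
DROP TABLE IF EXISTS temp;
DROP TABLE IF EXISTS table_1;
CREATE TABLE table_1 (
  M_ID         INT PRIMARY KEY,
  MAILING_ID   INT,
  REPORT_ID    INT,
  MAILING_NAME VARCHAR(100),
  VISIBILITY   VARCHAR(20),
  EXPORTED     INT
);
INSERT INTO table_1 VALUES
  (1, 10, 100, 'spring', 'public',  0),
  (2, 10, 100, 'spring', 'private', 1),  -- duplicate of (10, 100)
  (3, 11, 100, 'summer', 'public',  0),
  (4, 11, 100, 'summer', 'public',  1),  -- duplicate of (11, 100)
  (5, 12, 200, 'autumn', 'public',  0);

CREATE TABLE temp (
  M_ID         INT PRIMARY KEY,
  MAILING_ID   INT,
  REPORT_ID    INT,
  MAILING_NAME VARCHAR(100),
  VISIBILITY   VARCHAR(20),
  EXPORTED     INT
);

-- Keep only the row with the lowest M_ID for each (MAILING_ID, REPORT_ID) pair.
INSERT INTO temp (M_ID, MAILING_ID, REPORT_ID, MAILING_NAME, VISIBILITY, EXPORTED)
SELECT t.M_ID, t.MAILING_ID, t.REPORT_ID, t.MAILING_NAME, t.VISIBILITY, t.EXPORTED
FROM table_1 AS t
JOIN (
  SELECT MIN(M_ID) AS keep_id
  FROM table_1
  GROUP BY MAILING_ID, REPORT_ID
) AS k ON k.keep_id = t.M_ID;

SELECT * FROM temp ORDER BY M_ID;
```

Swapping MIN for MAX keeps the newest row of each pair instead.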

Well, for one thing, do not use INSERT IGNORE in your case. Second, how is your db table set up? – Naftali, Commented Jan 15, 2013 at 15:19
@Neal updated my question with the actual field names and types – user838437, Commented Jan 15, 2013 at 15:44

guidod · Accepted Answer · 2015-03-13 20:20:35Z

40

The best way to delete duplicate rows by multiple columns is the simplest one:

Add a UNIQUE index:

ALTER IGNORE TABLE your_table ADD UNIQUE (field1,field2,field3);

The IGNORE above makes sure that only the first row found is kept and the rest are discarded.

(You can then drop that index if you need future duplicates and/or know they won't happen again).
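ALTER IGNORE TABLE was removed in MySQL 5.7.4, so on current versions a commonly used replacement is to build a deduplicated copy: create an empty copy of the table, add the UNIQUE key to it, let INSERT IGNORE drop the rows that would violate that key, then swap the tables with RENAME TABLE. A minimal sketch, assuming a hypothetical your_table with an id primary key; ORDER BY id is there so the lowest id in each group is the one that survives. Writes made to your_table while the copy runs are not carried over, and INSERT IGNORE also downgrades some other errors (such as invalid values) to warnings, so check SHOW WARNINGS afterwards.

```sql
-- Hypothetical table and data for illustration.
DROP TABLE IF EXISTS your_table, your_table_dedup, your_table_old;
CREATE TABLE your_table (
  id     INT AUTO_INCREMENT PRIMARY KEY,
  field1 INT,
  field2 INT,
  field3 INT
);
INSERT INTO your_table (field1, field2, field3) VALUES
  (1, 1, 1), (1, 1, 1), (1, 1, 2), (2, 2, 2), (2, 2, 2);

-- 1. Empty copy with the same structure, plus the UNIQUE key.
CREATE TABLE your_table_dedup LIKE your_table;
ALTER TABLE your_table_dedup ADD UNIQUE KEY uq_fields (field1, field2, field3);

-- 2. Copy rows in id order; INSERT IGNORE skips every row that would
--    violate uq_fields, so the lowest id of each group is kept.
INSERT IGNORE INTO your_table_dedup
SELECT * FROM your_table ORDER BY id;

-- 3. Swap both names in one atomic RENAME; the original stays as a backup.
RENAME TABLE your_table TO your_table_old,
             your_table_dedup TO your_table;

SELECT * FROM your_table ORDER BY id;
```

Once the result looks right, your_table_old can be dropped.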

answered Mar 13, 2015 at 20:20

guidod


5 Comments

larrylampco Over a year ago

MUCH easier than correlated subqueries.

ianaz Over a year ago

As of MySQL 5.7.4, the IGNORE clause for ALTER TABLE is removed and its use produces an error.

DanJGer Over a year ago

In MySQL 5.5 there is a bug you may run into; use SET old_alter_table=1. See the docs at dev.mysql.com/doc/refman/5.5/en/alter-table.html: "Due to a bug related to Fast Index Creation (Bug #40344), ALTER IGNORE TABLE ... ADD UNIQUE INDEX does not delete duplicate rows. The IGNORE keyword is ignored. If any duplicate rows exist, the operation fails with a Duplicate entry error. A workaround is to set old_alter_table=1 prior to running an ALTER IGNORE TABLE ... ADD UNIQUE INDEX statement."

freddy888 Over a year ago

How would this work if I want to modify one column first? For example, this does not work: ALTER IGNORE TABLE mytable ADD UNIQUE (FROM_UNIXTIME(CEIL(UNIX_TIMESTAMP(timestamp) / 5) * 5), id2)

Derek Gogol Over a year ago

ALTER IGNORE has been deprecated

LStarky · Accepted Answer · 2019-01-30 18:40:57Z

27

This works perfectly in any version of MySQL, including 5.7+. It also avoids the error You can't specify target table 'my_table' for update in FROM clause by using a double-nested subquery. It deletes only ONE duplicate row per group (the one with the highest id), so if you have 3 or more duplicates, you need to run the query multiple times. It never deletes unique rows.

DELETE FROM my_table
WHERE id IN (
  SELECT calc_id FROM (
    SELECT MAX(id) AS calc_id
    FROM my_table
    GROUP BY identField1, identField2
    HAVING COUNT(id) > 1
  ) temp
)

I needed this query because I wanted to add a UNIQUE index on two columns but there were some duplicate rows that I needed to discard first.
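To remove every extra row in a single pass instead of rerunning the query, the same double-nested subquery trick can select the ids to keep rather than the ids to delete. A minimal sketch, assuming an id primary key and that the lowest id per group should survive; the table and sample data are hypothetical.

```sql
DROP TABLE IF EXISTS my_table;
CREATE TABLE my_table (
  id          INT PRIMARY KEY,
  identField1 INT,
  identField2 INT
);
INSERT INTO my_table VALUES
  (1, 1, 1), (2, 1, 1), (3, 1, 1),  -- three copies of (1, 1)
  (4, 2, 2), (5, 2, 2),             -- two copies of (2, 2)
  (6, 3, 3);                        -- unique row

-- Delete every row whose id is not the smallest id of its group.
-- The grouped SELECT is wrapped in a derived table (keep_ids) so MySQL
-- materializes it first, which avoids error 1093
-- ("You can't specify target table ... for update in FROM clause").
DELETE FROM my_table
WHERE id NOT IN (
  SELECT keep_id FROM (
    SELECT MIN(id) AS keep_id
    FROM my_table
    GROUP BY identField1, identField2
  ) AS keep_ids
);

SELECT * FROM my_table ORDER BY id;
```

There is no HAVING clause here on purpose: single-row groups also appear in keep_ids, which is what protects unique rows.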

answered Jan 30, 2019 at 18:40

LStarky


3 Comments

Umair Ayub Over a year ago

You can't specify target table 'table' for update in FROM clause

LStarky Over a year ago

It works since the WHERE clause uses double nesting. That's the magic that tricks the MySQL engine into allowing this query without creating a conflict.

Tadas V. Over a year ago

Keep in mind it will remove ONLY 1 duplicate per group, even if more than one exists.

stevex · Accepted Answer · 2020-02-25 20:02:13Z

17

For MySQL:

DELETE t1 FROM yourtable t1
  INNER JOIN yourtable t2
    ON t1.id < t2.id
    AND t1.identField1 = t2.identField1
    AND t1.identField2 = t2.identField2;
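This keeps the row with the highest id in each group and deletes the rest in one statement. Two details worth checking, sketched below with a hypothetical table and data: running the same join as a SELECT first shows exactly which rows would be deleted, and plain = never matches NULL, so rows whose identifying columns are NULL are not treated as duplicates. MySQL's NULL-safe <=> operator does match them, if that is the behavior you want.

```sql
DROP TABLE IF EXISTS yourtable;
CREATE TABLE yourtable (
  id          INT PRIMARY KEY,
  identField1 INT,
  identField2 INT
);
INSERT INTO yourtable VALUES
  (1, 1, 1), (2, 1, 1),        -- ordinary duplicate pair
  (3, 2, NULL), (4, 2, NULL),  -- duplicates only under NULL-safe comparison
  (5, 3, 3);

-- Preview: every row that has a matching row with a larger id.
-- DISTINCT because a row with several larger matches appears once per match.
SELECT DISTINCT t1.*
FROM yourtable t1
INNER JOIN yourtable t2
  ON t1.id < t2.id
 AND t1.identField1 <=> t2.identField1
 AND t1.identField2 <=> t2.identField2;

-- Same join as a multi-table DELETE; only the t1 side is removed.
DELETE t1 FROM yourtable t1
INNER JOIN yourtable t2
  ON t1.id < t2.id
 AND t1.identField1 <=> t2.identField1
 AND t1.identField2 <=> t2.identField2;

SELECT * FROM yourtable ORDER BY id;
```

Each row is joined against every other row in its group, so on large groups an index on (identField1, identField2) helps.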

edited Feb 25, 2020 at 20:02

stevex


answered Jul 27, 2018 at 6:50

Shashikant Sharma


Scotch · Accepted Answer · 2013-01-15 15:51:23Z

7

You will first need to find your duplicates by grouping on the two fields with a having clause.

    Select identField1, identField2, count(*) FROM yourTable
        GROUP BY identField1, identField2
          HAVING count(*) >1

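If this returns what you want, you can then use it as a subquery to delete those rows. The extra derived table (dup) is needed because MySQL does not allow a subquery to read directly from the table being deleted from (error 1093):

  DELETE FROM yourTable WHERE (identField1, identField2) IN (
        SELECT identField1, identField2 FROM (
          SELECT identField1, identField2 FROM yourTable
          GROUP BY identField1, identField2 HAVING count(*) > 1
        ) AS dup)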

answered Jan 15, 2013 at 15:51

Scotch


3 Comments

user838437 Over a year ago

Will this keep one of the duplicate rows? (I want to keep one, not delete every row that has a duplicate.)

Scotch Over a year ago

It will remove all of the duplicates. If you want to keep one, you can select a max or min of a field you aren't aggregating on. A quick google turned up stackoverflow.com/questions/3777633/… which also links to other identical questions.

CMCDragonkai Over a year ago

What if the table only has 2 columns and both columns are being grouped, how do I prevent deleting all duplicates?

Sudhanshu Jain · Accepted Answer · 2017-05-30 09:34:08Z

1

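You can always get the primary key ids by grouping on the two fields that should be unique:

select count(*), max(id) as id from my_table group by col_a, col_b having count(*) > 1;

and then

delete from my_table where id in (select id from (select max(id) as id from my_table
  group by col_a, col_b having count(*) > 1) as dup) limit maxlimit;

Here maxlimit stands for the number of rows to delete. You can also drop the limit and rely on max() alone: each run then deletes one row (the highest id) from every duplicate group.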

edited May 30, 2017 at 9:34

answered May 30, 2017 at 9:26

Sudhanshu Jain


4 Comments

Miguel Stevens Over a year ago

what does the limit maxlimit do?

Sudhanshu Jain Over a year ago

@Notflip that refers to how many duplicate rows you want to delete

didil Over a year ago

you cannot use the same table for the nested query and the delete query.

Prabhu Nandan Kumar Over a year ago

This is not working @SudhanshuJain, have you tested this??

Bahadir Tasdemir · Accepted Answer · 2019-02-04 08:48:58Z

NOTE: This solution is an alternative & old school solution.

If you couldn't achieve what you wanted, then you can try my "oldschool" method:

First, run this query to get the duplicate records:

select   column1,
         column2,
         count(*)
from     table
group by column1,
         column2
having   count(*) > 1
order by count(*) desc

After that, copy those results and paste them into Notepad++.

Now use the Find and Replace feature of Notepad++ to turn each line into a "delete" query followed by an "insert" query, like this (from now on, for security reasons, my values will be AAAA).

Special note: add an empty line after the last line of your data in Notepad++, because the regex matches the '\r\n' at the end of each line:

Find what regex: \D*(\d+)\D*(\d+)\D*\r\n

Replace with string: delete from table where column1 = $1 and column2 = $2; insert into table set column1 = $1, column2 = $2;\r\n

Finally, paste those queries into the MySQL Workbench query console and execute them. You will see only one occurrence of each duplicate record.

This answer is for a relation table constructed of just two columns without ID. I think you can apply it to your situation.
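For a two-column relation table without an id, an all-SQL alternative to the manual Notepad++ step is to copy the distinct pairs into a new table and swap it in. A minimal sketch, assuming MySQL and hypothetical table and column names; rows written to the table while this runs would be lost, and the UNIQUE key on the new table keeps the duplicates from coming back.

```sql
DROP TABLE IF EXISTS relation_table, relation_table_dedup, relation_table_old;
CREATE TABLE relation_table (column1 INT NOT NULL, column2 INT NOT NULL);
INSERT INTO relation_table VALUES
  (1, 10), (1, 10), (1, 10), (2, 20), (3, 30), (3, 30);

-- Empty copy of the structure, with a UNIQUE key so duplicates cannot return.
CREATE TABLE relation_table_dedup LIKE relation_table;
ALTER TABLE relation_table_dedup ADD UNIQUE KEY uq_pair (column1, column2);

-- DISTINCT keeps exactly one copy of each (column1, column2) pair.
INSERT INTO relation_table_dedup (column1, column2)
SELECT DISTINCT column1, column2 FROM relation_table;

-- Swap the tables; the original remains available as relation_table_old.
RENAME TABLE relation_table TO relation_table_old,
             relation_table_dedup TO relation_table;

SELECT * FROM relation_table ORDER BY column1, column2;
```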

Dharman · Accepted Answer · 2020-01-09 23:02:54Z

1

In a large data set, suppose you select multiple columns (e.g. select x,y,z from table1) and need to remove duplicates based on two of them (y and z in this example). Instead of combining "group by" with a "sub query", which performs poorly, you can use the query below:

select x,y,z 
from (
select x,y,z , row_number() over (partition by y,z) as index_num
from table1) main
where main.index_num=1
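Window functions such as ROW_NUMBER() require MySQL 8.0 or later. Without an ORDER BY inside OVER(), which row receives index_num = 1 is not deterministic; adding one (for example ORDER BY id) makes the choice explicit. The same numbering can also drive a DELETE. A minimal sketch, assuming an id primary key and a hypothetical table1 with made-up data; the derived table (ranked) is materialized first, so MySQL does not reject the subquery on the table being deleted from.

```sql
DROP TABLE IF EXISTS table1;
CREATE TABLE table1 (
  id INT PRIMARY KEY,
  x  VARCHAR(10),
  y  INT,
  z  INT
);
INSERT INTO table1 VALUES
  (1, 'a', 1, 1), (2, 'b', 1, 1), (3, 'c', 1, 1),
  (4, 'd', 2, 2), (5, 'e', 2, 2),
  (6, 'f', 3, 3);

-- Number the rows inside each (y, z) group, lowest id first,
-- and delete every row after the first one in its group.
DELETE FROM table1
WHERE id IN (
  SELECT id FROM (
    SELECT id,
           ROW_NUMBER() OVER (PARTITION BY y, z ORDER BY id) AS index_num
    FROM table1
  ) AS ranked
  WHERE ranked.index_num > 1
);

SELECT * FROM table1 ORDER BY id;
```

Changing the window's ORDER BY (e.g. ORDER BY id DESC) changes which row of each group survives.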

edited Jan 9, 2020 at 23:02

Dharman♦


answered Jan 9, 2020 at 22:15

Govind


Dragonduck · Accepted Answer · 2024-12-18 12:31:56Z

0

Building on @LStarky's answer, I made a snippet that removes all duplicates except the oldest (lowest id) one.

DELETE my_table FROM my_table
  JOIN (
    SELECT MIN(id) AS calc_id, identField1, identField2
    FROM my_table
    GROUP BY identField1, identField2
    HAVING COUNT(id) > 1
    ) sub ON sub.calc_id != my_table.id 
       AND sub.identField1 = my_table.identField1
       AND sub.identField2 = my_table.identField2

answered Dec 18, 2024 at 12:31

Dragonduck
