26

I read all the relevant duplicated questions/answers and I found this to be the most relevant answer:

INSERT IGNORE INTO temp(MAILING_ID,REPORT_ID) 
SELECT DISTINCT MAILING_ID,REPORT_ID FROM table_1
;

The problem is that I want to remove duplicates by col1 and col2 (here MAILING_ID and REPORT_ID), but I also want the insert to include all the other fields of table_1.

I tried to add all the relevant columns this way:

INSERT IGNORE INTO temp(M_ID,MAILING_ID,REPORT_ID,MAILING_NAME,VISIBILITY,EXPORTED)
SELECT DISTINCT M_ID,MAILING_ID,REPORT_ID,MAILING_NAME,VISIBILITY,EXPORTED
FROM table_1
;


The fields and their types are:

M_ID(int,primary),MAILING_ID(int),REPORT_ID(int),
MAILING_NAME(varchar),VISIBILITY(varchar),EXPORTED(int)

But it inserted all rows into temp, including the duplicates.

3
  • 2
    Well, for one thing, do not use INSERT IGNORE in your case. Second, how is your db table set up? Commented Jan 15, 2013 at 15:19
  • Can you give sample records? Commented Jan 15, 2013 at 15:29
  • @Neal updated my question with the actual field names and types Commented Jan 15, 2013 at 15:44

8 Answers 8

40

The best way to delete duplicate rows by multiple columns is the simplest one:

Add a UNIQUE index:

ALTER IGNORE TABLE your_table ADD UNIQUE (field1,field2,field3);

The IGNORE above makes sure that only the first row found for each combination is kept; the rest are discarded.

(You can then drop that index if you need future duplicates and/or know they won't happen again).
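
To connect this back to the question: once a UNIQUE index covers the two key columns, an insert that ignores duplicate-key conflicts can copy every column and still keep one full row per pair. The sketch below reproduces that idea with Python's built-in sqlite3 module as a stand-in, because it runs without a MySQL server. It is an illustration under assumptions, not the answer's exact statement: SQLite spells the clause INSERT OR IGNORE, the helper name copy_without_duplicates is hypothetical, and the sample rows are made up.

```python
import sqlite3

COLUMNS = (
    "M_ID INTEGER PRIMARY KEY, MAILING_ID INT, REPORT_ID INT, "
    "MAILING_NAME TEXT, VISIBILITY TEXT, EXPORTED INT"
)

def copy_without_duplicates(rows):
    """Copy rows into temp, keeping the first row per (MAILING_ID, REPORT_ID)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(f"CREATE TABLE table_1 ({COLUMNS})")
    # The UNIQUE constraint on the two key columns does the deduplication.
    conn.execute(f"CREATE TABLE temp ({COLUMNS}, UNIQUE (MAILING_ID, REPORT_ID))")
    conn.executemany("INSERT INTO table_1 VALUES (?, ?, ?, ?, ?, ?)", rows)
    # Rows that would violate the UNIQUE constraint are skipped silently.
    conn.execute("INSERT OR IGNORE INTO temp SELECT * FROM table_1 ORDER BY M_ID")
    result = conn.execute("SELECT * FROM temp ORDER BY M_ID").fetchall()
    conn.close()
    return result

# Made-up sample data: rows 1 and 2 share the same (MAILING_ID, REPORT_ID).
sample_rows = [
    (1, 10, 100, "news", "public", 0),
    (2, 10, 100, "news copy", "public", 1),
    (3, 11, 100, "promo", "private", 0),
]
kept = copy_without_duplicates(sample_rows)
```

Note that SELECT DISTINCT is not needed here: because M_ID differs on every row, DISTINCT over all columns never collapses anything, and the constraint is what removes the duplicates.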


5 Comments

MUCH easier than correlated subqueries.
As of MySQL 5.7.4, the IGNORE clause for ALTER TABLE is removed and its use produces an error.
In MySQL 5.5 a bug can get in the way; the workaround is to run set old_alter_table=1 first. From the docs at dev.mysql.com/doc/refman/5.5/en/alter-table.html: "Due to a bug related to Fast Index Creation (Bug #40344), ALTER IGNORE TABLE ... ADD UNIQUE INDEX does not delete duplicate rows. The IGNORE keyword is ignored. If any duplicate rows exist, the operation fails with a Duplicate entry error. A workaround is to set old_alter_table=1 prior to running an ALTER IGNORE TABLE ... ADD UNIQUE INDEX statement."
How would this work if I want to transform one column first? For example, this does not work: ALTER IGNORE TABLE mytable ADD UNIQUE (FROM_UNIXTIME(CEIL(UNIX_TIMESTAMP(timestamp) / 5) * 5), id2)
ALTER IGNORE has been deprecated.
27

This works in any version of MySQL, including 5.7+. It also avoids the error "You can't specify target table 'my_table' for update in FROM clause" by using a double-nested subquery. Each run deletes only ONE row per group of duplicates (the one with the highest id), so if a group has 3 or more duplicates, run the query repeatedly until it affects no rows. It never deletes unique rows.

DELETE FROM my_table
WHERE id IN (
  SELECT calc_id FROM (
    SELECT MAX(id) AS calc_id
    FROM my_table
    GROUP BY identField1, identField2
    HAVING COUNT(id) > 1
  ) temp
)

I needed this query because I wanted to add a UNIQUE index on two columns but there were some duplicate rows that I needed to discard first.
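
Since each run removes only one row per duplicate group, the statement has to be repeated until a pass deletes nothing. A minimal sketch of that loop follows, using Python's built-in sqlite3 module as a stand-in for MySQL so it can run anywhere. The function name dedupe_in_passes, the subquery alias dup, and the sample rows are assumptions made for illustration.

```python
import sqlite3

# The DELETE from this answer, with the derived-table alias renamed to dup.
DELETE_ONE_PER_GROUP = """
DELETE FROM my_table
WHERE id IN (
  SELECT calc_id FROM (
    SELECT MAX(id) AS calc_id
    FROM my_table
    GROUP BY identField1, identField2
    HAVING COUNT(id) > 1
  ) AS dup
)
"""

def dedupe_in_passes(conn):
    """Rerun the delete until a pass removes nothing; return rows deleted per pass."""
    deleted_per_pass = []
    while True:
        cur = conn.execute(DELETE_ONE_PER_GROUP)
        if cur.rowcount == 0:
            break
        deleted_per_pass.append(cur.rowcount)
    conn.commit()
    return deleted_per_pass

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table (id INTEGER PRIMARY KEY, identField1 INT, identField2 INT)"
)
# Made-up data: one group of three, one group of two, one unique row.
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [(1, 1, 1), (2, 1, 1), (3, 1, 1), (4, 2, 2), (5, 2, 2), (6, 3, 3)],
)
passes = dedupe_in_passes(conn)
remaining = [row[0] for row in conn.execute("SELECT id FROM my_table ORDER BY id")]
```

With this data the first pass removes ids 3 and 5, the second removes id 2, and the lowest id of each group survives.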

3 Comments

You can't specify target table 'table' for update in FROM clause
It works since the WHERE clause uses double nesting. That's the magic that tricks the MySQL engine into allowing this query without creating a conflict.
Keep in mind that it removes only one duplicate per group on each run, even if more than one exists.
17

For MySQL (this keeps the row with the highest id in each group of duplicates):

DELETE t1 FROM yourtable t1
  INNER JOIN yourtable t2
    ON t1.identField1 = t2.identField1
   AND t1.identField2 = t2.identField2
WHERE t1.id < t2.id;
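
The self-join deletes every row for which another row with the same key and a higher id exists, so a single run leaves only the highest id per group. The multi-table DELETE syntax is MySQL-specific; the sketch below expresses the same condition with a correlated EXISTS so that it can run on Python's built-in sqlite3 module. It is a swapped-in equivalent for illustration, not this answer's statement, and the function name and sample rows are assumptions.

```python
import sqlite3

def delete_keep_highest_id(conn):
    """Delete every row that has a same-key row with a higher id."""
    cur = conn.execute(
        """
        DELETE FROM yourtable
        WHERE EXISTS (
            SELECT 1 FROM yourtable t2
            WHERE t2.identField1 = yourtable.identField1
              AND t2.identField2 = yourtable.identField2
              AND t2.id > yourtable.id
        )
        """
    )
    conn.commit()
    return cur.rowcount

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yourtable (id INTEGER PRIMARY KEY, identField1 INT, identField2 INT)"
)
# Made-up data: one group of three, one group of two, one unique row.
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?, ?)",
    [(1, 1, 1), (2, 1, 1), (3, 1, 1), (4, 2, 2), (5, 2, 2), (6, 3, 3)],
)
deleted = delete_keep_highest_id(conn)
remaining = [row[0] for row in conn.execute("SELECT id FROM yourtable ORDER BY id")]
```

Flipping the comparison to t2.id < yourtable.id would keep the lowest id instead.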

Comments

7

You will first need to find your duplicates by grouping on the two fields with a having clause.

    Select identField1, identField2, count(*) FROM yourTable
        GROUP BY identField1, identField2
          HAVING count(*) >1

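If this returns what you want, you can then use it as a subquery in a DELETE. Note that this removes every row in each duplicate group, not just the extra copies:

  DELETE FROM yourTable
  WHERE (identField1, identField2) IN (
    SELECT identField1, identField2 FROM (
      SELECT identField1, identField2 FROM yourTable
      GROUP BY identField1, identField2 HAVING count(*) > 1
    ) AS dups)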

3 Comments

Will this keep one of the duplicates rows? (I want to keep one, not delete any row that has a duplicate)
It will remove all of the duplicates. If you want to keep one, you can select a max or min of a field you aren't aggregating on. A quick google turned up stackoverflow.com/questions/3777633/… which also links to other identical questions.
What if the table only has 2 columns and both columns are being grouped, how do I prevent deleting all duplicates?
1

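You can always get the primary ids by grouping on the two fields that should be unique:

select max(id) as id, count(*) as cnt from your_table group by col_a, col_b having count(*) > 1;

and then

delete from your_table where id in (select id from (select max(id) as id from your_table group by col_a, col_b having count(*) > 1) as dup) limit maxlimit;

Here your_table, col_a and col_b are placeholders, and maxlimit is the maximum number of rows to delete in one run. max(id) picks one row per duplicate group, and the extra nested select is needed because MySQL does not allow the table being deleted from to appear directly in the subquery.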

4 Comments

what does the limit maxlimit do?
@Notflip that refers to how many duplicate rows you want to delete
you cannot use the same table for the nested query and the delete query.
This is not working @SudhanshuJain, have you tested this??
1

NOTE: This is an alternative, old-school solution.


If the other approaches did not work for you, you can try my "old-school" method:

First, run this query to get the duplicate records:

select   column1,
         column2,
         count(*)
from     table
group by column1,
         column2
having   count(*) > 1
order by count(*) desc

After that, select those results and paste them into Notepad++.

Now use Notepad++'s find-and-replace (in regular expression mode) to turn each line into a "delete" query followed by an "insert" query, like this (from now on, for security reasons, my values will be AAAA).

Special note: add an empty new line after the last line of your data in Notepad++, because the regex matches the '\r\n' at the end of each line:

Find what regex: \D*(\d+)\D*(\d+)\D*\r\n

Replace with string: delete from table where column1 = $1 and column2 = $2; insert into table set column1 = $1, column2 = $2;\r\n

Finally, paste those queries into your MySQL Workbench query console and execute them. You will see only one occurrence of each duplicate record.

This answer is for a relation table constructed of just two columns without ID. I think you can apply it to your situation.

Comments

1

On a large data set, suppose you select multiple columns, e.g. select x,y,z from table1, and need to remove duplicates based on two of them, say y and z. Instead of combining "group by" with a subquery, which performs poorly, you can use a window function (available in MySQL 8.0 and later):

select x,y,z 
from (
select x,y,z , row_number() over (partition by y,z) as index_num
from table1) main
where main.index_num=1
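
Without an ORDER BY inside the window, which row of each (y, z) group gets number 1 is not defined. The sketch below adds one so the result is deterministic, and runs the query through Python's built-in sqlite3 module as a stand-in (window functions need SQLite 3.25 or newer). The ORDER BY x, the alias ranked, the function name, and the sample rows are assumptions added for illustration.

```python
import sqlite3

def first_row_per_pair(rows):
    """Return one (x, y, z) row per (y, z) pair, choosing the smallest x."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (x INT, y INT, z INT)")
    conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT x, y, z
        FROM (
            SELECT x, y, z,
                   ROW_NUMBER() OVER (PARTITION BY y, z ORDER BY x) AS index_num
            FROM table1
        ) AS ranked
        WHERE ranked.index_num = 1
        ORDER BY x
        """
    ).fetchall()
    conn.close()
    return result

# Made-up data: (10, 20) and (11, 20) each appear twice in the y, z columns.
unique_rows = first_row_per_pair(
    [(1, 10, 20), (2, 10, 20), (3, 10, 21), (4, 11, 20), (5, 11, 20)]
)
```

This only selects the deduplicated rows; it does not delete anything from table1.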

Comments

0

Building on @LStarky's answer, I made a snippet that removes all duplicates except the oldest one (lowest ID) in a single run.

DELETE my_table FROM my_table
  JOIN (
    SELECT MIN(id) AS calc_id, identField1, identField2
    FROM my_table
    GROUP BY identField1, identField2
    HAVING COUNT(id) > 1
    ) sub ON sub.calc_id != my_table.id 
       AND sub.identField1 = my_table.identField1
       AND sub.identField2 = my_table.identField2
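
To check the "keep only the lowest ID" behaviour without a MySQL server, here is a rough equivalent on Python's built-in sqlite3 module. SQLite has no DELETE ... JOIN, so this sketch swaps in a NOT IN over the per-group MIN(id), which should remove the same rows as long as id is never NULL. The function names and sample rows are hypothetical.

```python
import sqlite3

def delete_all_but_oldest(conn):
    """Keep the lowest id per (identField1, identField2); delete the rest."""
    cur = conn.execute(
        """
        DELETE FROM my_table
        WHERE id NOT IN (
            SELECT MIN(id) FROM my_table GROUP BY identField1, identField2
        )
        """
    )
    conn.commit()
    return cur.rowcount

def count_duplicate_groups(conn):
    """Count key pairs that still occur more than once."""
    return conn.execute(
        """
        SELECT COUNT(*) FROM (
            SELECT 1 FROM my_table
            GROUP BY identField1, identField2
            HAVING COUNT(*) > 1
        ) AS dup_groups
        """
    ).fetchone()[0]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table (id INTEGER PRIMARY KEY, identField1 INT, identField2 INT)"
)
# Made-up data: one group of three, one group of two, one unique row.
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [(1, 1, 1), (2, 1, 1), (3, 1, 1), (4, 2, 2), (5, 2, 2), (6, 3, 3)],
)
groups_before = count_duplicate_groups(conn)
deleted = delete_all_but_oldest(conn)
groups_after = count_duplicate_groups(conn)
remaining = [row[0] for row in conn.execute("SELECT id FROM my_table ORDER BY id")]
```

Running the duplicate-group count before and after is a cheap way to confirm the cleanup before adding a UNIQUE index.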
