How to delete duplicates in a MySQL table

Question

I've given a client the following query to delete duplicate phone no. records in an MSSQL database, but now they need to also do it on MySQL, and they report that MySQL complains about the format of the query. I've included the setup of a test table with duplicates for my code sample, but the actual delete query is what counts.

I'm asking this in ignorance and urgency, as I am still busy downloading and installing MySQL, and maybe somebody can help in the meantime.

 create table bkPhone
 (
     phoneNo nvarchar(20),
     firstName nvarchar(20),
     lastName nvarchar(20)
 )
 GO

 insert bkPhone values('0783313780','Brady','Kelly')
 insert bkPhone values('0845319792','Mark','Smith')
 insert bkPhone values('0834976958','Bill','Jones')
 insert bkPhone values('0845319792','Mark','Smith')
 insert bkPhone values('0828329792','Mickey','Mouse')
 insert bkPhone values('0834976958','Bill','Jones')

 alter table bkPhone add phoneId int identity

 delete from bkPhone
 where phoneId not in
 (
     select min(phoneId)
     from bkPhone
     group by phoneNo,firstName,lastName
     having  count(*) >= 1
 )

Looks fine to me. Are they using a version of MySQL that supports subqueries?
– Ignacio Vazquez-Abrams, Commented Mar 23, 2009 at 9:39

Tom Schaefer · Accepted Answer · 2009-03-29 02:02:01Z

14

Many roads lead to Rome; this is one of them. It is very fast, so you can use it with big databases. Don't forget the indexes. The trick is to make phoneNo unique and use "ignore".

-- Staging table holding the raw data, duplicates included
drop table if exists bkPhone_template;
create table bkPhone_template (
    phoneNo varchar(20),
    firstName varchar(20),
    lastName varchar(20)
);

insert into bkPhone_template values('0783313780','Brady','Kelly');
insert into bkPhone_template values('0845319792','Mark','Smith');
insert into bkPhone_template values('0834976958','Bill','Jones');
insert into bkPhone_template values('0845319792','Mark','Smith');
insert into bkPhone_template values('0828329792','Mickey','Mouse');
insert into bkPhone_template values('0834976958','Bill','Jones');

-- Target table: same structure, plus a unique index on phoneNo
drop table if exists bkPhone;
create table bkPhone like bkPhone_template;
alter table bkPhone add unique (phoneNo);

-- "ignore" skips rows that would violate the unique index instead of raising an error
insert ignore into bkPhone (phoneNo,firstName,lastName)
select phoneNo,firstName,lastName from bkPhone_template;

drop table bkPhone_template;

If the data table already exists, you only have to run a create table ... select followed by an insert ignore ... select. At the end, rename the tables so that the new one takes over the old name. That's all.

This workaround is much, much faster than a delete operation.
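For a table that already exists, the steps above could be sketched roughly as follows. This is a minimal sketch, not a tested migration: it uses create table ... like (as in the example above) to copy the structure, and the names bkPhone_dedup and bkPhone_old are hypothetical.

```sql
-- Hypothetical names: bkPhone_dedup (new copy) and bkPhone_old (retired original).
-- Empty copy of the existing table, with phoneNo made unique.
CREATE TABLE bkPhone_dedup LIKE bkPhone;
ALTER TABLE bkPhone_dedup ADD UNIQUE (phoneNo);

-- Rows whose phoneNo is already present in the copy are skipped.
INSERT IGNORE INTO bkPhone_dedup (phoneNo, firstName, lastName)
SELECT phoneNo, firstName, lastName
FROM bkPhone;

-- Swap the tables in one statement, then drop the original.
RENAME TABLE bkPhone TO bkPhone_old,
             bkPhone_dedup TO bkPhone;

DROP TABLE bkPhone_old;
```

Keeping bkPhone_old around until the result has been checked is a cheap safety net; the final drop can be postponed.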

edited Mar 29, 2009 at 2:02

answered Mar 29, 2009 at 1:50

Tom Schaefer


2 Comments

ProfK Over a year ago

Thanks for a two part education on MySQL. I now have 'like' for create table, and 'ignore' in my arsenal.

Wrikken Over a year ago

Note BTW (comment long after answer due to duplicate linking) that you can just use ALTER IGNORE TABLE bkPhone ADD UNIQUE (phoneNo), which silently discards the duplicates in the table without having to create another one.
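Written out against the table from the question, that one-statement variant would look roughly like the sketch below. Note that the IGNORE modifier for ALTER TABLE was deprecated in MySQL 5.6.17 and removed in MySQL 5.7.4, so this only applies to older servers.

```sql
-- Older MySQL versions only: IGNORE on ALTER TABLE was removed in MySQL 5.7.4.
-- Adds the unique index and drops rows that would violate it.
ALTER IGNORE TABLE bkPhone ADD UNIQUE (phoneNo);
```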

karim79 · Accepted Answer · 2009-03-23 09:42:32Z

5

You can select out the unique ones by:

select distinct(phoneNo) from bkPhone

and put them into another table, delete the old table and rename the new one to the old name.
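A minimal sketch of those steps, assuming the table from the question; the name bkPhone_distinct is hypothetical. All three columns are selected here so that the names are carried over along with the phone numbers.

```sql
-- One row per distinct (phoneNo, firstName, lastName) combination.
CREATE TABLE bkPhone_distinct AS
SELECT DISTINCT phoneNo, firstName, lastName
FROM bkPhone;

-- Replace the old table with the de-duplicated copy.
DROP TABLE bkPhone;
RENAME TABLE bkPhone_distinct TO bkPhone;
```

CREATE TABLE ... SELECT does not copy indexes from the source table, so any that are needed have to be recreated on the new table afterwards.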

answered Mar 23, 2009 at 9:42

karim79


1 Comment

ProfK Over a year ago

Given the simplicity of the scenario, allowing for new and dropped tables, this was the simplest, most effective solution. Thanks.

vartec · Accepted Answer · 2009-03-23 10:28:29Z

2

MySQL complains because the query makes no sense: you are trying to aggregate, using min(), the column by which you group.

Now, if you're trying to delete duplicate phone numbers for the same person, the SQL should be:

delete from bkPhone
 where phoneId not in
 (
         select min(phoneId)
         from bkPhone
         group by firstName,lastName /* i.e. grouping by person and NOT grouping by phoneId */
         having  count(*) >= 1
 )

answered Mar 23, 2009 at 10:28

vartec


4 Comments

Lukas Eder Over a year ago

Awesome. Yet another use case for that lovely having clause :-) But I think you should write > instead of >=. That might speed things up.

MvG Over a year ago

I don't see the point of this having at all. As it stands, it should always be true, so it is pointless. And if you change it to >1, then rows without duplicates would be omitted from the inner select and thus removed by the outer delete. Not what you'd want, I believe.

vartec Over a year ago

@LukasEder: with > that would delete all entries that have only a single phone. I think you didn't notice that there is a "not in" in the condition.

Jaak Kütt Over a year ago

"#1093 - You can't specify target table 'bkPhone' for update in FROM clause" says SQL Fiddle
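One common way around error 1093 is to wrap the subquery in a derived table, which MySQL materializes before the delete runs. The sketch below assumes the table from the question; the aliases keepId and keepers are hypothetical, and the ALTER statement is only needed if phoneId does not exist yet.

```sql
-- Assumption: phoneId is missing, so add it first. IDENTITY is SQL Server
-- syntax; in MySQL an AUTO_INCREMENT column must also be a key.
ALTER TABLE bkPhone ADD phoneId INT NOT NULL AUTO_INCREMENT PRIMARY KEY;

-- The inner aggregate picks one phoneId to keep per duplicate group.
-- Wrapping it in a derived table (keepers) avoids reading bkPhone
-- directly in the subquery of a DELETE on bkPhone.
DELETE FROM bkPhone
WHERE phoneId NOT IN (
    SELECT keepId
    FROM (
        SELECT MIN(phoneId) AS keepId
        FROM bkPhone
        GROUP BY phoneNo, firstName, lastName
    ) AS keepers
);
```

With the six sample rows from the question, this keeps one row for each of the four distinct people and removes the two duplicates.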

Michael Buen · Accepted Answer · 2009-03-23 09:42:21Z

1

MySQL is also covered here:

http://mssql-to-postgresql.blogspot.com/2007/12/deleting-duplicates-in-postgresql-ms.html

answered Mar 23, 2009 at 9:42

Michael Buen

