I've given a client the following query to delete duplicate phone no. records in an MSSQL database, but now they need to also do it on MySQL, and they report that MySQL complains about the format of the query. I've included the setup of a test table with duplicates for my code sample, but the actual delete query is what counts.

I'm asking this in ignorance and urgency, as I am still busy downloading and installing MySQL, and just maybe somebody can help in the meantime.

 create table bkPhone
 (
     phoneNo nvarchar(20),
     firstName nvarchar(20),
     lastName nvarchar(20)
 )
 GO

 insert bkPhone values('0783313780','Brady','Kelly')
 insert bkPhone values('0845319792','Mark','Smith')
 insert bkPhone values('0834976958','Bill','Jones')
 insert bkPhone values('0845319792','Mark','Smith')
 insert bkPhone values('0828329792','Mickey','Mouse')
 insert bkPhone values('0834976958','Bill','Jones')

 alter table bkPhone add phoneId int identity

 delete from bkPhone
 where phoneId not in
 (
     select min(phoneId)
     from bkPhone
     group by phoneNo,firstName,lastName
     having  count(*) >= 1
 )
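On MySQL, the same test setup needs a few syntax changes. Statements end with semicolons instead of the SQL Server `GO` batch separator, and MySQL has no `identity` property, so the surrogate key becomes an `AUTO_INCREMENT` column, which MySQL requires to be part of a key (here, the primary key). The following is a sketch that assumes MySQL 5.x or later. It uses plain `VARCHAR`, although MySQL also accepts `NVARCHAR`.

```sql
-- Test table, MySQL dialect (sketch; assumes MySQL 5.x or later).
DROP TABLE IF EXISTS bkPhone;
CREATE TABLE bkPhone (
    phoneNo   VARCHAR(20),
    firstName VARCHAR(20),
    lastName  VARCHAR(20)
);

-- MySQL accepts several rows in a single VALUES list.
INSERT INTO bkPhone VALUES
    ('0783313780', 'Brady',  'Kelly'),
    ('0845319792', 'Mark',   'Smith'),
    ('0834976958', 'Bill',   'Jones'),
    ('0845319792', 'Mark',   'Smith'),
    ('0828329792', 'Mickey', 'Mouse'),
    ('0834976958', 'Bill',   'Jones');

-- MySQL counterpart of "add phoneId int identity": an AUTO_INCREMENT column
-- must be indexed, and existing rows are numbered when the column is added.
ALTER TABLE bkPhone ADD phoneId INT NOT NULL AUTO_INCREMENT PRIMARY KEY;
```

The `delete` itself runs into a separate MySQL restriction: a `DELETE` may not read its own target table in a subquery (error #1093).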
  • Looks fine to me. Are they using a version of MySQL that supports subqueries? Commented Mar 23, 2009 at 9:39
  • Why having count(*) >= 1? When is it ever NOT? Commented Feb 10, 2011 at 0:55

4 Answers

Many ways lead to Rome. This is one. It is very fast, so you can use it with big databases. Don't forget the indexes. The trick is to make phoneNo unique and use INSERT IGNORE.

drop table if exists bkPhone_template;
create table bkPhone_template (
    phoneNo varchar(20),
    firstName varchar(20),
    lastName varchar(20)
);

insert into bkPhone_template values('0783313780','Brady','Kelly');
insert into bkPhone_template values('0845319792','Mark','Smith');
insert into bkPhone_template values('0834976958','Bill','Jones');
insert into bkPhone_template values('0845319792','Mark','Smith');
insert into bkPhone_template values('0828329792','Mickey','Mouse');
insert into bkPhone_template values('0834976958','Bill','Jones');

drop table if exists bkPhone;
create table bkPhone like bkPhone_template;
alter table bkPhone add unique (phoneNo);

insert ignore into bkPhone (phoneNo,firstName,lastName)
    select phoneNo,firstName,lastName from bkPhone_template;

drop table bkPhone_template;

If the data table already exists, you only need to create the new, empty table (for example with CREATE TABLE ... LIKE, then add the unique key) and fill it with an INSERT IGNORE ... SELECT. At the end, run a table-renaming statement to swap the tables. That's all.

This workaround is much, much faster than a delete operation.
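For the case where the data table already exists, the steps above might look like this in MySQL. This is a sketch: `bkPhone_new` and `bkPhone_old` are hypothetical scratch-table names, and the sample rows at the top only stand in for the existing data so the script can be tried on its own.

```sql
-- Sample data standing in for an existing table (only needed for trying this out).
DROP TABLE IF EXISTS bkPhone, bkPhone_new, bkPhone_old;
CREATE TABLE bkPhone (
    phoneNo   VARCHAR(20),
    firstName VARCHAR(20),
    lastName  VARCHAR(20)
);
INSERT INTO bkPhone VALUES
    ('0783313780', 'Brady',  'Kelly'),
    ('0845319792', 'Mark',   'Smith'),
    ('0834976958', 'Bill',   'Jones'),
    ('0845319792', 'Mark',   'Smith'),
    ('0828329792', 'Mickey', 'Mouse'),
    ('0834976958', 'Bill',   'Jones');

-- Empty copy with the same columns and indexes, plus the unique key.
CREATE TABLE bkPhone_new LIKE bkPhone;
ALTER TABLE bkPhone_new ADD UNIQUE (phoneNo);

-- INSERT IGNORE skips rows whose phoneNo is already present instead of failing.
-- Which duplicate survives depends on the order the SELECT returns rows.
INSERT IGNORE INTO bkPhone_new (phoneNo, firstName, lastName)
SELECT phoneNo, firstName, lastName FROM bkPhone;

-- A single RENAME TABLE swaps both names atomically; then drop the old copy.
RENAME TABLE bkPhone TO bkPhone_old, bkPhone_new TO bkPhone;
DROP TABLE bkPhone_old;
```

Using `CREATE TABLE ... LIKE` keeps the original column definitions and indexes, and doing both renames in one `RENAME TABLE` statement means other sessions never see a moment where `bkPhone` is missing.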

2 Comments

Thanks for a two part education on MySQL. I now have 'like' for create table, and 'ignore' in my arsenal.
Note BTW (comment long after answer due to duplicate linking) that you can just use ALTER IGNORE TABLE bkPhone ADD UNIQUE (phoneNo), which silently just discards the duplicates in the table without having to create another one. (This only works before MySQL 5.7.4, which removed the IGNORE clause from ALTER TABLE.)

You can select out the unique ones by:

select distinct phoneNo, firstName, lastName from bkPhone

and put them into another table, delete the old table and rename the new one to the old name.
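A minimal MySQL sketch of that approach follows. It lists all three data columns in the `SELECT DISTINCT`, so rows only collapse when both the phone number and the name match; a surrogate key such as `phoneId` must be left out of the list, because it would make every row distinct. `bkPhone_distinct` is a hypothetical name, and the sample rows are only there for trying it out.

```sql
-- Sample data (only needed for trying this out).
DROP TABLE IF EXISTS bkPhone, bkPhone_distinct;
CREATE TABLE bkPhone (
    phoneNo   VARCHAR(20),
    firstName VARCHAR(20),
    lastName  VARCHAR(20)
);
INSERT INTO bkPhone VALUES
    ('0783313780', 'Brady',  'Kelly'),
    ('0845319792', 'Mark',   'Smith'),
    ('0834976958', 'Bill',   'Jones'),
    ('0845319792', 'Mark',   'Smith'),
    ('0828329792', 'Mickey', 'Mouse'),
    ('0834976958', 'Bill',   'Jones');

-- Copy one row per distinct (phoneNo, firstName, lastName) combination.
CREATE TABLE bkPhone_distinct AS
SELECT DISTINCT phoneNo, firstName, lastName
FROM bkPhone;

-- Replace the original table with the deduplicated copy.
DROP TABLE bkPhone;
RENAME TABLE bkPhone_distinct TO bkPhone;
```

`CREATE TABLE ... AS SELECT` does not copy indexes or keys from the source table, so any the original table had would need to be re-created afterwards.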

1 Comment

Given the simplicity of the scenario, allowing for new and dropped tables, this was the simplest, most effective solution. Thanks.

MySQL complains because it makes no sense: you're trying to aggregate with min() over the column you group by.

Now, if you're trying to delete duplicate phone numbers for the same person, the SQL should be:

delete from bkPhone
 where phoneId not in
 (
         select min(phoneId)
         from bkPhone
         group by firstName,lastName /* i.e. grouping by person and NOT grouping by phoneId */
         having  count(*) >= 1
 )
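As written, this still fails on MySQL with error #1093 ("You can't specify target table ... for update in FROM clause"), because MySQL will not let a `DELETE` read its own target table in a subquery. A common workaround, sketched below for the question's original grouping (`phoneNo, firstName, lastName`), is to wrap the subquery in a derived table so MySQL builds it before deleting; an alternative is a multi-table `DELETE` that joins the table to itself. The setup assumes an `AUTO_INCREMENT` `phoneId` column in place of SQL Server's `identity`.

```sql
-- Sample data with a surrogate key (only needed for trying this out).
DROP TABLE IF EXISTS bkPhone;
CREATE TABLE bkPhone (
    phoneId   INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    phoneNo   VARCHAR(20),
    firstName VARCHAR(20),
    lastName  VARCHAR(20)
);
INSERT INTO bkPhone (phoneNo, firstName, lastName) VALUES
    ('0783313780', 'Brady',  'Kelly'),
    ('0845319792', 'Mark',   'Smith'),
    ('0834976958', 'Bill',   'Jones'),
    ('0845319792', 'Mark',   'Smith'),
    ('0828329792', 'Mickey', 'Mouse'),
    ('0834976958', 'Bill',   'Jones');

-- Option 1: wrap the inner query in a derived table ("keepers").
-- Because it uses GROUP BY and MIN(), MySQL materializes it into a
-- temporary table instead of merging it, which avoids error 1093.
DELETE FROM bkPhone
WHERE phoneId NOT IN (
    SELECT keepId
    FROM (
        SELECT MIN(phoneId) AS keepId
        FROM bkPhone
        GROUP BY phoneNo, firstName, lastName
    ) AS keepers
);

-- Option 2 (alternative, same result): delete every row that has an
-- identical row with a smaller phoneId. After option 1 it deletes nothing.
DELETE b1
FROM bkPhone AS b1
JOIN bkPhone AS b2
  ON  b1.phoneNo   = b2.phoneNo
  AND b1.firstName = b2.firstName
  AND b1.lastName  = b2.lastName
  AND b1.phoneId   > b2.phoneId;
```

One difference between the two: the join compares with `=`, so rows with a NULL in any of the three columns are never treated as duplicates, whereas `GROUP BY` puts NULLs into the same group.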

4 Comments

Awesome. Yet another use case for that lovely having clause :-) But I think you should write > instead of >=. That might accelerate things.
I don't see the point of this having at all. As it stands, it should always be true, so it is pointless. And if you change it to >1, then rows without duplicates would be omitted from the inner select and thus removed by the outer delete. Not what you'd want, I believe.
@LukasEder: with >, that would delete all entries that have only a single phone. I think you didn't notice that there is a NOT IN in the condition.
"#1093 - You can't specify target table 'bkPhone' for update in FROM clause" says SQL Fiddle

MySQL is also covered here:

http://mssql-to-postgresql.blogspot.com/2007/12/deleting-duplicates-in-postgresql-ms.html
