Delete duplicate rows (don't delete all duplicate)

Question

I am using PostgreSQL and want to delete duplicate rows, with one condition: one copy from each set of duplicate rows must be kept.

For example, if there are 5 duplicate records, 4 of them should be deleted.

Possible duplicate of How to delete duplicate rows with SQL?
– Georg Schölly, Commented Sep 23, 2010 at 11:08
How ironic! lol 'possible duplicate of how to delete duplicates'...
– Denis Valeev, Commented Feb 24, 2012 at 21:18

Denis Valeev · Accepted Answer · 2012-02-24 21:14:47Z

23

Try the steps described in this article: Removing duplicates from a PostgreSQL database.

It covers the case where you have so much data that grouping directly by the duplicated columns is not practical.

A simple solution would be this:

DELETE FROM foo
       WHERE id NOT IN (SELECT min(id) --or max(id)
                        FROM foo
                        GROUP BY hash)

Here hash is whatever value gets duplicated, for example a hash calculated over all the columns that define a duplicate.
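To see that the statement keeps one row per group and leaves non-duplicated rows alone, here is a minimal sketch. It assumes an in-memory SQLite database (via Python's sqlite3) as a stand-in for PostgreSQL, and the sample rows are made up for illustration; the DELETE itself is the same plain SQL as above.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL here (an assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (id INTEGER PRIMARY KEY, hash TEXT)")

# Hypothetical data: 'a' appears three times, 'b' once, 'c' twice.
conn.executemany(
    "INSERT INTO foo (id, hash) VALUES (?, ?)",
    [(1, "a"), (2, "a"), (3, "a"), (4, "b"), (5, "c"), (6, "c")],
)

# Keep the lowest id of every hash group and delete everything else.
# A group with a single member keeps that member, because its min(id) is its own id.
conn.execute(
    "DELETE FROM foo WHERE id NOT IN (SELECT min(id) FROM foo GROUP BY hash)"
)

remaining = conn.execute("SELECT id, hash FROM foo ORDER BY id").fetchall()
print(remaining)  # [(1, 'a'), (4, 'b'), (5, 'c')]
```

The row with hash 'b' survives even though it has no duplicate, which is the point raised in the comments below.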

edited Feb 24, 2012 at 21:14

answered Sep 23, 2010 at 11:06

Denis Valeev


4 Comments

pomarc Over a year ago

doesn't this delete all the rows that don't have duplicates, too?

Denis Valeev Over a year ago

@pomarc no, because there's this little equals sign (=) before 1 that tells us that we want to take min(id) of all possible groups even those that contain only one member; so, no worries, you won't delete data that is not duplicated

grteibo Over a year ago

Is the having count(*) >= 1 necessary? I got the same result when I executed: DELETE FROM foo WHERE id NOT IN (SELECT min(id) FROM foo GROUP BY hash)

Denis Valeev Over a year ago

@grteibo you are absolutely right, that's the way the deduplication is usually done; I don't know why I didn't notice that before; the idea of this answer is not so much the idea of deduplication itself but the fact that we calculate a hash for all the columns that we want to group by and then remove duplicates

adopilot · Accepted Answer · 2010-09-23 11:06:37Z

2

delete from table
where not id in 
(select max(id) from table group by [duplicate row])

This makes an arbitrary choice of which row to keep (here, the one with the highest id). If you do not agree with this, please provide more details.

answered Sep 23, 2010 at 11:06

adopilot


baklarz2048 · Accepted Answer · 2010-09-23 11:13:13Z

2

The fastest way is to join the table to itself with DELETE ... USING. http://www.postgresql.org/docs/8.1/interactive/sql-delete.html

mapy=# CREATE TABLE test(id INT,id2 INT);
CREATE TABLE
mapy=# INSERT INTO test VALUES(1,2);
INSERT 0 1
mapy=# INSERT INTO test VALUES(1,3);
INSERT 0 1
mapy=# INSERT INTO test VALUES(1,4);
INSERT 0 1

mapy=# DELETE FROM test t1 USING test t2 WHERE t1.id=t2.id AND t1.id2<t2.id2;
DELETE 2
mapy=# SELECT * FROM test;
 id | id2 
----+-----
  1 |   4
(1 row)
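The join condition can also be checked outside psql. The sketch below is an assumption-laden stand-in: it uses SQLite through Python's sqlite3, and because SQLite has no DELETE ... USING, the self-join is rewritten as a correlated EXISTS with the same condition. The extra row (2, 7) is made up to show that an id without duplicates is left alone.

```python
import sqlite3

# SQLite in memory as a stand-in for PostgreSQL (assumption for this demo).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INT, id2 INT)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?)",
    [(1, 2), (1, 3), (1, 4), (2, 7)],  # (2, 7) is a hypothetical non-duplicate
)

# Delete a row whenever another row with the same id has a larger id2.
# This mirrors: DELETE FROM test t1 USING test t2
#               WHERE t1.id = t2.id AND t1.id2 < t2.id2
cur = conn.execute(
    """
    DELETE FROM test
    WHERE EXISTS (
        SELECT 1 FROM test AS t2
        WHERE t2.id = test.id AND test.id2 < t2.id2
    )
    """
)
deleted = cur.rowcount

rows = conn.execute("SELECT id, id2 FROM test ORDER BY id").fetchall()
print(deleted, rows)  # 2 [(1, 4), (2, 7)]
```

One thing to keep in mind: with a strict < comparison, two rows that are identical in both id and id2 never delete each other, so this form only works when the second column tells the duplicates apart.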

answered Sep 23, 2010 at 11:13

baklarz2048


Cody Gray · Accepted Answer · 2019-04-30 05:18:17Z

1

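-- rowid is Oracle's row identifier; PostgreSQL's closest equivalent is ctid.
-- The subquery is correlated so that it returns one value per outer row.
delete from table t1 
where rowid > (SELECT min(rowid) FROM table t2
               where t2.id = t1.id and t2.name = t1.name);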
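A sketch of the keep-the-lowest-rowid idea for a table without a key column, with the subquery correlated to the outer row so that it yields a single value per row. It assumes SQLite (via Python's sqlite3), which happens to expose a rowid column of its own; PostgreSQL has no rowid, and ctid is the nearest system column there. The table name people and its rows are hypothetical.

```python
import sqlite3

# SQLite is used because it has an implicit rowid (assumption for this demo).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INT, name TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [(1, "ann"), (1, "ann"), (1, "ann"), (2, "bob"), (2, "bob"), (3, "cy")],
)

# For each row, find the smallest rowid among rows with the same (id, name)
# and delete the row if its own rowid is larger than that.
conn.execute(
    """
    DELETE FROM people
    WHERE rowid > (
        SELECT min(t2.rowid) FROM people AS t2
        WHERE t2.id = people.id AND t2.name = people.name
    )
    """
)

rows = conn.execute("SELECT id, name FROM people ORDER BY id").fetchall()
print(rows)  # [(1, 'ann'), (2, 'bob'), (3, 'cy')]
```

Unlike the id-based variants, this also handles rows that are identical in every column, since the row identifier is what separates them.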

edited Apr 30, 2019 at 5:18

Cody Gray♦


answered Apr 30, 2019 at 5:00

Sree Gottumukkala


Xpie · Accepted Answer · 2022-09-24 01:16:16Z

0

-- PostgreSQL form of the self-join delete (DELETE f1 FROM ... is MySQL syntax)
DELETE FROM foo AS f1 USING foo AS f2
       WHERE f1.duplicate_column = f2.duplicate_column
             AND f1.id > f2.id;

answered Sep 24, 2022 at 1:16

Xpie
