    date    | window  | points  |     actual_bool     |          previous_bool          |       creation_time        | source 
------------+---------+---------+---------------------+---------------------------------+----------------------------+--------
 2021-02-11 |     110 |     0.6 |                   0 |                               0 | 2021-02-14 09:20:57.51966  | bldgh
 2021-02-11 |     150 |     0.7 |                   1 |                               0 | 2021-02-14 09:20:57.51966  | fiata
 2021-02-11 |     110 |     0.7 |                   1 |                               0 | 2021-02-14 09:20:57.51966  | nfiws
 2021-02-11 |     150 |     0.7 |                   1 |                               0 | 2021-02-14 09:20:57.51966  | fiata
 2021-02-11 |     110 |     0.6 |                   0 |                               0 | 2021-02-14 09:20:57.51966  | bldgh
 2021-02-11 |     110 |     0.3 |                   0 |                               1 | 2021-02-14 09:22:22.969014 | asdg1
 2021-02-11 |     110 |     0.6 |                   0 |                               0 | 2021-02-14 09:22:22.969014 | j
 2021-02-11 |     110 |     0.3 |                   0 |                               1 | 2021-02-14 09:22:22.969014 | aba
 2021-02-11 |     110 |     0.5 |                   0 |                               1 | 2021-02-14 09:22:22.969014 | fg
 2021-02-11 |     110 |     0.6 |                   1 |                               0 | 2021-02-14 09:22:22.969014 | wdda
 2021-02-11 |     110 |     0.7 |                   1 |                               1 | 2021-02-14 09:23:21.977685 | dda
 2021-02-11 |     110 |     0.5 |                   1 |                               0 | 2021-02-14 09:23:21.977685 | dd
 2021-02-11 |     110 |     0.6 |                   1 |                               1 | 2021-02-14 09:23:21.977685 | so
 2021-02-11 |     110 |     0.5 |                   1 |                               1 | 2021-02-14 09:23:21.977685 | dar
 2021-02-11 |     110 |     0.6 |                   1 |                               1 | 2021-02-14 09:23:21.977685 | firr
 2021-02-11 |     110 |     0.8 |                   1 |                               1 | 2021-02-14 09:24:15.831411 | xim
 2021-02-11 |     110 |     0.8 |                   1 |                               1 | 2021-02-14 09:24:15.831411 | cxyy
 2021-02-11 |     110 |     0.3 |                   0 |                               1 | 2021-02-14 09:24:15.831411 | bisd
 2021-02-11 |     110 |     0.1 |                   0 |                               1 | 2021-02-14 09:24:15.831411 | cope
 2021-02-11 |     110 |     0.2 |                   0 |                               1 | 2021-02-14 09:24:15.831411 | sand
 ...

I have the following dataset in a PostgreSQL table called classification in testdb.

I have accidentally copied the data into the database again and duplicated the rows.

How can I delete the duplicates?

In the sample above, rows 1 and 5 are copies, and rows 2 and 4 are copies too.

I have never used SQL to drop duplicates before, and I have no idea where to start.

I tried

select creation_time, count(creation_time) from classification group by creation_time having count(creation_time) > 1 order by creation_time;

But all it did was show me how many rows I had for each creation_time,

Like this

       creation_time        | count 
----------------------------+-------
 2021-02-14 09:20:57.51966  |    10
 2021-02-14 09:22:22.969014 |    10
 2021-02-14 09:23:21.977685 |    10
 2021-02-14 09:24:15.831411 |    10
 2021-02-14 09:24:27.733763 |    10
 2021-02-14 09:24:38.41793  |    10
 2021-02-14 09:27:04.432466 |    10
 2021-02-14 09:27:21.62256  |    10
 2021-02-14 09:27:22.677763 |    10
 2021-02-14 09:27:37.996054 |    10
 2021-02-14 09:28:09.275041 |    10
 2021-02-14 09:28:22.649391 |    10
...

There should only be 5 unique records for each creation_time.

It doesn't show me the duplicate rows themselves, and even if it did, I would have no idea how to drop them.
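
For reference, grouping on every column rather than just creation_time would list the duplicated rows themselves. A sketch, assuming the seven columns shown above (with "window" double-quoted, since WINDOW is a reserved word in PostgreSQL):

select date, "window", points, actual_bool, previous_bool,
       creation_time, source, count(*) as copies
from classification
group by date, "window", points, actual_bool, previous_bool,
         creation_time, source
having count(*) > 1;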

  • Do you have an id column? Commented Mar 16, 2021 at 16:21
  • No I don't, should I have made one? Commented Mar 16, 2021 at 16:21
  • Every table should have such a column, e.g. for identifying specific records... Commented Mar 16, 2021 at 16:23
  • It's my first time doing SQL, I didn't realise. Is there any way I can fix it now and add an id column after removing the duplicates? Commented Mar 16, 2021 at 16:25
  • Adding an id column BEFORE removing would be much more helpful, because it is quite easy to identify the duplicates, but it's harder to remove them if you cannot address them properly (see the sketch below)... Commented Mar 16, 2021 at 16:27
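
A sketch of the id-column suggestion from the comments, assuming PostgreSQL 10 or later (identity columns do not exist in earlier versions). Adding one makes every row individually addressable:

alter table classification
    add column id bigint generated always as identity;  -- existing rows are backfilled with unique values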

1 Answer

That is a lot of rows to delete. I would suggest just recreating the table:

create table new_classification as
    select distinct c.*
    from classification c;
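
One quick sanity check before reloading is to compare the two row counts (a sketch):

select (select count(*) from classification)     as original_rows,
       (select count(*) from new_classification) as deduplicated_rows;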

After you have validated the data, you can reload it if you really want:

truncate table classification;

insert into classification
    select *
    from new_classification;

This process should be much faster than deleting 90% of the rows.
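
If recreating the table is not an option, the duplicates can also be deleted in place by addressing rows through PostgreSQL's system column ctid, which helps here precisely because the table has no id column. A sketch, assuming a duplicate means all seven columns match (note the quoted "window"):

delete from classification
where ctid in (
    select ctid
    from (select ctid,
                 row_number() over (
                     partition by date, "window", points, actual_bool,
                                  previous_bool, creation_time, source
                 ) as rn
          from classification) numbered
    where rn > 1  -- keep the first copy of each group, delete the rest
);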

8 Comments

This is a nice idea! :)
OMG thank you so much, it worked, you saved me so much time. Could you explain what the select distinct c.* from classification c; part of the command does?
SELECT DISTINCT * returns your table content without duplicates
Then what do the c.* and c do?
@anarchy . . . The c is a table alias. The c.* returns all columns from whatever c refers to.
