    date    | window  | points  |     actual_bool     |          previous_bool          |       creation_time        | source 
------------+---------+---------+---------------------+---------------------------------+----------------------------+--------
 2021-02-11 |     110 |     0.6 |                   0 |                               0 | 2021-02-14 09:20:57.51966  | bldgh
 2021-02-11 |     150 |     0.7 |                   1 |                               0 | 2021-02-14 09:20:57.51966  | fiata
 2021-02-11 |     110 |     0.7 |                   1 |                               0 | 2021-02-14 09:20:57.51966  | nfiws
 2021-02-11 |     150 |     0.7 |                   1 |                               0 | 2021-02-14 09:20:57.51966  | fiata
 2021-02-11 |     110 |     0.6 |                   0 |                               0 | 2021-02-14 09:20:57.51966  | bldgh
 2021-02-11 |     110 |     0.3 |                   0 |                               1 | 2021-02-14 09:22:22.969014 | asdg1
 2021-02-11 |     110 |     0.6 |                   0 |                               0 | 2021-02-14 09:22:22.969014 | j
 2021-02-11 |     110 |     0.3 |                   0 |                               1 | 2021-02-14 09:22:22.969014 | aba
 2021-02-11 |     110 |     0.5 |                   0 |                               1 | 2021-02-14 09:22:22.969014 | fg
 2021-02-11 |     110 |     0.6 |                   1 |                               0 | 2021-02-14 09:22:22.969014 | wdda
 2021-02-11 |     110 |     0.7 |                   1 |                               1 | 2021-02-14 09:23:21.977685 | dda
 2021-02-11 |     110 |     0.5 |                   1 |                               0 | 2021-02-14 09:23:21.977685 | dd
 2021-02-11 |     110 |     0.6 |                   1 |                               1 | 2021-02-14 09:23:21.977685 | so
 2021-02-11 |     110 |     0.5 |                   1 |                               1 | 2021-02-14 09:23:21.977685 | dar
 2021-02-11 |     110 |     0.6 |                   1 |                               1 | 2021-02-14 09:23:21.977685 | firr
 2021-02-11 |     110 |     0.8 |                   1 |                               1 | 2021-02-14 09:24:15.831411 | xim
 2021-02-11 |     110 |     0.8 |                   1 |                               1 | 2021-02-14 09:24:15.831411 | cxyy
 2021-02-11 |     110 |     0.3 |                   0 |                               1 | 2021-02-14 09:24:15.831411 | bisd
 2021-02-11 |     110 |     0.1 |                   0 |                               1 | 2021-02-14 09:24:15.831411 | cope
 2021-02-11 |     110 |     0.2 |                   0 |                               1 | 2021-02-14 09:24:15.831411 | sand
 ...

I have the following dataset in a PostgreSQL table called classification in testdb.

I have accidentally copied the data into the database again and duplicated the rows.

How can I delete the duplicates?

In the sample above, rows 1 and 5 are copies, and rows 2 and 4 are copies too.

I have never used SQL to drop duplicates before, and I have no idea where to start.

I tried

select creation_time, count(creation_time) from classification group by creation_time having count(creation_time) > 1 order by creation_time;

But all it did was show me how many rows I had for each creation_time,

Like this

       creation_time        | count 
----------------------------+-------
 2021-02-14 09:20:57.51966  |    10
 2021-02-14 09:22:22.969014 |    10
 2021-02-14 09:23:21.977685 |    10
 2021-02-14 09:24:15.831411 |    10
 2021-02-14 09:24:27.733763 |    10
 2021-02-14 09:24:38.41793  |    10
 2021-02-14 09:27:04.432466 |    10
 2021-02-14 09:27:21.62256  |    10
 2021-02-14 09:27:22.677763 |    10
 2021-02-14 09:27:37.996054 |    10
 2021-02-14 09:28:09.275041 |    10
 2021-02-14 09:28:22.649391 |    10
...

There should only be 5 unique records for each creation_time.

It doesn't show me the duplicate rows themselves, and even if it did, I would have no idea how to drop them.
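
For reference, grouping on every column rather than just creation_time would list the duplicated rows themselves. A sketch, assuming the seven columns shown above (with "window" double-quoted, since WINDOW is a reserved word in PostgreSQL):

select date, "window", points, actual_bool, previous_bool,
       creation_time, source, count(*) as copies
from classification
group by date, "window", points, actual_bool, previous_bool,
         creation_time, source
having count(*) > 1;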

  • Do you have an id column? Commented Mar 16, 2021 at 16:21
  • No I don't, should I have made one? Commented Mar 16, 2021 at 16:21
  • Every table should have such a column, e.g. for identifying specific records... Commented Mar 16, 2021 at 16:23
  • It's my first time doing SQL, I didn't realise. Is there any way I can fix it now and add an id column after removing the duplicates? Commented Mar 16, 2021 at 16:25
  • Adding an id column BEFORE removing would be much more helpful, because it is quite easy to identify the duplicates, but it's harder to remove them if you cannot address them properly (see the sketch below)... Commented Mar 16, 2021 at 16:27
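
A sketch of the id-column suggestion from the comments, assuming PostgreSQL 10 or later (identity columns do not exist in earlier versions). Adding one makes every row individually addressable:

alter table classification
    add column id bigint generated always as identity;  -- existing rows are backfilled with unique values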

1 Answer

That is a lot of rows to delete. I would suggest just recreating the table:

create table new_classification as
    select distinct c.*
    from classification c;
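
One quick sanity check before reloading is to compare the two row counts (a sketch):

select (select count(*) from classification)     as original_rows,
       (select count(*) from new_classification) as deduplicated_rows;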

After you have validated the data, you can reload it if you really want:

truncate table classification;

insert into classification
    select *
    from new_classification;

This process should be much faster than deleting 90% of the rows.
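
If recreating the table is not an option, the duplicates can also be deleted in place by addressing rows through PostgreSQL's system column ctid, which helps here precisely because the table has no id column. A sketch, assuming a duplicate means all seven columns match (note the quoted "window"):

delete from classification
where ctid in (
    select ctid
    from (select ctid,
                 row_number() over (
                     partition by date, "window", points, actual_bool,
                                  previous_bool, creation_time, source
                 ) as rn
          from classification) numbered
    where rn > 1  -- keep the first copy of each group, delete the rest
);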

8 Comments

This is a nice idea! :)
OMG thank you so much, it worked, you saved me so much time. Could you explain what the select distinct c.* from classification c; part of the command does?
SELECT DISTINCT * returns your table content without duplicates
Then what do the c.* and c do?
@anarchy . . . The c is a table alias. The c.* returns all columns from whatever c refers to.
