I am using the query below to delete duplicate rows from my table:

with a as (
    (select id, date_published from public.dat order by 1)
    except all
    select distinct on (id, date_published)
        id, date_published
    from public.dat
    order by 1
)
delete from public.dat using a where dat.id = a.id returning *;

That might identify duplicate rows like the following:

ID     DATE_PUBLISHED          DATE_LAST_MODIFIED
007    2019-12-11 13:15:00     2019-09-19 13:40:00
007    2019-12-11 13:15:00     2022-07-11 21:15:00

However, I don't want to delete everything that is returned.

I only want to delete the rows that have the oldest last modified date.

In this case, I want to delete:

    ID     DATE_PUBLISHED          DATE_LAST_MODIFIED
    007    2019-12-11 13:15:00     2019-09-19 13:40:00

For any other duplicates, I likewise want to keep only the most recent row.

The SQL I've used goes through the whole table and deletes every row that has a duplicate, which is not my goal.

Can anyone help me tweak this so that it identifies the max DATE_LAST_MODIFIED for each ID and deletes every other row with the same ID?

  • You can use rank or row_number with a partition, then delete the rows whose rank is not equal to 1. Look into stackoverflow.com/questions/35608330/… Commented Sep 12, 2022 at 5:18
  • I have looked at rank on other questions - but don't understand it. That link also does not reference "rank" thank you Commented Sep 12, 2022 at 5:27
  • What don't you understand from the link? Commented Sep 12, 2022 at 5:32
  • I said I do not understand "rank" - I did not say I did not "...understand from the link" I am saying rank is something I do not understand AND I am also saying, separate to that, that the link you gave me telling me to look at "rank" does not actually make any reference to "rank" anyway. Thank you. Commented Sep 12, 2022 at 5:36

1 Answer

Try the following:

DELETE FROM dat T
USING dat D
WHERE T.id = D.id
  AND T.date_last_modified < D.date_last_modified;

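Before running the delete, it may help to preview which rows the condition matches. The following is only a sketch, assuming the table is public.dat with the columns shown in the question; it selects every row that has a newer row with the same id, which are the rows the DELETE above would remove.

```sql
-- Preview only: nothing is deleted here.
-- A row is listed when another row with the same id
-- has a later date_last_modified.
SELECT T.*
FROM public.dat AS T
WHERE EXISTS (
    SELECT 1
    FROM public.dat AS D
    WHERE D.id = T.id
      AND D.date_last_modified > T.date_last_modified
)
ORDER BY T.id, T.date_last_modified;
```

Note that rows sharing both the same id and the same date_last_modified are not matched by this condition, so exact ties would all be kept.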

You may also use EXISTS, as follows:

DELETE FROM dat T
WHERE EXISTS (
    SELECT 1
    FROM dat D
    WHERE T.id = D.id
      AND T.date_last_modified < D.date_last_modified
);

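The row_number approach mentioned in the comments could look roughly like the sketch below. It assumes PostgreSQL and uses the ctid system column to tell rows apart, only because the table in the question shows no primary key; if the table has one, joining on that key instead would be the safer choice. Within each id, rows are numbered from the newest date_last_modified to the oldest, and every row numbered above 1 is deleted.

```sql
-- Assumption: PostgreSQL, table public.dat as in the question.
-- ctid identifies a row's physical location; it stands in for a
-- primary key here and should not be relied on outside this statement.
WITH ranked AS (
    SELECT ctid,
           row_number() OVER (
               PARTITION BY id
               ORDER BY date_last_modified DESC NULLS LAST
           ) AS rn
    FROM public.dat
)
DELETE FROM public.dat AS t
USING ranked AS r
WHERE t.ctid = r.ctid
  AND r.rn > 1          -- rn = 1 is the most recent row per id; keep it
RETURNING t.*;
```

Unlike the comparison-based deletes, this keeps exactly one row per id even when several rows share the latest date_last_modified (which of the tied rows survives is arbitrary). To treat only rows with the same id and date_published as duplicates, as the original query does, add date_published to the PARTITION BY list.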
