Delete and keep certain rows based on conditions from table in postgresql

Question

I have a table with this structure:

create table content_relations (
    mainConId    Integer not null,
    relatedConId Integer not null,
    similarity   float not null,
    relatedConAddedOn TIMESTAMP WITH TIME ZONE Not null);

I want a query that deletes rows from this table under these conditions:

Delete rows where the number of rows with the same mainConId exceeds a maximum (say CMax),
i.e. keep only CMax rows per mainConId, chosen by similarity in descending order,
so that only the CMax most similar rows remain for each mainConId and the extra rows are removed.

After running the query, the table should hold at most n*CMax rows, where n is the number of distinct mainConId values.

Can someone help me with the query? I think this should be possible in Postgres. Thanks in advance.

@kometen No, it's for my own side project. I have a table whose size can grow as n*n, so I need a way to control the number of rows in it.
– Deepak Kapiswe, Commented Nov 8, 2021 at 12:53
OK. Can you please include the query you have tried so far, and any error message you may get?
– kometen, Commented Nov 8, 2021 at 13:05
I don't know how to express my requirement as a query; I'm not an SQL expert :)
– Deepak Kapiswe, Commented Nov 8, 2021 at 13:07

Edouard · Accepted Answer · 2021-11-08 14:21:32Z

2

First, you can try this:

WITH list AS
(
SELECT *
     , row_number() OVER (PARTITION BY mainConId ORDER BY similarity DESC) AS row_number
  FROM content_relations 
)
SELECT *
  FROM list AS l
 WHERE row_number <= CMax
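
To check the preview query on a small data set, something like the following sketch can be used. The sample rows and the limit of 2 (standing in for CMax) are made up for illustration, and the alias rn replaces row_number only for brevity. The table is created as a temporary table, so within the session it shadows any real table of the same name.

```sql
-- Hypothetical sample data: mainConId 1 has four related rows, mainConId 2 has one.
CREATE TEMP TABLE content_relations (
    mainConId         integer     NOT NULL,
    relatedConId      integer     NOT NULL,
    similarity        float       NOT NULL,
    relatedConAddedOn timestamptz NOT NULL
);

INSERT INTO content_relations VALUES
    (1, 10, 0.9, now()),
    (1, 11, 0.8, now()),
    (1, 12, 0.7, now()),
    (1, 13, 0.6, now()),
    (2, 20, 0.5, now());

-- Rank the rows of each mainConId by similarity and keep the top 2.
WITH list AS
(
SELECT *
     , row_number() OVER (PARTITION BY mainConId ORDER BY similarity DESC) AS rn
  FROM content_relations
)
SELECT mainConId, relatedConId, similarity, rn
  FROM list
 WHERE rn <= 2
 ORDER BY mainConId, rn;
-- Returns the rows with relatedConId 10 and 11 for mainConId 1, and 20 for mainConId 2.
```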

Then, if the result matches the rows you want to keep, you can delete the extra rows with:

WITH list AS
(
SELECT mainConId
     , similarity
     , row_number() OVER (PARTITION BY mainConId ORDER BY similarity DESC) AS row_number
  FROM content_relations 
)
DELETE FROM content_relations AS cr
 USING list AS l
 WHERE cr.mainConId = l.mainConId
   AND cr.similarity = l.similarity
   AND l.row_number > CMax
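
If two rows of the same mainConId can share the same similarity value, matching on similarity can also hit a row that was meant to be kept. One possible alternative, sketched here as an assumption and not taken from the answer itself, identifies each row by PostgreSQL's ctid system column (the physical location of the row version), so it does not rely on any column values being unique. The literal 2 stands in for CMax, and the alias rn is illustrative.

```sql
-- Delete every row ranked beyond the limit, identified by its ctid.
DELETE FROM content_relations
 WHERE ctid IN (
       SELECT ctid
         FROM (
              SELECT ctid
                   , row_number() OVER (PARTITION BY mainConId ORDER BY similarity DESC) AS rn
                FROM content_relations
              ) AS ranked
        WHERE rn > 2   -- 2 stands in for CMax
       );
```

Rows with equal similarity are still ranked in an arbitrary order unless further columns are added to the ORDER BY, so which of the tied rows survives is not deterministic.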

edited Nov 8, 2021 at 14:21

answered Nov 8, 2021 at 13:10

Edouard


3 Comments

Deepak Kapiswe Over a year ago

The first part returned the correct rows to keep, but the delete statement removed rows that should have been kept: it deleted all rows of every mainConId that had any row with row_number > CMax. We need to delete only the rows beyond the limit and keep the rest.

Edouard Over a year ago

OK, I have updated the WHERE clause of the delete statement so that it selects the rows to delete from the list CTE more precisely.

Deepak Kapiswe Over a year ago

It still needed some modifications, which I worked out based on your answer. Thanks a lot for helping.

Deepak Kapiswe · Accepted Answer · 2021-11-08 14:19:57Z

0

Based on @Edouard H.'s answer, I arrived at this solution:

WITH list AS
(
SELECT mainConId, relatedConId 
     , row_number() OVER (PARTITION BY mainConId ORDER BY similarity DESC) AS row_number
  FROM content_relations 
)
DELETE FROM content_relations AS cr
 USING list AS l
 WHERE cr.mainConId = l.mainConId
   AND cr.relatedConId = l.relatedConId
   AND l.row_number > CMax;
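
CMax in the query above is a placeholder that has to be replaced with a number. Left as written, the unquoted name folds to cmax, which is the name of a PostgreSQL system column, so it is safer to use a different name for a real parameter. A minimal sketch of wrapping the statement in an SQL function follows; the function name trim_content_relations, the parameter max_per_main, and the alias rn are hypothetical, and the join assumes that each (mainConId, relatedConId) pair occurs only once in the table.

```sql
-- Hypothetical helper: deletes the rows beyond the limit and returns how many were removed.
CREATE OR REPLACE FUNCTION trim_content_relations(max_per_main integer)
RETURNS bigint
LANGUAGE sql
AS $$
    WITH list AS (
        SELECT mainConId, relatedConId
             , row_number() OVER (PARTITION BY mainConId ORDER BY similarity DESC) AS rn
          FROM content_relations
    ),
    deleted AS (
        DELETE FROM content_relations AS cr
         USING list AS l
         WHERE cr.mainConId = l.mainConId
           AND cr.relatedConId = l.relatedConId
           AND l.rn > max_per_main
        RETURNING 1
    )
    SELECT count(*) FROM deleted;
$$;

-- Example call: keep at most 5 rows per mainConId.
SELECT trim_content_relations(5);
```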

answered Nov 8, 2021 at 14:19

Deepak Kapiswe
