I have a table with this structure:

create table content_relations (
    mainConId    Integer not null,
    relatedConId Integer not null,
    similarity   float not null,
    relatedConAddedOn TIMESTAMP WITH TIME ZONE Not null);

Now I want a query that deletes rows from this table under these conditions:

For every mainConId whose row count exceeds a maximum limit (say CMax), delete the extra rows: keep only the CMax rows with the highest similarity for that mainConId and remove the rest.

After running the query, the table should hold at most n*CMax rows, where n is the number of distinct mainConId values.

Can someone help me with the query? I think it should be possible in PostgreSQL. Thanks in advance.
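The "at most n*CMax rows" condition can be checked with a query like the one below. This is a minimal sketch that assumes 10 stands in for CMax: it lists every mainConId that still has more rows than the limit, so an empty result means no group exceeds it.

```sql
-- Lists each mainConId that still has more rows than the limit.
-- 10 is an assumed stand-in for CMax; an empty result means every
-- mainConId is within the limit, so the table holds at most n * 10 rows.
SELECT mainConId, count(*) AS row_count
  FROM content_relations
 GROUP BY mainConId
HAVING count(*) > 10;
```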

  • @komenten No, it's for my own side project. I have a table whose size can grow as n*n, so I need a way to control the number of rows in it. Commented Nov 8, 2021 at 12:53
  • OK. Can you please include the query you have tried so far, and any error message you get? Commented Nov 8, 2021 at 13:05
  • I don't know how to express my requirement as a query; I'm not an SQL expert :) Commented Nov 8, 2021 at 13:07

2 Answers

First, try this:

WITH list AS
(
SELECT *
     , row_number() OVER (PARTITION BY mainConId ORDER BY similarity DESC) AS row_number
  FROM content_relations 
)
SELECT *
  FROM list AS l
 WHERE row_number <= CMax  -- replace CMax with your limit, e.g. 10

If the result matches the rows you want to keep, you can then delete the extra rows with:

WITH list AS
(
SELECT mainConId
     , similarity
     , row_number() OVER (PARTITION BY mainConId ORDER BY similarity DESC) AS row_number
  FROM content_relations 
)
DELETE FROM content_relations AS cr
 USING list AS l
 WHERE cr.mainConId = l.mainConId
   AND cr.similarity = l.similarity
   AND l.row_number > CMax  -- replace CMax with your limit, e.g. 10
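Joining on (mainConId, similarity) is only safe when no two rows of the same mainConId share a similarity value. If a row past the limit ties with a kept row, both rows match the join and the kept row is deleted too. A quick check for such ties, sketched against the question's table:

```sql
-- Finds (mainConId, similarity) pairs shared by more than one row.
-- Any result here means a join on those two columns can also match
-- rows that the row_number() ranking intended to keep.
SELECT mainConId, similarity, count(*) AS rows_sharing_value
  FROM content_relations
 GROUP BY mainConId, similarity
HAVING count(*) > 1;
```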

3 Comments

Although the first part returned the correct records to keep, the deletion part also removed rows that should have been kept: it deleted all records of any mainConId that had rows with row_number > CMax. We need to delete only the records beyond the limit and keep the rest.
OK, I have updated the WHERE clause in the DELETE statement to better select the rows to delete from the list CTE.
It still needed some modifications, which I worked out based on your answer. Thanks a lot for helping.

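Based on @Edouard H.'s answer, I arrived at this solution. It matches rows on (mainConId, relatedConId) instead of (mainConId, similarity), so rows that merely share a similarity value with a kept row are no longer deleted, assuming each (mainConId, relatedConId) pair occurs only once: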

WITH list AS
(
SELECT mainConId, relatedConId 
     , row_number() OVER (PARTITION BY mainConId ORDER BY similarity DESC) AS row_number
  FROM content_relations 
)
DELETE FROM content_relations AS cr
 USING list AS l
 WHERE cr.mainConId = l.mainConId
   AND cr.relatedConId = l.relatedConId
   AND l.row_number > CMax;  -- replace CMax with your limit, e.g. 10
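The table in the question has no primary key or unique constraint, so (mainConId, relatedConId) is not guaranteed to identify a single row. A variant that identifies rows by PostgreSQL's ctid system column avoids relying on any column combination. The sketch below runs against a temporary demo table with the question's columns and made-up sample rows, uses 2 as a stand-in for CMax, and adds relatedConAddedOn DESC as an assumed tie-breaker so the choice among equal similarity values is deterministic.

```sql
-- Demo only: temporary table with the question's columns and made-up rows.
CREATE TEMP TABLE content_relations_demo (
    mainConId         integer NOT NULL,
    relatedConId      integer NOT NULL,
    similarity        float NOT NULL,
    relatedConAddedOn timestamp with time zone NOT NULL
);

INSERT INTO content_relations_demo VALUES
    (1, 10, 0.9, '2021-11-01 00:00+00'),
    (1, 11, 0.8, '2021-11-02 00:00+00'),
    (1, 12, 0.8, '2021-11-03 00:00+00'),  -- ties with (1, 11) on similarity
    (1, 13, 0.1, '2021-11-04 00:00+00'),
    (2, 20, 0.5, '2021-11-01 00:00+00');

-- Keep the 2 most similar rows per mainConId (2 stands in for CMax).
-- ctid identifies each physical row, so duplicate column values
-- cannot make the DELETE match rows that were ranked as kept.
DELETE FROM content_relations_demo
 WHERE ctid IN (
       SELECT ctid
         FROM (SELECT ctid,
                      row_number() OVER (
                          PARTITION BY mainConId
                          ORDER BY similarity DESC, relatedConAddedOn DESC
                      ) AS rn
                 FROM content_relations_demo) AS ranked
        WHERE rn > 2);

-- Remaining rows: (1, 10), (1, 12) and (2, 20).
SELECT * FROM content_relations_demo ORDER BY mainConId, similarity DESC;
```

To apply the same idea to the real table, replace content_relations_demo with content_relations and 2 with your limit. Note that a row's ctid changes when the row is updated or the table is rewritten, so it is safe to use within a single statement like this one but not as a key stored for later.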
