SQL Query - delete rows with duplicate column value

Question

I need to remove rows from a table where a two-column combination is duplicated. For example, in the sample table below, there should be only one row with the combination (48983, 2018-05-01).

ID      CertID   DueDate
676790  48983   2018-05-03
678064  48983   2018-05-02
678086  48983   2018-05-01
678107  48983   2018-05-01
678061  48983   2018-05-01

I tried to list the duplicate entries first, but the query returns the entire table. This is what I used:

WITH A   -- Get a list of unique combinations of ResponseDueDate and CertificateID
AS  (
   SELECT Distinct
          ID,       
          ResponseDueDate,
          CertID
   FROM  FacCompliance
)
,   B  -- Get a list of all those CertID values that have more than one ResponseDueDate associated
AS  (
    SELECT CertID
    FROM   A
    GROUP BY
           CertID
    HAVING COUNT(*) > 1
)
SELECT  A.ID,
        A.ResponseDueDate,
        A.FacCertificateID
FROM    A
    JOIN B
        ON  A.CertID = B.CertID
order by CertID, ResponseDueDate;

What is wrong with the query I am using? Also, is it possible to remove the extra rows (in the example above, keep one instance of the (48983, 2018-05-01) combination and remove the rest)? I am using SQL Server 2016.

Do you have a preference on which row to keep, e.g. the smallest ID? – George Menoutis, commented May 18, 2018 at 13:19
You can use ROW_NUMBER() OVER (PARTITION BY CertId, DueDate ORDER BY ID), then delete the rows whose row number is > 1. – LONG, commented May 18, 2018 at 13:21
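As for what is wrong with the query in the question: CTE A selects DISTINCT ID, ResponseDueDate, CertID, and since ID is presumably unique, DISTINCT removes nothing. CTE B then groups by CertID alone, so every CertID with two or more rows qualifies regardless of the date, and the join brings back all of those rows. (As posted, the final SELECT also refers to A.FacCertificateID, a column A does not expose.) A minimal sketch that lists only the rows belonging to duplicated (CertID, ResponseDueDate) pairs, assuming the column names used in the question and a hypothetical table variable @FacCompliance loaded with the sample rows:

```sql
-- Hypothetical stand-in for FacCompliance, loaded with the question's sample rows
DECLARE @FacCompliance TABLE (
    ID              INT PRIMARY KEY,
    CertID          INT,
    ResponseDueDate DATE
);
INSERT INTO @FacCompliance (ID, CertID, ResponseDueDate)
VALUES (676790, 48983, '2018-05-03'),
       (678064, 48983, '2018-05-02'),
       (678086, 48983, '2018-05-01'),
       (678107, 48983, '2018-05-01'),
       (678061, 48983, '2018-05-01');

-- B: (CertID, ResponseDueDate) pairs that occur more than once.
-- ID is left out of the grouping; including it would make every group size 1.
WITH B AS (
    SELECT CertID, ResponseDueDate
    FROM   @FacCompliance
    GROUP BY CertID, ResponseDueDate
    HAVING COUNT(*) > 1
)
SELECT fc.ID, fc.ResponseDueDate, fc.CertID
FROM   @FacCompliance AS fc
JOIN   B
    ON  fc.CertID = B.CertID
    AND fc.ResponseDueDate = B.ResponseDueDate   -- match on both columns
ORDER BY fc.CertID, fc.ResponseDueDate, fc.ID;
```

With the sample data this should return only the three 2018-05-01 rows (IDs 678061, 678086 and 678107).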

avb · Accepted Answer · 2018-05-18 13:30:56Z

5

Use ROW_NUMBER():

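WITH A AS  (
   SELECT
          ID,
          ResponseDueDate,
          CertID,
          -- order by ID so the row kept (lp = 1) is deterministically the smallest ID;
          -- ordering by ResponseDueDate, a partition column, leaves the choice arbitrary
          ROW_NUMBER() over (partition by CertID, ResponseDueDate order by ID) lp
   FROM  FacCompliance
)
delete A   -- deletes the underlying FacCompliance rows through the CTE
where lp <> 1
;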
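Before running the DELETE on real data, it can help to run the same CTE as a SELECT and check which rows would go. A sketch against a hypothetical table variable @FacCompliance holding the sample rows; it orders by ID, so the smallest ID in each pair is the one kept:

```sql
-- Hypothetical stand-in for FacCompliance with the question's sample rows
DECLARE @FacCompliance TABLE (ID INT PRIMARY KEY, CertID INT, ResponseDueDate DATE);
INSERT INTO @FacCompliance (ID, CertID, ResponseDueDate)
VALUES (676790, 48983, '2018-05-03'),
       (678064, 48983, '2018-05-02'),
       (678086, 48983, '2018-05-01'),
       (678107, 48983, '2018-05-01'),
       (678061, 48983, '2018-05-01');

-- 1) Preview: rows numbered 2, 3, ... within each pair are the ones to be deleted
WITH A AS (
    SELECT ID, CertID, ResponseDueDate,
           ROW_NUMBER() OVER (PARTITION BY CertID, ResponseDueDate ORDER BY ID) AS lp
    FROM   @FacCompliance
)
SELECT ID, CertID, ResponseDueDate, lp
FROM   A
WHERE  lp <> 1;

-- 2) Same CTE with DELETE instead of SELECT; this removes the underlying rows
WITH A AS (
    SELECT ID, CertID, ResponseDueDate,
           ROW_NUMBER() OVER (PARTITION BY CertID, ResponseDueDate ORDER BY ID) AS lp
    FROM   @FacCompliance
)
DELETE FROM A
WHERE  lp <> 1;

-- 3) What is left: one row per (CertID, ResponseDueDate)
SELECT ID, CertID, ResponseDueDate
FROM   @FacCompliance
ORDER BY ID;
```

With the sample data the preview should list IDs 678086 and 678107, and the final SELECT should show 676790, 678061 and 678064.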

Also, if ID is unique, you can do it without window functions:

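-- delete every row for which another row with the same (CertID, ResponseDueDate)
-- has a smaller ID, i.e. keep only the smallest ID of each pair
delete fc
from  FacCompliance fc
where exists (
    select 1
    from FacCompliance ref
    where ref.ResponseDueDate = fc.ResponseDueDate
        and ref.CertID = fc.CertID
        and ref.ID < fc.ID
);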
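In the EXISTS version, a row is deleted whenever another row with the same pair has a smaller ID, so the smallest ID in each pair is the only one that survives; changing < to > would keep the largest instead. One caveat worth checking: = never matches NULLs, so if ResponseDueDate can be NULL, duplicates with a NULL date are left untouched by this version, whereas ROW_NUMBER() places NULLs in the same partition. A sketch on a hypothetical table variable @FacCompliance with the sample rows:

```sql
-- Hypothetical stand-in for FacCompliance with the question's sample rows
DECLARE @FacCompliance TABLE (ID INT PRIMARY KEY, CertID INT, ResponseDueDate DATE);
INSERT INTO @FacCompliance (ID, CertID, ResponseDueDate)
VALUES (676790, 48983, '2018-05-03'),
       (678064, 48983, '2018-05-02'),
       (678086, 48983, '2018-05-01'),
       (678107, 48983, '2018-05-01'),
       (678061, 48983, '2018-05-01');

-- Delete each row that has a "twin" (same CertID and ResponseDueDate) with a
-- smaller ID; only the smallest ID of each pair has no such twin and survives.
DELETE fc
FROM   @FacCompliance AS fc
WHERE  EXISTS (
    SELECT 1
    FROM   @FacCompliance AS ref
    WHERE  ref.CertID = fc.CertID
      AND  ref.ResponseDueDate = fc.ResponseDueDate
      AND  ref.ID < fc.ID
);

-- Remaining rows
SELECT ID, CertID, ResponseDueDate
FROM   @FacCompliance
ORDER BY ID;
```

On the sample data this should leave the same three rows as the ROW_NUMBER() version: 676790, 678061 and 678064.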

edited May 18, 2018 at 13:30

answered May 18, 2018 at 13:20

avb

UnhandledExcepSean · Accepted Answer · 2018-05-18 13:21:30Z

1

You can number the rows within each (CertID, DueDate) partition, ordered by ID, and delete every row whose number is not 1; this keeps the row with the smallest ID.

-- Sample data from the question
DECLARE @T TABLE (ID INT,CertID INT, DueDate DATE)
INSERT INTO @T(ID,CertID,DueDate) SELECT 676790,48983,'2018-05-03'
INSERT INTO @T(ID,CertID,DueDate) SELECT 678064,48983,'2018-05-02'
INSERT INTO @T(ID,CertID,DueDate) SELECT 678086,48983,'2018-05-01'
INSERT INTO @T(ID,CertID,DueDate) SELECT 678107,48983,'2018-05-01'
INSERT INTO @T(ID,CertID,DueDate) SELECT 678061,48983,'2018-05-01'

-- Number the rows in each (CertID, DueDate) group by ascending ID and delete
-- everything except the first (smallest-ID) row; the join on ID assumes ID is unique
DELETE t
FROM @T t
INNER JOIN (
    SELECT
        *
        ,Row_number() OVER(PARTITION BY CertID,DueDate ORDER BY ID ASC) AS [Row]
    FROM @T
) Ordered ON Ordered.ID=t.ID
WHERE [Row]<>1

-- Remaining rows: 676790, 678064 and 678061
SELECT * FROM @T

answered May 18, 2018 at 13:21

UnhandledExcepSean
