1

I am doing some data cleanup and would like to remove duplicate rows, i.e. records that have the same picture_id and date values:

Example:

picture_id - 2 date - "13-Jul-18"
picture_id - 2 date - "13-Jul-18"
picture_id - 2 date - "13-Jul-18"
picture_id - 2 date - "13-Jul-18"

DELETE FROM `pictures` WHERE `picture_id` = '2' AND `date` = '13-Jul-18'

Table columns (in order): ID (primary key), picture_id, date, followers

I would like to delete all but one of the duplicate records. It does not matter which one is kept. How can I accomplish this?

4 Comments
  • yea do you need to know them? Commented Jul 13, 2018 at 15:41
  • Can you post what the full table looks like (i.e. all the columns)? Commented Jul 13, 2018 at 15:46
  • Let this be a lesson to always put a PRIMARY KEY on a table. Commented Jul 13, 2018 at 15:46
  • I added the table columns in OP Commented Jul 13, 2018 at 15:48

4 Answers

2

In MySQL, you can keep the smallest (or biggest) id using a JOIN:

DELETE p
    FROM pictures p JOIN
         (SELECT p2.picture_id, p2.date, MIN(p2.id) AS min_id
          FROM pictures p2
          WHERE p2.picture_id = 2 AND p2.date = '2018-07-13'
          GROUP BY p2.picture_id, p2.date
         ) pp
         ON p.picture_id = pp.picture_id AND p.date = pp.date AND p.id > pp.min_id;
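
The same keep-the-smallest-id idea can be applied to every duplicate group at once rather than to a single picture_id and date. The following is a minimal sketch, not the answer's own method: it uses Python's built-in sqlite3 module with an in-memory SQLite database standing in for MySQL, and the sample rows (including the '2018-07-14' row) are made up for illustration. SQLite does not accept the `DELETE p FROM ... JOIN` form, so the sketch uses a NOT IN subquery instead.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the MySQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pictures ("
    "id INTEGER PRIMARY KEY, picture_id INTEGER, date TEXT, followers INTEGER)"
)

# Illustrative sample data; ids are assigned 1..6 in insertion order.
rows = [
    (2, "2018-07-13", 4553),
    (2, "2018-07-13", 4552),
    (2, "2018-07-13", 4557),
    (3, "2018-07-13", 4355),
    (3, "2018-07-13", 4351),
    (3, "2018-07-14", 4353),
]
conn.executemany(
    "INSERT INTO pictures (picture_id, date, followers) VALUES (?, ?, ?)", rows
)

# Keep the smallest id in each (picture_id, date) group and delete the rest.
# SQLite does not need the extra derived table "keep"; it is included because
# MySQL rejects a subquery that reads directly from the DELETE target table.
conn.execute(
    """
    DELETE FROM pictures
    WHERE id NOT IN (
        SELECT min_id FROM (
            SELECT MIN(id) AS min_id
            FROM pictures
            GROUP BY picture_id, date
        ) AS keep
    )
    """
)
conn.commit()

remaining = conn.execute(
    "SELECT id, picture_id, date FROM pictures ORDER BY id"
).fetchall()
print(remaining)
```

Running the SELECT part of the subquery on its own first is a cheap way to check which ids would be kept before deleting anything.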

1

Assuming you don't care which ID you keep, you can select one record and delete all the matching records that are not the one selected:

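DELETE
FROM     pictures
WHERE    ID NOT IN (
                     -- the derived table "keep" works around MySQL's limits on
                     -- LIMIT inside IN (...) and on reading the DELETE target
                     SELECT   ID
                     FROM     (
                               SELECT   ID
                               FROM     pictures
                               WHERE    picture_id = 2 AND
                                        Date = '2018-07-13'
                               ORDER BY ID
                               LIMIT 1
                              ) AS keep
                    ) AND
         picture_id = 2 AND
         Date = '2018-07-13'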

The fact that these are unwanted duplicates makes me think that either your current primary key is insufficient for your purposes or you need to look at a unique constraint.
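
To illustrate the unique-constraint suggestion, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory SQLite database standing in for MySQL. The index name uq_pictures_picture_date is hypothetical, and in MySQL the constraint would typically be added with ALTER TABLE instead. The constraint can only be created once the existing duplicates are gone.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the MySQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pictures ("
    "id INTEGER PRIMARY KEY, picture_id INTEGER, date TEXT, followers INTEGER)"
)

# Hypothetical index name; enforces one row per (picture_id, date) pair.
conn.execute(
    "CREATE UNIQUE INDEX uq_pictures_picture_date ON pictures (picture_id, date)"
)

conn.execute(
    "INSERT INTO pictures (picture_id, date, followers) "
    "VALUES (2, '2018-07-13', 4553)"
)

# A second row with the same picture_id and date is now rejected.
try:
    conn.execute(
        "INSERT INTO pictures (picture_id, date, followers) "
        "VALUES (2, '2018-07-13', 4552)"
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM pictures").fetchone()[0]
print(duplicate_rejected, row_count)
```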

2 Comments

Oops. Edited now
Side note: in this case it really does not matter, but SQL tables are unordered sets, so LIMIT 1 without ORDER BY does not guarantee that the "first" (physical table) record is returned. SQL also does not guarantee that the same record is returned when you run SELECT ID FROM pictures WHERE picture_id = 2 AND Date = '2018-07-13' LIMIT 1 twice, unless ORDER BY is used in combination with LIMIT.
0

You can try something like this:

DROP TABLE IF EXISTS  pictures;
CREATE TABLE pictures(picture_id INT(11), `dt` DATE, followers INT(11));
INSERT INTO pictures VALUES
(2,'2018-07-13',4553),
(2,'2018-07-13',4552),
(2,'2018-07-13',4557),
(2,'2018-07-13',4577),

(3,'2018-07-13',4355),
(3,'2018-07-13',4351),
(3,'2018-07-13',4353),
(3,'2018-07-13',4374);

Delete query

DELETE p FROM pictures p 
    LEFT JOIN (
        SELECT picture_id, dt, MAX(followers) AS fol 
        FROM pictures WHERE dt = '2018-07-13' GROUP BY picture_id, dt
    ) AS main
ON main.dt = p.dt
WHERE main.picture_id = p.picture_id
AND main.fol <> p.followers;

I hope this will solve your problem.

2 Comments

Thanks for your reply, but I get "Unknown table 'P' in MULTI DELETE" when executing it.
You need to use your own table name in place of pictures. Try this demo: rextester.com/RWS32654
0

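Simply use a common table expression (MySQL 8.0 or later) to number the rows in each group, then join it back to the table and delete everything except the first row:

WITH CTE_Duplicates AS (
    SELECT ID,
           ROW_NUMBER() OVER (PARTITION BY picture_id, `date` ORDER BY ID) AS rownumber
    FROM `pictures`
)
DELETE p
FROM `pictures` p
JOIN CTE_Duplicates d ON d.ID = p.ID
WHERE d.rownumber <> 1;

It works for me. Please check.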
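
The row-numbering approach can be tried out without a MySQL server. Below is a minimal sketch using Python's built-in sqlite3 module with an in-memory SQLite database standing in for MySQL; it assumes the bundled SQLite is version 3.25 or later (needed for window functions) and uses a derived table rather than a CTE. The sample rows are illustrative.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the MySQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pictures ("
    "id INTEGER PRIMARY KEY, picture_id INTEGER, date TEXT, followers INTEGER)"
)

# Illustrative sample data; ids are assigned 1..8 in insertion order.
rows = [
    (2, "2018-07-13", 4553),
    (2, "2018-07-13", 4552),
    (2, "2018-07-13", 4557),
    (2, "2018-07-13", 4577),
    (3, "2018-07-13", 4355),
    (3, "2018-07-13", 4351),
    (3, "2018-07-13", 4353),
    (3, "2018-07-13", 4374),
]
conn.executemany(
    "INSERT INTO pictures (picture_id, date, followers) VALUES (?, ?, ?)", rows
)

# Number the rows inside each (picture_id, date) group by ascending id,
# then delete every row whose number is greater than 1.
conn.execute(
    """
    DELETE FROM pictures
    WHERE id IN (
        SELECT id FROM (
            SELECT id,
                   ROW_NUMBER() OVER (
                       PARTITION BY picture_id, date ORDER BY id
                   ) AS rownumber
            FROM pictures
        ) AS numbered
        WHERE rownumber > 1
    )
    """
)
conn.commit()

remaining = conn.execute(
    "SELECT id, picture_id FROM pictures ORDER BY id"
).fetchall()
print(remaining)
```

Ordering the window by id makes the surviving row deterministic (the lowest id in each group), which sidesteps the unordered-result issue raised in the comments on the LIMIT 1 answer.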
