2

I have a few million rows in a MySQL InnoDB table that need to be cleaned up. Consider this example:

SELECT archiveid, clearingid, pickdate 
  FROM tblclearingarchive 
  WHERE clearingid = 30978729 
  ORDER BY pickdate;


+-----------+------------+---------------------+
| archiveid | clearingid | pickdate            |
+-----------+------------+---------------------+
|  34328367 |   30978729 | NULL                | *
|  34333844 |   30978729 | 2015-10-27 15:55:30 | <- keep only this row with oldest date
|  34438038 |   30978729 | 2016-03-01 10:34:25 | *
|  34481472 |   30978729 | 2016-04-20 13:44:19 | *
+-----------+------------+---------------------+
4 rows in set (0.01 sec)

I know the clearingid values of the affected rows. For each one, I want to remove the row with no pickdate (NULL) and the rows that are redundant because they were picked later than the first pick. In the example above, the rows marked with * should be deleted.

Any hints about what such an SQL UPDATE/DELETE statement might look like?

The table has about 30M rows, and about 250K rows (with known clearingids) need to be cleaned up.

Building on Matias Barrios's initial idea, I found this query to verify the result. It lists exactly the rows I want to delete:

SELECT archiveid, clearingid, pickdate 
  FROM tblclearingarchive 
  WHERE clearingid = 30978729 
  AND (pickdate NOT IN (SELECT MIN(pickdate)
                        FROM tblclearingarchive 
                        WHERE clearingid = 30978729 ) 
       OR pickdate is NULL)
  ORDER BY pickdate;

+-----------+------------+---------------------+
| archiveid | clearingid | pickdate            |
+-----------+------------+---------------------+
|  34328367 |   30978729 | NULL                |
|  34438038 |   30978729 | 2016-03-01 10:34:25 |
|  34481472 |   30978729 | 2016-04-20 13:44:19 |
+-----------+------------+---------------------+
3 rows in set (0.20 sec)

But the delete fails when I use the same condition in a DELETE statement:

DELETE FROM tblclearingarchive 
  WHERE clearingid = 30978729 
  AND (pickdate NOT IN (SELECT MIN(pickdate)
                        FROM tblclearingarchive 
                        WHERE clearingid = 30978729 ) 
      OR pickdate is NULL);

ERROR 1093 (HY000): You can't specify target table 'tblclearingarchive' for update in FROM clause

7
  • Note that it's often quicker to create a new table, retaining only those rows you want to keep, and then replace the old table with the new one. Commented Feb 27, 2020 at 16:47
  • For further help see: Why should I provide an MCRE for what seems to me to be a very simple SQL query? and note that, while 'minimal', a data set comprising just four rows is unlikely to be 'complete' Commented Feb 27, 2020 at 17:07
  • Hi @Strawberry, if you want, I will delete my answer. I did not see the comment you made 27 minutes ago. Cheers! Commented Feb 27, 2020 at 17:15
  • @VBoka, why do you want to delete? I found your "help table" answer was helpful! Commented Feb 27, 2020 at 17:22
  • At first I thought it was exactly the same thing the other person suggested here, but I see it is not. Sorry, two mistakes... long day. I hope it will work for you. I have tried all the other queries provided here, and they all give the same error you described. Commented Feb 27, 2020 at 17:24

3 Answers

1

You will need to do this:

Create a helper table (a copy of the original) so you can do the delete:

create table test as 
select * from tblclearingarchive;

Then do the delete:

-- compare each row with the earliest pickdate of its own clearingid
delete from tblclearingarchive
where pickdate is null
   or pickdate <> (select min(t.pickdate) 
                   from test t
                   where t.clearingid = tblclearingarchive.clearingid);

Here is a small demo
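
A smaller variation on the helper-table idea, as a rough sketch only: instead of copying all 30M rows, the helper table could hold just one row per affected clearingid with its earliest pickdate. The table name tmp_min_pickdate and the column types (INT, DATETIME) are assumptions, not taken from the question, and the IN list stands in for the real list of known clearingids.

```sql
-- Hypothetical helper table: one row per affected clearingid.
-- Column types are assumed; adjust them to match tblclearingarchive.
CREATE TABLE tmp_min_pickdate (
  clearingid   INT NOT NULL PRIMARY KEY,
  min_pickdate DATETIME NULL
);

-- MIN() ignores NULLs, so this stores the earliest non-NULL pickdate.
INSERT INTO tmp_min_pickdate (clearingid, min_pickdate)
SELECT clearingid, MIN(pickdate)
FROM tblclearingarchive
WHERE clearingid IN (30978729)   -- placeholder for the known clearingids
GROUP BY clearingid;

-- Multi-table DELETE: remove NULL picks and every pick later than the first.
DELETE a
FROM tblclearingarchive AS a
JOIN tmp_min_pickdate AS m ON m.clearingid = a.clearingid
WHERE a.pickdate IS NULL
   OR a.pickdate > m.min_pickdate;

DROP TABLE tmp_min_pickdate;
```

Two things to keep in mind with this sketch: if a clearingid has only NULL pickdates, all of its rows are deleted, and if two rows share the same earliest pickdate, both are kept.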


1 Comment

Thanks. The subquery also needs to respect the clearingid, but this is a good idea. Will have to check tomorrow.
0

If you want the first, then you can filter:

select t.*
from t
where t.pickdate = (select min(t2.pickdate) 
                    from t t2
                    where t2.clearingid = t.clearingid
                   );

Or, if you just want one row:

select t.*
from t
where t.pickdate is not null
order by t.pickdate
limit 1;

EDIT:

If you want to actually modify the table:

delete t
    from t join
         (select clearingid, min(pickdate) as min_pickdate
          from t
          group by clearingid
         ) c
         on t.clearingid = c.clearingid
    where t.pickdate > c.min_pickdate or t.pickdate is null;
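
Before running a delete like this, it may be worth previewing the affected rows with the same join written as a SELECT. The following is only a sketch against the table from the question; the IN list is a placeholder for the known clearingids.

```sql
-- Preview of the rows the join-based DELETE would remove.
SELECT t.archiveid, t.clearingid, t.pickdate
FROM tblclearingarchive AS t
JOIN (SELECT clearingid, MIN(pickdate) AS min_pickdate
      FROM tblclearingarchive
      WHERE clearingid IN (30978729)   -- placeholder for the known clearingids
      GROUP BY clearingid
     ) AS c
  ON t.clearingid = c.clearingid
WHERE t.pickdate > c.min_pickdate
   OR t.pickdate IS NULL
ORDER BY t.clearingid, t.pickdate;
```

For the four sample rows in the question, this should list the three rows marked with * and leave out the row dated 2015-10-27 15:55:30.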

2 Comments

Thanks Gordon, but I need one DELETE statement that removes all the rows that do not have the oldest pickdate. Or, alternatively, one SELECT that catches all the rows to delete.
@VolkerSchmid . . . The edited delete should do what you want.
0

This should do as you intend.

DELETE FROM tblclearingarchive
  WHERE archiveid IN (
    SELECT * FROM ( SELECT archiveid, clearingid, pickdate 
      FROM tblclearingarchive 
      WHERE clearingid = 30978729 
      AND pickdate NOT IN ( SELECT max(pickdate)
                        FROM tblclearingarchive LIMIT 1)
      ORDER BY pickdate) tmpTable
   ) OR  pickdate IS NULL

Let me know if it works.

1 Comment

This brought me closer. I need the NULL date and also MIN() instead of MAX(). But a good start. I just updated the initial question now. I found a SELECT, but it does not work as a DELETE. :-(
