MySQL query optimization

Question

Could you please help me optimize this query? I've spent a lot of time on it and still cannot rewrite it to run fast enough (say, in a matter of seconds rather than the minutes it takes now).

The query:

SELECT m.my_id, m.my_value, m.my_timestamp
  FROM (
    SELECT my_id, MAX(my_timestamp) AS most_recent_timestamp
      FROM my_table
      WHERE my_timestamp < '2011-03-01 08:00:00'
      GROUP BY my_id
  ) as tmp
LEFT OUTER JOIN my_table m
ON tmp.my_id = m.my_id AND tmp.most_recent_timestamp = m.my_timestamp
ORDER BY m.my_timestamp;

my_table is defined as follows:

CREATE TABLE my_table (
   my_id INTEGER NOT NULL,
   my_value VARCHAR(4000),
   my_timestamp TIMESTAMP default CURRENT_TIMESTAMP NOT NULL,
   INDEX MY_ID_IDX (my_id),
   INDEX MY_TIMESTAMP_IDX (my_timestamp),
   INDEX MY_ID_MY_TIMESTAMP_IDX (my_id, my_timestamp)
);

The goal of this query is to select the most recent my_value for each my_id before some timestamp. my_table contains ~100 million entries and the query takes ~8 minutes to run.

explain:

+----+-------------+-------------+-------+------------------------------------------------+-------------------------+---------+---------------------------+-------+---------------------------------------+
| id | select_type | table       | type  | possible_keys                                  | key                     | key_len | ref                       | rows  | Extra                                 |
+----+-------------+-------------+-------+------------------------------------------------+-------------------------+---------+---------------------------+-------+---------------------------------------+
|  1 | PRIMARY     | <derived2>  | ALL   | NULL                                           | NULL                    | NULL    | NULL                      | 90721 | Using temporary; Using filesort       |
|  1 | PRIMARY     | m          | ref   | MY_ID_IDX,MY_TIMESTAMP_IDX,MY_ID_TIMESTAMP_IDX | MY_TIMESTAMP_IDX        | 4       | tmp.most_recent_timestamp |    1  | Using where                           |
|  2 | DERIVED     | my_table    | range | MY_TIMESTAMP_IDX                               | MY_ID_MY_TIMESTAMP_IDX  | 8       | NULL                      | 61337 | Using where; Using index for group-by |
+----+-------------+-------------+-------+------------------------------------------------+-------------------------+---------+---------------------------+-------+---------------------------------------+

Are you sure that's the query plan for the query you posted? The plan mentions table nv, but there's no such table in the query. The query may not even be correct, as the value for my_id in the sub-select may not be (indeed isn't likely to be) the id for the row where my_timestamp = MAX(my_timestamp).
– outis, Commented Mar 2, 2011 at 14:26
Shouldn't your join condition be ...AND tmp.most_recent_timestamp = m.my_timestamp...? The inner query also looks to be missing a GROUP BY.
– Joe Stefanelli, Commented Mar 2, 2011 at 14:32
Can you explain what "SELECT my_id, MAX(my_timestamp) AS .." does and why there is no GROUP BY?
– Zimbabao, Commented Mar 2, 2011 at 14:33
@outis, I'm sorry. I modified the original explain from the production DB and there might be inconsistencies. I've tried to correct them. @Joe Stefanelli, yes, it should. It seems I missed it while preparing an SSCCE. @Zimbabao, MySQL version 5.1.
– Alex Nikolaenkov, Commented Mar 2, 2011 at 14:47

sreimer · Answer · 2011-03-02 17:45:37Z

2

If I understand correctly, you should be able to drop the nested SELECT completely, move the WHERE clause to the main query, ORDER BY my_timestamp descending, and LIMIT 1.

SELECT my_id, my_value, max(my_timestamp)
FROM my_table
WHERE my_timestamp < '2011-03-01 08:00:00'
GROUP BY my_id

Edit: added MAX and GROUP BY.

edited Mar 2, 2011 at 17:45

answered Mar 2, 2011 at 14:32

6 Comments

Ike Walker Over a year ago

Change the ORDER BY to DESC and this is perfect.

Alex Nikolaenkov Over a year ago

The only problem is that we need most recent entries for all the my_ids. I think that this query produces only one result.

Alex Nikolaenkov Over a year ago

@Ike, for each my_id I want to select the most recent value, so I need count(distinct my_id) results.

sreimer Over a year ago

Try adding MAX(my_timestamp) to the SELECT, grouping by my_id, and removing the LIMIT.

Alex Nikolaenkov Over a year ago

@sreimer, how will this select my_value corresponding to the max(my_timestamp)?

bw_üezi · Answer · 2011-03-02 15:47:49Z

1

A trick for getting the most recent record is to use ORDER BY together with LIMIT 1 instead of a MAX aggregation combined with a self-join.

Something like this (not tested):

SELECT m.my_id, m.my_value, m.my_timestamp
FROM my_table m
WHERE my_timestamp < '2011-03-01 08:00:00'
ORDER BY m.my_timestamp DESC
LIMIT 1
;

Update: the query above doesn't work because a grouping is required.
Another option uses a WHERE ... IN subselect instead of the JOIN you've used.
It could be faster; please test with your data.

SELECT m.my_id, m.my_value, m.my_timestamp
FROM my_table m
WHERE ( m.my_id, m.my_timestamp ) IN (
  SELECT i.my_id, MAX(i.my_timestamp)
  FROM my_table i
  WHERE i.my_timestamp < '2011-03-01 08:00:00'
  GROUP BY i.my_id
  )
ORDER BY m.my_timestamp;
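Before timing either form on the real table, it may be worth checking on a toy dataset that the IN version and the original JOIN version return the same rows. Below is a minimal sketch of such a check. It assumes an in-memory SQLite database as a stand-in for MySQL (row-value IN needs SQLite 3.15 or later), and the sample rows are made up for illustration. It says nothing about performance or about the plan MySQL would choose.

```python
import sqlite3

# Assumption: SQLite (in-memory) stands in for MySQL here; only the result
# sets are compared, not execution plans or timings.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE my_table (
        my_id INTEGER NOT NULL,
        my_value VARCHAR(4000),
        my_timestamp TIMESTAMP NOT NULL
    );
    CREATE INDEX MY_ID_MY_TIMESTAMP_IDX ON my_table (my_id, my_timestamp);
""")

# Hypothetical sample rows: two ids, one row falls after the cutoff.
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [
        (1, "a-old", "2011-02-01 00:00:00"),
        (1, "a-new", "2011-02-20 00:00:00"),
        (1, "a-late", "2011-03-05 00:00:00"),  # after the cutoff, must be ignored
        (2, "b-only", "2011-01-15 00:00:00"),
    ],
)

CUTOFF = "2011-03-01 08:00:00"

# The JOIN form from the question.
join_form = """
    SELECT m.my_id, m.my_value, m.my_timestamp
      FROM (SELECT my_id, MAX(my_timestamp) AS most_recent_timestamp
              FROM my_table
             WHERE my_timestamp < ?
             GROUP BY my_id) AS tmp
      LEFT OUTER JOIN my_table m
        ON tmp.my_id = m.my_id AND tmp.most_recent_timestamp = m.my_timestamp
     ORDER BY m.my_timestamp
"""

# The WHERE ... IN form from this answer.
in_form = """
    SELECT m.my_id, m.my_value, m.my_timestamp
      FROM my_table m
     WHERE (m.my_id, m.my_timestamp) IN (
           SELECT i.my_id, MAX(i.my_timestamp)
             FROM my_table i
            WHERE i.my_timestamp < ?
            GROUP BY i.my_id)
     ORDER BY m.my_timestamp
"""

join_rows = conn.execute(join_form, (CUTOFF,)).fetchall()
in_rows = conn.execute(in_form, (CUTOFF,)).fetchall()
print(join_rows)
print(in_rows)
```

On this sample both queries should return one row per my_id, each carrying the latest my_value before the cutoff. Whether the IN form is faster or slower on MySQL 5.1 still has to be measured on the real data.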

edited Mar 2, 2011 at 15:47

answered Mar 2, 2011 at 14:32

2 Comments

Alex Nikolaenkov Over a year ago

We need the most recent (id, value) pair for every id, not only the single most recent one.

Alex Nikolaenkov Over a year ago

The last one is not an optimization; it actually degrades performance, according to mysqlperformanceblog.com/2010/10/25/…

Ike Walker · Accepted Answer · 2011-03-02 16:15:26Z

0

I notice in the explain plan that the optimizer is using the MY_ID_MY_TIMESTAMP_IDX index for the sub-query, but not the outer query.

You may be able to speed it up using an index hint. I also updated the ON clause to refer to tmp.most_recent_timestamp using its alias.

SELECT m.my_id, m.my_value, m.my_timestamp
  FROM (
    SELECT my_id, MAX(my_timestamp) AS most_recent_timestamp
      FROM my_table
      WHERE my_timestamp < '2011-03-01 08:00:00'
      GROUP BY my_id
  ) as tmp
LEFT OUTER JOIN my_table m use index (MY_ID_MY_TIMESTAMP_IDX)
ON tmp.my_id = m.my_id AND tmp.most_recent_timestamp = m.my_timestamp
ORDER BY m.my_timestamp;

edited Mar 2, 2011 at 16:15

answered Mar 2, 2011 at 14:36

6 Comments

Alex Nikolaenkov Over a year ago

@Ike, I've corrected the query. I missed the GROUP BY clause while preparing an SSCCE. The problem is that I have to fetch the most recent timestamp for every my_id.

Alex Nikolaenkov Over a year ago

@Ike, unfortunately I've tried that myself, but it doesn't change the optimizer's behaviour. As far as I understand, it's a MySQL feature (mysqlperformanceblog.com/2006/08/31/…). At this point I still think it's possible to tune the query without creating temporary tables or views.

Ike Walker Over a year ago

@Alex, If USE INDEX doesn't help, try FORCE INDEX instead. You should be able to force it to use MY_ID_MY_TIMESTAMP_IDX for table m, which could speed up your query a lot.

Alex Nikolaenkov Over a year ago

@Ike, unfortunately FORCE INDEX won't change anything. As far as I understand, MySQL creates a temporary table and joins using it, and that table has no indexes. I've tried a lot of things and it seems that the problem should be approached from another perspective, because even if I manage to speed up this query on a table with 40 million rows, it will still take ages on a table with 400 million rows (just tested on such a table). I'm accepting your answer not because it solved my problem but because it helped me the most. Thank you for your time.

Ike Walker Over a year ago

@Alex, I based my answer on the explain plan you posted, which I believe can be improved upon using an index hint. Even though the left side of your join is a temporary table, the right side is a base table and that's the part you may be able to optimize. Using the index hint should improve the way the right side of the join (table m) is accessed. Did you try it?
