MySQL Date Range Query Optimization

Question

I have a MySQL table structured like this:

CREATE TABLE `messages` (
  `id` int NOT NULL AUTO_INCREMENT,
  `author` varchar(250) COLLATE utf8mb4_unicode_ci NOT NULL,
  `message` varchar(2000) COLLATE utf8mb4_unicode_ci NOT NULL,
  `serverid` varchar(200) COLLATE utf8mb4_unicode_ci NOT NULL,
  `date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `guildname` varchar(1000) COLLATE utf8mb4_unicode_ci NOT NULL,
  PRIMARY KEY (`id`,`date`)
) ENGINE=InnoDB AUTO_INCREMENT=27769461 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

I need to query this table for various statistics over date ranges for Grafana graphs. However, all of those queries are extremely slow, even though the table is indexed with a composite key on id and date. id is auto-incrementing, and date is also always increasing.

The queries generated by Grafana look like this:

SELECT
  UNIX_TIMESTAMP(date) DIV 120 * 120 AS "time",
  count(DISTINCT(serverid)) AS "servercount"
FROM messages
WHERE
  date BETWEEN FROM_UNIXTIME(1615930154) AND FROM_UNIXTIME(1616016554)
GROUP BY 1
ORDER BY UNIX_TIMESTAMP(date) DIV 120 * 120

This query takes over 30 seconds to complete with 27 million records in the table. Explaining the query results in this output:

+----+-------------+----------+------------+------+---------------+------+---------+------+----------+----------+-----------------------------+
| id | select_type | table    | partitions | type | possible_keys | key  | key_len | ref  | rows     | filtered | Extra                       |
+----+-------------+----------+------------+------+---------------+------+---------+------+----------+----------+-----------------------------+
|  1 | SIMPLE      | messages | NULL       | ALL  | PRIMARY       | NULL | NULL    | NULL | 26952821 |    11.11 | Using where; Using filesort |
+----+-------------+----------+------------+------+---------------+------+---------+------+----------+----------+-----------------------------+

This indicates that MySQL is indeed using the composite primary key I created for indexing the data, but still has to scan almost the entire table, which I do not understand. How can I optimize this table for date range queries?

Is date the first column in the index? (It would have been better if you had just shown the DDL of the table and the index...)
– sticky bit, Commented Mar 17, 2021 at 22:10
Ah OK, thanks. So it is not. Try either changing the index so that date is its first column, or creating a separate index on date.
– sticky bit, Commented Mar 17, 2021 at 22:13
Okay, adding a new index using ALTER TABLE messages ADD INDEX date_id_index(date, id); dropped the query time to 0.45 seconds. Thank you, you're a lifesaver. Please add an answer I can accept.
– Private_GER, Commented Mar 17, 2021 at 22:18

Rick James · Accepted Answer · 2021-03-18 18:07:14Z

1

Plan A:

PRIMARY KEY(date, id),  -- to cluster by date
INDEX(id) -- needed to keep AUTO_INCREMENT happy

Assuming the table is quite big, having date at the beginning of the PK puts the rows in a given date range next to each other. This reduces the I/O somewhat.
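
As a rough sketch, Plan A might be applied to the existing table like this. It assumes the current PRIMARY KEY(id, date) from the question and has not been tested against this schema:

```sql
-- Plan A sketch: recluster the table by date.
-- All three changes go into one ALTER so that id is never left without
-- an index, which AUTO_INCREMENT requires.
ALTER TABLE messages
  DROP PRIMARY KEY,
  ADD PRIMARY KEY (date, id),
  ADD INDEX (id);
```

Changing the primary key rebuilds the whole table, so on roughly 27 million rows expect the statement to run for a while.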

Plan B:

PRIMARY KEY(id),
INDEX(date, serverid)

Now the secondary index is exactly what is needed for the one query you have provided. It starts with date, so it is optimized for searching by date, and it contains every column the query reads (date and serverid). It is also smaller than the whole table, hence even faster (I/O-wise) than Plan A.

But, if you have a lot of different queries like this, adding a lot more indexes gets impractical.
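
A sketch of Plan B against the table from the question, assuming it still has PRIMARY KEY(id, date). The index name date_serverid is made up for illustration:

```sql
-- Plan B sketch: plain PK on id plus a secondary index covering the query.
ALTER TABLE messages
  DROP PRIMARY KEY,
  ADD PRIMARY KEY (id),
  ADD INDEX date_serverid (date, serverid);

-- Check the plan: key should show date_serverid, ideally with
-- "Using index" in the Extra column.
EXPLAIN
SELECT
  UNIX_TIMESTAMP(date) DIV 120 * 120 AS "time",
  COUNT(DISTINCT serverid) AS "servercount"
FROM messages
WHERE date BETWEEN FROM_UNIXTIME(1615930154) AND FROM_UNIXTIME(1616016554)
GROUP BY 1;
```

Since serverid is a varchar(200) in utf8mb4, each index entry can be up to about 800 bytes for that column. If that proves too large, a narrower serverid column would shrink the index.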

Plan C: There may be a still better way:

PRIMARY KEY(id),
INDEX(serverid, date)

In theory, it can hop through that secondary index checking each serverid. But I am not sure that such an optimization exists.
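
One way to find out is to add the index and look at the plan. A sketch, with serverid_date as a made-up index name:

```sql
-- Plan C sketch: add the index, then see whether the optimizer picks it.
ALTER TABLE messages ADD INDEX serverid_date (serverid, date);

EXPLAIN
SELECT
  UNIX_TIMESTAMP(date) DIV 120 * 120 AS "time",
  COUNT(DISTINCT serverid) AS "servercount"
FROM messages
WHERE date BETWEEN FROM_UNIXTIME(1615930154) AND FROM_UNIXTIME(1616016554)
GROUP BY 1;

-- If the plan does not improve, remove the index again:
-- ALTER TABLE messages DROP INDEX serverid_date;
```

The key and type columns of the EXPLAIN output show whether serverid_date is chosen and how it is accessed.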

Plan D: Do you need id for anything other than providing a unique PRIMARY KEY? If not, there may be other options.


1 Comment

Private_GER Over a year ago

id also serves as a counter of total records. Other than that, it's not used at all.

sticky bit · Accepted Answer · 2021-03-17 22:22:36Z

1

The index on (id, date) doesn't help because its first column is id, not date.

You can either
(a) drop the current index and index (date, id) instead -- when date is in the first place this can be used to filter for date regardless of the following columns -- or
(b) just create an additional index only on (date) to support the query.
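
Option (b) could look like the following sketch. The index name date_index is made up for illustration:

```sql
-- Option (b) sketch: a separate secondary index on date only.
ALTER TABLE messages ADD INDEX date_index (date);

-- Check that the range condition now uses the index instead of a full scan.
EXPLAIN
SELECT COUNT(*)
FROM messages
WHERE date BETWEEN FROM_UNIXTIME(1615930154) AND FROM_UNIXTIME(1616016554);
```

In InnoDB a secondary index also stores the primary key columns, so an index on (date) here behaves much like one on (date, id).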


1 Comment

Rick James Over a year ago

That won't work, as stated, on the PRIMARY KEY.
