I have a MySQL table structured like this:

CREATE TABLE `messages` (
  `id` int NOT NULL AUTO_INCREMENT,
  `author` varchar(250) COLLATE utf8mb4_unicode_ci NOT NULL,
  `message` varchar(2000) COLLATE utf8mb4_unicode_ci NOT NULL,
  `serverid` varchar(200) COLLATE utf8mb4_unicode_ci NOT NULL,
  `date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `guildname` varchar(1000) COLLATE utf8mb4_unicode_ci NOT NULL,
  PRIMARY KEY (`id`,`date`)
) ENGINE=InnoDB AUTO_INCREMENT=27769461 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

I need to query this table for various statistics over date ranges for Grafana graphs. However, all of those queries are extremely slow, even though the table is indexed with a composite key on id and date. id is auto-incrementing, and date is also always increasing.

The queries generated by Grafana look like this:

SELECT
  UNIX_TIMESTAMP(date) DIV 120 * 120 AS "time",
  count(DISTINCT(serverid)) AS "servercount"
FROM messages
WHERE
  date BETWEEN FROM_UNIXTIME(1615930154) AND FROM_UNIXTIME(1616016554)
GROUP BY 1
ORDER BY UNIX_TIMESTAMP(date) DIV 120 * 120

This query takes over 30 seconds to complete with 27 million records in the table. Explaining the query results in this output:

+----+-------------+----------+------------+------+---------------+------+---------+------+----------+----------+-----------------------------+
| id | select_type | table    | partitions | type | possible_keys | key  | key_len | ref  | rows     | filtered | Extra                       |
+----+-------------+----------+------------+------+---------------+------+---------+------+----------+----------+-----------------------------+
|  1 | SIMPLE      | messages | NULL       | ALL  | PRIMARY       | NULL | NULL    | NULL | 26952821 |    11.11 | Using where; Using filesort |
+----+-------------+----------+------------+------+---------------+------+---------+------+----------+----------+-----------------------------+

The output lists the composite primary key I created under possible_keys, yet key is NULL and type is ALL, so MySQL still scans almost the entire table, which I do not understand. How can I optimize this table for date range queries?

  • Is date the first column in the index? (It would have been better if you had just shown the DDL of the table and the index.) Commented Mar 17, 2021 at 22:10
  • I edited the DDL into the question. Commented Mar 17, 2021 at 22:12
  • The primary key is currently (id, date). Commented Mar 17, 2021 at 22:13
  • Ah OK, thanks. So it is not. Try either changing the index so that date is the first column in it, or creating a separate index on date. Commented Mar 17, 2021 at 22:13
  • Okay, adding a new index using ALTER TABLE messages ADD INDEX date_id_index(date, id); dropped the query time down to 0.45 seconds. Thank you, you're a lifesaver. Please add an answer I can accept. Commented Mar 17, 2021 at 22:18

2 Answers

Plan A:

PRIMARY KEY(date, id),  -- to cluster by date
INDEX(id) -- needed to keep AUTO_INCREMENT happy

Assuming the table is quite big, having date at the beginning of the PK puts the rows in the given date range all next to each other. This minimizes (somewhat) the I/O.
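As a rough sketch, Plan A could be applied to the existing table with a single ALTER. The index name id_index is made up for illustration, and the statement rebuilds the whole table, so on roughly 27 million rows it may take a while and is best tried on a copy first.

```sql
-- Sketch of Plan A: recluster the existing table by date.
-- Assumption: doing all three changes in one ALTER keeps the AUTO_INCREMENT
-- column indexed at every step, which MySQL requires.
ALTER TABLE messages
  DROP PRIMARY KEY,
  ADD PRIMARY KEY (`date`, id),  -- cluster rows by date
  ADD INDEX id_index (id);       -- hypothetical name; keeps AUTO_INCREMENT happy
```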

Plan B:

PRIMARY KEY(id),
INDEX(date, serverid)

Now the secondary index is exactly what is needed for the one query you have provided. It is optimized for searching by date, and it is smaller than the whole table, hence even faster (I/O-wise) than Plan A.

But, if you have a lot of different queries like this, adding a lot more indexes gets impractical.
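A minimal sketch of Plan B against the table from the question follows. The index name date_serverid_index is an assumption, and the ALTER again rebuilds the table. Re-running EXPLAIN afterwards is one way to check whether the optimizer actually picks the new index; if it does, the key column should name it, and Extra would be expected to include "Using index", since the query only touches date and serverid.

```sql
-- Sketch of Plan B: plain id primary key plus a secondary index
-- that contains every column the Grafana query reads.
ALTER TABLE messages
  DROP PRIMARY KEY,
  ADD PRIMARY KEY (id),
  ADD INDEX date_serverid_index (`date`, serverid);  -- hypothetical name

-- Check the plan for the query from the question.
EXPLAIN
SELECT
  UNIX_TIMESTAMP(`date`) DIV 120 * 120 AS "time",
  COUNT(DISTINCT serverid) AS "servercount"
FROM messages
WHERE `date` BETWEEN FROM_UNIXTIME(1615930154) AND FROM_UNIXTIME(1616016554)
GROUP BY 1
ORDER BY 1;
```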

Plan C: There may be a still better way:

PRIMARY KEY(id),
INDEX(serverid, date)

In theory, it can hop through that secondary index checking each serverid. But I am not sure that such an optimization exists.

Plan D: Do you need id for anything other than providing a unique PRIMARY KEY? If not, there may be other options.

1 Comment

id also serves as a counter of total records. Other than that, it's not used at all.

The index on (id, date) doesn't help because the first key is id not date.

You can either
(a) drop the current index and index (date, id) instead -- when date is in the first place this can be used to filter for date regardless of the following columns -- or
(b) just create an additional index only on (date) to support the query.
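Option (b) is the less invasive of the two, since it leaves the primary key alone. A minimal sketch, assuming the index name date_index (not from the original post):

```sql
-- Sketch of option (b): add a secondary index on date only.
-- InnoDB secondary index entries also carry the primary key columns,
-- so id is available from this index without listing it explicitly.
ALTER TABLE messages ADD INDEX date_index (`date`);

-- Confirm the index exists and that date is its first column (Seq_in_index = 1).
SHOW INDEX FROM messages;
```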

1 Comment

That won't work, as stated, on the PRIMARY KEY.
