Understanding EXPLAIN SELECT to optimize MySQL query

Question

I have this table schema:

CREATE TABLE `exchange` (
  `id` bigint unsigned NOT NULL AUTO_INCREMENT,
  `exchange` double unsigned NOT NULL,
  `created_at` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `platform_id` bigint unsigned NOT NULL,
  `product_id` bigint unsigned NOT NULL,
  PRIMARY KEY (`id`),
  KEY `exchange_created_at_index` (`created_at`),
  KEY `exchange_product_id_created_at_id_index` (`product_id`,`created_at`,`id`),
  KEY `exchange_product_id_created_at_platform_id_id_index` (`product_id`,`created_at`,`platform_id`,`id`),
  KEY `exchange_platform_fk` (`platform_id`),
  KEY `exchange_platform_id_created_at_id_index` (`platform_id`,`created_at`,`id`),
  CONSTRAINT `exchange_platform_fk` FOREIGN KEY (`platform_id`) REFERENCES `platform` (`id`) ON DELETE CASCADE,
  CONSTRAINT `exchange_product_fk` FOREIGN KEY (`product_id`) REFERENCES `product` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

This table has about 14,761,479 rows.

I'm trying to optimize this query:

SELECT *
FROM `exchange`
WHERE (
    `created_at` >= '2021-09-17 22:36:11'
    AND `platform_id` = 1
    AND `id` IN (
        SELECT MIN(`id`)
        FROM `exchange`
        WHERE `created_at` >= '2021-09-17 22:36:11'
        GROUP BY `product_id`
    )
);

425 rows in set (14,69 sec)

The subquery alone takes about 2 seconds:

SELECT MIN(`id`)
FROM `exchange`
WHERE `created_at` >= '2021-09-17 22:36:11'
GROUP BY `product_id`;

729 rows in set (2,11 sec)

The EXPLAIN output is:

*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: exchange
   partitions: NULL
         type: ref
possible_keys: exchange_created_at_index,exchange_platform_fk,exchange_platform_id_created_at_id_index
          key: exchange_platform_fk
      key_len: 8
          ref: const
         rows: 6955794
     filtered: 50.00
        Extra: Using where
*************************** 2. row ***************************
           id: 2
  select_type: SUBQUERY
        table: exchange
   partitions: NULL
         type: index
possible_keys: exchange_created_at_index,exchange_product_id_created_at_id_index,exchange_product_id_created_at_platform_id_id_index
          key: exchange_product_id_created_at_id_index
      key_len: 21
          ref: NULL
         rows: 13911589
     filtered: 50.00
        Extra: Using where; Using index
2 rows in set, 1 warning (0,00 sec)

And the warning:

  Level: Note
   Code: 1003
Message: /* select#1 */ select `crypto`.`exchange`.`id` AS `id`,`crypto`.`exchange`.`exchange` AS `exchange`,`crypto`.`exchange`.`created_at` AS `created_at`,`crypto`.`exchange`.`platform_id` AS `platform_id`,`crypto`.`exchange`.`product_id` AS `product_id` from `crypto`.`exchange` where ((`crypto`.`exchange`.`platform_id` = 1) and (`crypto`.`exchange`.`created_at` >= TIMESTAMP'2021-09-17 22:36:11') and <in_optimizer>(`crypto`.`exchange`.`id`,`crypto`.`exchange`.`id` in ( <materialize> (/* select#2 */ select min(`crypto`.`exchange`.`id`) from `crypto`.`exchange` where (`crypto`.`exchange`.`created_at` >= TIMESTAMP'2021-09-17 22:36:11') group by `crypto`.`exchange`.`product_id` having true ), <primary_index_lookup>(`crypto`.`exchange`.`id` in <temporary table> on <auto_distinct_key> where ((`crypto`.`exchange`.`id` = `<materialized_subquery>`.`MIN(``id``)`))))))
1 row in set (0,00 sec)

Why does EXPLAIN show exchange_platform_fk as the key for the PRIMARY query, and not exchange_platform_id_created_at_id_index?

Which indexes do I need to add to optimize this query?

Moving the WHERE conditions into the subquery is worse:

SELECT *
FROM `exchange`
WHERE (
    `id` IN (
        SELECT MIN(`id`)
        FROM `exchange`
        WHERE (
            `created_at` >= '2021-09-17 22:36:11'
            AND `platform_id` = 1
        )
        GROUP BY `product_id`
    )
);

425 rows in set (19,86 sec)

*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: exchange
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 13911589
     filtered: 100.00
        Extra: Using where
*************************** 2. row ***************************
           id: 2
  select_type: SUBQUERY
        table: exchange
   partitions: NULL
         type: ref
possible_keys: exchange_created_at_index,exchange_product_id_created_at_id_index,exchange_product_id_created_at_platform_id_id_index,exchange_platform_fk,exchange_platform_id_created_at_id_index,exchange_platform_id_created_at_index
          key: exchange_platform_fk
      key_len: 8
          ref: const
         rows: 6955794
     filtered: 50.00
        Extra: Using where; Using temporary
2 rows in set, 1 warning (0,00 sec)

Thanks!

It's rare to find a query-optimization question that provides the right information. Yours does. Thanks.
– O. Jones, Commented Sep 23, 2021 at 10:09

O. Jones · Accepted Answer · 2021-09-23 10:07:45Z

Your subquery does the heavy lifting. It is

        SELECT MIN(id)
        FROM exchange
        WHERE created_at >= '2021-09-17 22:36:11'
        GROUP BY product_id

The index you need to cover this subquery is on (created_at, product_id, id ASC). Why?

You want to get the entire result just from the index without having to look up the data in the table.
You can think of these indexes (which use the B-Tree layout) as being sorted in order.
This subquery random-accesses the index to the first eligible row, based on your created_at WHERE condition.
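It then reads the index sequentially from that point, so only rows inside the created_at range are touched.
Because the index also contains product_id and id, MySQL can compute MIN(id) per product_id from the index alone. (With a range on the leading column it will usually still need a temporary table for the grouping; a true loose index scan requires the GROUP BY columns to form a leftmost prefix of the index.)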
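As a rough sketch of how that suggestion could be tried out: the index name below is a hypothetical choice, not something from the question, and the column order follows the answer. Whether the optimizer actually picks the new index depends on the data and the MySQL version, so treat the EXPLAIN as something to check rather than a guaranteed outcome.

```sql
-- Hypothetical index name; columns follow the answer's suggestion.
ALTER TABLE `exchange`
  ADD INDEX `exchange_created_at_product_id_id_index`
    (`created_at`, `product_id`, `id`);

-- Re-check the plan for the subquery on its own.
-- Things to look for: `key` naming the new index, and
-- "Using index" in Extra (meaning no table rows are read).
EXPLAIN
SELECT MIN(`id`)
FROM `exchange`
WHERE `created_at` >= '2021-09-17 22:36:11'
GROUP BY `product_id`;
```

If `rows` in the new plan is still close to the full table size, the created_at range probably covers most of the table and the index will help less.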

When you construct an index to match a query, you put the equality-matched columns first, then a range-matched column. So, for

SELECT a, b, c FROM tbl WHERE a = 1 AND b = 2 AND c > 10

You want an index on (a,b,c).
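A minimal sketch of that rule, using the placeholder table and columns from the example; the index name is an assumption made up for illustration.

```sql
-- tbl, a, b, c are the placeholder names from the example above.
-- Equality-matched columns (a, b) come first, the range column (c) last.
CREATE INDEX idx_tbl_a_b_c ON tbl (a, b, c);

-- With this index the optimizer can seek directly to a = 1, b = 2
-- and then scan forward over the entries where c > 10.
EXPLAIN
SELECT a, b, c
FROM tbl
WHERE a = 1 AND b = 2 AND c > 10;
```

Putting c first instead would force the range to be evaluated before the equalities, so the a and b conditions could no longer narrow the scan.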

By the way, if you have an index on (a,b,c), a separate index on just (a) or on (a,b) is unnecessary: the longer index already serves lookups on its leftmost prefixes. The shorter one serves no purpose except to slow down inserts and updates.
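One way to spot such prefix-redundant indexes is to list each index with its columns in order. The query below is a sketch against `information_schema.STATISTICS`; it assumes the current default database is the one holding `exchange`.

```sql
-- List every index on `exchange` with its columns in index order.
SELECT INDEX_NAME,
       GROUP_CONCAT(COLUMN_NAME ORDER BY SEQ_IN_INDEX) AS index_columns
FROM information_schema.STATISTICS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'exchange'
GROUP BY INDEX_NAME
ORDER BY index_columns;

-- In the schema from the question, exchange_platform_fk (platform_id)
-- is a leftmost prefix of exchange_platform_id_created_at_id_index.
-- Dropping it is a candidate cleanup, to be tested first: MySQL only
-- allows this while another index can still back the foreign key.
ALTER TABLE `exchange` DROP INDEX `exchange_platform_fk`;
```

Sorting by the column list puts an index next to the longer indexes that start with the same columns, which makes the prefixes easy to see.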

Note: in InnoDB, the primary key (id here) is implicitly appended to every secondary index. So you may be able to omit it as the last column in the index I suggested. Give it a try.

Read this: https://use-the-index-luke.com/ And welcome to the arcane world of index wrangling.

edited Sep 23, 2021 at 10:07

answered Sep 23, 2021 at 10:01

Lito Over a year ago

I have added subquery execution details as PRIMARY. It's only 2 seconds of 14. Thanks!

Rick James · Accepted Answer · 2021-09-24 00:20:47Z

Give this reformulation a try:

SELECT  *
    FROM  
    (
        SELECT  e1.product_id, MIN(e1.id) AS min_id
            FROM  `exchange` AS e1
            WHERE  e1.`platform_id` = 1
              AND  e1.`created_at` >= '2021-09-17 22:36:11'
            GROUP BY  e1.product_id 
    ) AS x
    JOIN  `exchange` AS e  ON e.id = x.min_id ;

Together with this index for exchange:

INDEX(platform_id, created_at, product_id, id)

It is aimed at the subquery.

platform_id is tested with =, so it needs to come first.
created_at takes care of the rest of the WHERE.
The other two columns are last; they make the index 'cover' the query.
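A possible way to create that index and check that the subquery uses it is sketched below. The index name is hypothetical; the column list is the one given above.

```sql
-- Hypothetical index name; column order as suggested in this answer.
ALTER TABLE `exchange`
  ADD INDEX `exchange_platform_created_product_id_index`
    (`platform_id`, `created_at`, `product_id`, `id`);

-- Check the derived-table query by itself. "Using index" in Extra
-- would indicate the index covers it, with no table lookups.
EXPLAIN
SELECT e1.product_id, MIN(e1.id) AS min_id
FROM `exchange` AS e1
WHERE e1.`platform_id` = 1
  AND e1.`created_at` >= '2021-09-17 22:36:11'
GROUP BY e1.product_id;
```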

Note the philosophy I used:

Find the ids that were needed. This list will probably be smaller than all the rows of exchange.
Look up * by efficiently using PRIMARY KEY(id).

(IN (SELECT...) is sometimes very poorly optimized.)
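To compare the formulations with measured rather than estimated numbers, EXPLAIN ANALYZE can be run on each of them. This is a sketch that assumes MySQL 8.0.18 or later, where EXPLAIN ANALYZE is available; it executes the query, so expect it to take as long as the query itself.

```sql
-- Runs the query and reports actual row counts and timings per step.
-- e.* returns only the table's columns; SELECT * as written in the
-- answer would also return x.product_id and x.min_id.
EXPLAIN ANALYZE
SELECT e.*
FROM (
    SELECT e1.product_id, MIN(e1.id) AS min_id
    FROM `exchange` AS e1
    WHERE e1.`platform_id` = 1
      AND e1.`created_at` >= '2021-09-17 22:36:11'
    GROUP BY e1.product_id
) AS x
JOIN `exchange` AS e ON e.id = x.min_id;
```

Note that filtering on platform_id inside the subquery takes the minimum id per product within platform 1. That is not in general the same set as the original query, which takes the minimum id per product across all platforms and then keeps only the platform 1 rows. In the question both forms happened to return 425 rows, but it is worth confirming which meaning is intended.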
