I have this table schema:

CREATE TABLE `exchange` (
  `id` bigint unsigned NOT NULL AUTO_INCREMENT,
  `exchange` double unsigned NOT NULL,
  `created_at` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `platform_id` bigint unsigned NOT NULL,
  `product_id` bigint unsigned NOT NULL,
  PRIMARY KEY (`id`),
  KEY `exchange_created_at_index` (`created_at`),
  KEY `exchange_product_id_created_at_id_index` (`product_id`,`created_at`,`id`),
  KEY `exchange_product_id_created_at_platform_id_id_index` (`product_id`,`created_at`,`platform_id`,`id`),
  KEY `exchange_platform_fk` (`platform_id`),
  KEY `exchange_platform_id_created_at_id_index` (`platform_id`,`created_at`,`id`),
  CONSTRAINT `exchange_platform_fk` FOREIGN KEY (`platform_id`) REFERENCES `platform` (`id`) ON DELETE CASCADE,
  CONSTRAINT `exchange_product_fk` FOREIGN KEY (`product_id`) REFERENCES `product` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

This table has about 14,761,479 rows.

I'm trying to optimize this query:

SELECT *
FROM `exchange`
WHERE (
    `created_at` >= '2021-09-17 22:36:11'
    AND `platform_id` = 1
    AND `id` IN (
        SELECT MIN(`id`)
        FROM `exchange`
        WHERE `created_at` >= '2021-09-17 22:36:11'
        GROUP BY `product_id`
    )
);

425 rows in set (14,69 sec)

The subquery on its own takes about 2 seconds:

SELECT MIN(`id`)
FROM `exchange`
WHERE `created_at` >= '2021-09-17 22:36:11'
GROUP BY `product_id`;

729 rows in set (2,11 sec)

The EXPLAIN output is:

*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: exchange
   partitions: NULL
         type: ref
possible_keys: exchange_created_at_index,exchange_platform_fk,exchange_platform_id_created_at_id_index
          key: exchange_platform_fk
      key_len: 8
          ref: const
         rows: 6955794
     filtered: 50.00
        Extra: Using where
*************************** 2. row ***************************
           id: 2
  select_type: SUBQUERY
        table: exchange
   partitions: NULL
         type: index
possible_keys: exchange_created_at_index,exchange_product_id_created_at_id_index,exchange_product_id_created_at_platform_id_id_index
          key: exchange_product_id_created_at_id_index
      key_len: 21
          ref: NULL
         rows: 13911589
     filtered: 50.00
        Extra: Using where; Using index
2 rows in set, 1 warning (0,00 sec)

And the warning:

  Level: Note
   Code: 1003
Message: /* select#1 */ select `crypto`.`exchange`.`id` AS `id`,`crypto`.`exchange`.`exchange` AS `exchange`,`crypto`.`exchange`.`created_at` AS `created_at`,`crypto`.`exchange`.`platform_id` AS `platform_id`,`crypto`.`exchange`.`product_id` AS `product_id` from `crypto`.`exchange` where ((`crypto`.`exchange`.`platform_id` = 1) and (`crypto`.`exchange`.`created_at` >= TIMESTAMP'2021-09-17 22:36:11') and <in_optimizer>(`crypto`.`exchange`.`id`,`crypto`.`exchange`.`id` in ( <materialize> (/* select#2 */ select min(`crypto`.`exchange`.`id`) from `crypto`.`exchange` where (`crypto`.`exchange`.`created_at` >= TIMESTAMP'2021-09-17 22:36:11') group by `crypto`.`exchange`.`product_id` having true ), <primary_index_lookup>(`crypto`.`exchange`.`id` in <temporary table> on <auto_distinct_key> where ((`crypto`.`exchange`.`id` = `<materialized_subquery>`.`MIN(``id``)`))))))
1 row in set (0,00 sec)

Why does EXPLAIN show only exchange_platform_fk as the key for the PRIMARY query, and not exchange_platform_id_created_at_id_index?

Which indexes do I need to add to optimize this query?

Moving the WHERE conditions into the subquery is even worse:

SELECT *
FROM `exchange`
WHERE (
    `id` IN (
        SELECT MIN(`id`)
        FROM `exchange`
        WHERE (
            `created_at` >= '2021-09-17 22:36:11'
            AND `platform_id` = 1
        )
        GROUP BY `product_id`
    )
);

425 rows in set (19,86 sec)
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: exchange
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 13911589
     filtered: 100.00
        Extra: Using where
*************************** 2. row ***************************
           id: 2
  select_type: SUBQUERY
        table: exchange
   partitions: NULL
         type: ref
possible_keys: exchange_created_at_index,exchange_product_id_created_at_id_index,exchange_product_id_created_at_platform_id_id_index,exchange_platform_fk,exchange_platform_id_created_at_id_index,exchange_platform_id_created_at_index
          key: exchange_platform_fk
      key_len: 8
          ref: const
         rows: 6955794
     filtered: 50.00
        Extra: Using where; Using temporary
2 rows in set, 1 warning (0,00 sec)

Thanks!

It's rare to find a query-optimization question that provides the right information. Yours does. Thanks. Commented Sep 23, 2021 at 10:09

2 Answers

Your subquery does the heavy lifting. It is

        SELECT MIN(id)
        FROM exchange
        WHERE created_at >= '2021-09-17 22:36:11'
        GROUP BY product_id

The index you need to cover this subquery is on (created_at, product_id, id ASC). Why?

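  1. You want to get the entire result from the index alone, without having to look up rows in the table.
  2. You can think of these indexes (which use a B-tree layout) as being stored in sorted order.
  3. The subquery random-accesses the index to the first eligible entry, based on your created_at WHERE condition.
  4. It then reads the index sequentially from that point to the end. Every entry it reads already carries product_id and id, so the grouping needs nothing from the table.
  5. Because created_at is matched by a range, the entries it reads are not ordered by product_id. Expect MySQL to collect MIN(id) per product_id in a temporary table rather than with a loose index scan, which requires the GROUP BY columns to be a leftmost prefix of the index.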
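
As a minimal sketch of the index described above, it could be added and checked like this. The index name is made up for illustration; any name works. Building an index on a table of this size takes a while, so it may be worth trying on a copy first.

```sql
-- Hypothetical index name; the column order is the one suggested above.
ALTER TABLE `exchange`
  ADD INDEX `exchange_created_at_product_id_id_index` (`created_at`, `product_id`, `id`);

-- Re-check the plan for the subquery on its own.
EXPLAIN
SELECT MIN(`id`)
FROM `exchange`
WHERE `created_at` >= '2021-09-17 22:36:11'
GROUP BY `product_id`;
```

If the optimizer picks the new index, I would expect the key column to name it and Extra to include "Using index". What else appears in Extra depends on the plan it chooses for the grouping.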

When you construct an index to match a query, put the equality-matched columns first, then the range-matched column. So, for

SELECT a, b, c FROM tbl WHERE a = 1 AND b = 2 AND c > 10

you want an index on (a,b,c).
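
To make that rule concrete, here is a small sketch. The table definition, the column types and the index name are assumptions for illustration only; the point is just the column order in the key.

```sql
-- Hypothetical table matching the example query above.
CREATE TABLE tbl (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  a  INT NOT NULL,
  b  INT NOT NULL,
  c  INT NOT NULL,
  KEY tbl_a_b_c_index (a, b, c)  -- equality columns (a, b) first, range column (c) last
);

-- The plan should be able to use all three parts of the key.
EXPLAIN
SELECT a, b, c
FROM tbl
WHERE a = 1 AND b = 2 AND c > 10;
```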

By the way, if you have an index on (a,b,c), an index on just (a) or on (a,b) is redundant. It serves no purpose except to slow down inserts and updates.
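
Applied to the table in the question, a possible cleanup might look like the sketch below. It assumes MySQL accepts the composite index as backing for the foreign key and therefore lets the single-column index go; if it refuses, leave the index in place. The first EXPLAIN in the question shows the optimizer picking exchange_platform_fk over the composite index, so removing it may also change that choice.

```sql
-- List the existing indexes and their column order.
SHOW INDEX FROM `exchange`;

-- `exchange_platform_fk` (platform_id) is a leftmost prefix of
-- `exchange_platform_id_created_at_id_index` (platform_id, created_at, id).
-- Assumption: the composite index can serve the foreign key on platform_id.
ALTER TABLE `exchange` DROP INDEX `exchange_platform_fk`;
```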

Note: in InnoDB, every secondary index already contains the primary key (here, id) as a hidden trailing column. So you may be able to omit id as the last column of the index I suggested. Give it a try.
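
One way to try that, as a sketch with a made-up index name:

```sql
-- Hypothetical index name; `id` is left out on purpose.
ALTER TABLE `exchange`
  ADD INDEX `exchange_created_at_product_id_index` (`created_at`, `product_id`);

EXPLAIN
SELECT MIN(`id`)
FROM `exchange`
WHERE `created_at` >= '2021-09-17 22:36:11'
GROUP BY `product_id`;
```

If the plan uses this key and Extra still shows "Using index", the explicit id column was not needed.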

Read this: https://use-the-index-luke.com/ And welcome to the arcane world of index wrangling.

1 Comment

I have added the execution details for the subquery run on its own. It accounts for only 2 of the 14 seconds. Thanks!

Give this reformulation a try:

SELECT  *
    FROM  
    (
        SELECT  e1.product_id, MIN(e1.id) AS min_id
            FROM  `exchange` AS e1
            WHERE  e1.`platform_id` = 1
              AND  e1.`created_at` >= '2021-09-17 22:36:11'
            GROUP BY  e1.product_id 
    ) AS x
    JOIN  `exchange` AS e  ON e.id = x.min_id ;

Together with this index for exchange:

INDEX(platform_id, created_at, product_id, id)

It is aimed at the subquery.

  • platform_id is tested with =, so it needs to come first.
  • created_at takes care of the rest of the WHERE.
  • The other two columns are last; they make the index 'cover' the query.
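
A sketch of adding that index and checking the inner query against it. The index name is an assumption; pick whatever fits your naming scheme.

```sql
-- Hypothetical index name; the column order is the one given above.
ALTER TABLE `exchange`
  ADD INDEX `exchange_platform_created_product_id_index`
    (`platform_id`, `created_at`, `product_id`, `id`);

-- Plan for the derived-table part only.
EXPLAIN
SELECT e1.product_id, MIN(e1.id) AS min_id
FROM `exchange` AS e1
WHERE e1.`platform_id` = 1
  AND e1.`created_at` >= '2021-09-17 22:36:11'
GROUP BY e1.product_id;
```

If the index covers the query as intended, Extra should include "Using index".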

Note the philosophy I used:

  1. Find the ids that are needed. This list will probably be much smaller than the full set of rows in exchange.
  2. Fetch the full rows (*) efficiently via PRIMARY KEY(id).

(IN (SELECT...) is sometimes very poorly optimized.)
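
To see whether both steps show up in the plan, the whole reformulation can be run through EXPLAIN. This sketch selects e.* instead of * so that only the columns of exchange come back; that change is mine, not part of the query above.

```sql
EXPLAIN
SELECT e.*
FROM (
    SELECT e1.product_id, MIN(e1.id) AS min_id
    FROM `exchange` AS e1
    WHERE e1.`platform_id` = 1
      AND e1.`created_at` >= '2021-09-17 22:36:11'
    GROUP BY e1.product_id
) AS x
JOIN `exchange` AS e ON e.id = x.min_id;
```

I would hope to see a row for the derived table and a row for e with type eq_ref and key PRIMARY, which would mean each id is fetched with a single primary-key lookup.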
