
I'm working on a Laravel 12 project running MySQL 8.4. I have various models such as Buyer, BuyerTier, Application and PingtreeGroup, and I want to store raw transaction records in my PingtreeTransaction model.

This table will store around 500,000 entries a day. Its schema, minus indexing, looks like:

Schema::create('pingtree_transactions', function (Blueprint $table) {
    $table->ulid('id');
    $table->foreignId('company_id');
    $table->foreignId('application_id');
    $table->foreignId('buyer_id');
    $table->foreignId('buyer_tier_id');
    $table->foreignId('pingtree_group_id');
    $table->foreignId('pingtree_id');
    $table->mediumInteger('processing_duration')->default(0);
    $table->smallInteger('request_response_code')->default(200);
    $table->decimal('commission', 8, 2)->default(0.00);
    $table->string('result', 32)->default('unknown');
    $table->string('request_url')->nullable();
    $table->integer('partition_id');
    $table->date('processed_on');
    $table->dateTime('processing_started_at');
    $table->dateTime('processing_ended_at')->nullable();
    $table->timestamps();

    $table->primary(['id', 'partition_id']);
});

The query use cases for joining transactions to individual models are as follows (there's a query builder sketch after the list):

  1. Fetch all pingtree transactions for any given application
  2. Fetch all pingtree transactions for any given application between two dates
  3. Fetch all pingtree transactions for any given buyer
  4. Fetch all pingtree transactions for any given buyer between two dates
  5. etc...
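For reference, these individual lookups map onto straightforward query builder calls, roughly like the sketch below (assuming the default App\Models namespace; the variable names are just illustrative):

use App\Models\PingtreeTransaction;

// 1. All pingtree transactions for a given application
$transactions = PingtreeTransaction::query()
    ->where('application_id', $applicationId)
    ->get();

// 2. The same lookup restricted to a date range
$transactions = PingtreeTransaction::query()
    ->where('application_id', $applicationId)
    ->whereBetween('processed_on', [$from, $to])
    ->get();

// 3 and 4. The buyer variants follow the same shape
$transactions = PingtreeTransaction::query()
    ->where('buyer_id', $buyerId)
    ->whereBetween('processed_on', [$from, $to])
    ->get();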

But then there's a paginated front-end page that shows a date/time picker along with a tags component for each model, allowing a user to filter all transactions, for example:


  1. Show me all pingtree transactions for the past 3 days where the Buyer is either "foo" or "Bar", the BuyerTier is "a" or "b", and the result is either "accepted" or "declined".

A user won't always include every field in their search; they might only want to see everything over a period, excluding specific models.

For the end user there are a lot of possible filter combinations on this reporting page, and that's a business requirement.
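Because every filter is optional, the report query ends up being assembled conditionally, along these lines (a simplified sketch; the variable names are illustrative and the when() calls simply skip filters the user left empty):

$transactions = PingtreeTransaction::query()
    ->when($buyerIds, fn ($q) => $q->whereIn('buyer_id', $buyerIds))
    ->when($buyerTierIds, fn ($q) => $q->whereIn('buyer_tier_id', $buyerTierIds))
    ->when($results, fn ($q) => $q->whereIn('result', $results))
    ->when($from && $to, fn ($q) => $q->whereBetween('processing_started_at', [$from, $to]))
    ->orderByDesc('processing_started_at')
    ->paginate(25);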

So, in summary, there are two cases:

  1. Individual model joining
  2. A report page with various filters

Indexing dilemmas...

Since I want to join individual models without a date filter (i.e. on the foreignId columns), I would have thought the following indexes were suitable:

$table->index(['company_id']);
$table->index(['application_id']);
$table->index(['buyer_id']);
$table->index(['buyer_tier_id']);
$table->index(['result']);
$table->index(['partition_id']);
$table->index(['processed_on']);
$table->index(['processing_started_at']);
$table->index(['processing_ended_at']);

On a table with millions of rows, adding new indexes locks the table. The bigger issue, though, is that because I don't have a composite index and the dates are ranges, cardinality is very high on the date columns and lower on the buyer and buyer tier columns, so the database ends up oddly picking just the index on processing_started_at, and the query takes minutes to load:

explain select
  *
from
  `pingtree_transactions`
where
  `company_id` in (2, 1)
  and `buyer_id` in ("154", "172")
  and `buyer_tier_id` in ("652")
  and `processing_started_at` >= '2025-05-21 23:00:00'
  and `processing_ended_at` <= '2025-05-23 22:59:59'
  and `result` in ("accepted")
order by
  `processing_started_at` desc
limit
  26 offset 0

If I then add a composite index with multiple columns, like:

$table->index([
    'company_id',
    'buyer_tier_id',
    'buyer_id',
    'result',
    'processing_started_at',
    'processing_ended_at'
], 'composite_pingtree_transactions_all_index');

Then the optimizer only appears to use it when all of those columns are present in the search query, in which case it's incredibly fast at around 5ms. But given the various filter combinations, covering them all would bloat the database with indexes, and if one field is missed out the query falls back to a sub-optimal index.

Essentially, what combination of indexes would be best so that queries always utilise an index?

The reason for adding:

$table->primary(['id', 'partition_id']);

is because I'm experimenting with partitioning: partition_id holds the current day in YYYYMMDD format, so there would be a partition per day. But when trying this and adding partition_id into the query, it seems to use partition pruning but no index.
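For context, Laravel's Blueprint has no native partitioning support, so the partitioning experiment itself is applied with a raw statement, something like the sketch below (the partition list shown is illustrative; in practice new daily partitions would be created by a scheduled job):

use Illuminate\Support\Facades\DB;

// Partition by the YYYYMMDD partition_id. MySQL requires every unique key
// (including the composite primary key above) to contain the partitioning column.
DB::statement(<<<'SQL'
    ALTER TABLE pingtree_transactions
    PARTITION BY RANGE (partition_id) (
        PARTITION p20250521 VALUES LESS THAN (20250522),
        PARTITION p20250522 VALUES LESS THAN (20250523),
        PARTITION p20250523 VALUES LESS THAN (20250524),
        PARTITION p_future  VALUES LESS THAN MAXVALUE
    )
SQL);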

So the question is: what indexes should I add for the use cases defined above?

UPDATE

As per O Jones' answer, I've applied exactly what was suggested, and yet now, when filtering with something like this:

explain select
  *
from
  `pingtree_transactions`
where
  `partition_id` in (20250523, 20250524)
  and `company_id` in (2, 1)
  and `processing_started_at` >= '2025-05-22 22:00:00'
  and `processing_started_at` <= '2025-05-23 21:59:59'
  and `buyer_id` in ('3')
  and `buyer_tier_id` in ('1', '15')
  and `result` in ('accepted', 'declined')
order by
  `processing_started_at` desc
limit
  26 offset 0

It now prefers the index pingtree_transactions_processing_started_at_index over the tier_result_start index, despite the query having enough columns. Note that I've gone from a few select indexes, to 25 composite indexes each covering a different combination, and now back down to a select few. There's always one scenario that's slow.
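For what it's worth, one way to separate the index design from the optimizer's choice is to force the candidate index and compare timings; as far as I know the query builder has no built-in index-hint method, so a raw query serves as a quick diagnostic (a sketch reusing the tier_result_start name; if this is fast, the index itself is fine and it's the cost estimate, e.g. stale statistics fixable with ANALYZE TABLE, that's steering the planner elsewhere):

use Illuminate\Support\Facades\DB;

// Diagnostic only: force MySQL onto tier_result_start and compare against
// the plan it picks on its own.
$sql = <<<'SQL'
    SELECT *
    FROM pingtree_transactions FORCE INDEX (tier_result_start)
    WHERE partition_id IN (20250523, 20250524)
      AND company_id IN (1, 2)
      AND processing_started_at BETWEEN ? AND ?
      AND buyer_id IN (3)
      AND buyer_tier_id IN (1, 15)
      AND result IN ('accepted', 'declined')
    ORDER BY processing_started_at DESC
    LIMIT 26
SQL;

$rows = DB::select($sql, ['2025-05-22 22:00:00', '2025-05-23 21:59:59']);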


5 Comments
  • When using a filter on processing_started_at, you should provide both the lower and upper bounds (in other words, use a BETWEEN), so that MySQL fetches only that range from the index. In your example query, it would be BETWEEN '2025-05-21 23:00:00' AND '2025-05-23 22:59:59' (because you have an upper bound on processing_ended_at and a transaction must start before it ends). Commented May 23 at 8:32
  • I've just added the BETWEEN; it doesn't seem to have any impact. Weirdly, for the report part, which includes a partition id, despite each partition holding thousands of rows, it keeps flipping, e.g. `partition_id in (20250521)` might use the composite index, while `partition_id in (20250522)` might use the single-column index, which is slower. Commented May 23 at 8:42
  • Are you still using a partitioned table? You should test on a regular one. Commented May 23 at 8:43
  • It happens on a regular table too. I have another table: pingtree_buyer_transactions. On that table, a possible key of: composite_pingtree_transactions_all_index shows, but no index is chosen at all. Commented May 23 at 8:48
  • Then please provide the table structure, the query and the execution plan. Commented May 23 at 8:50

1 Answer


Indexes should be designed to match queries. Generally speaking, lots of single-column indexes, added "for good measure", are counterproductive.

The query you showed us has these filters.

 where ...
       `company_id` in (2, 1)                           -- arrrgh!
  and `buyer_id` in ("154", "172")                      -- arrrgh!
  and `buyer_tier_id` in ("652")                        -- equality
  and `processing_started_at` >= '2025-05-21 23:00:00'  -- first range
  and `processing_ended_at` <= '2025-05-23 22:59:59'    -- second range
  and `result` in ("accepted")                          -- equality
order by
  `processing_started_at` desc                          -- same as first range

One thing to know is that IN(list, of, values) can be a performance antipattern.

Another thing to know is that multicolumn indexes should start with columns used in equality filters, and then a column that's used in a range filter. Only one column can be used for a range filter.

Try this index for the query you have.

$table->index([
    'buyer_tier_id',
    'result',
    'processing_started_at'
], 'tier_result_start');

This lets MySQL random-access the index to the first row matching buyer_tier_id and result with the first eligible processing_started_at value. MySQL then scans the index sequentially, keeping the rows that match your other filter criteria and discarding the rest.

And because you ORDER BY processing_started_at DESC, the last column of the index, MySQL will scan it in reverse order and produce your result set already ordered. It will stop the scan once it has the number of rows in your LIMIT. That's efficient.

A single-column index on processing_started_at is probably the most general index you could use if your queries will have all sorts of different filter criteria based on user input.

Read Markus Winand's https://use-the-index-luke.com/ for lots of wisdom on this complex topic of index design.

The reasoning behind index design gets more complex if you use partitioning.


5 Comments

Thanks, I'll digest this soon. But what about the single-column indexes for when I just want to get all transactions for a single application? Without a single-column index that query takes about 50ms instead of 2ms. But when I add a single-column index, it seemingly ends up disregarding the multi-column one.
The user might not always have buyer_tier_id in their query, so I ended up creating around 25 different composite indexes for all of the common queries, and yet the database still doesn't always pick the right one.
You can make composite indexes a bit more selective if you know how they work. For instance, if you have a 3-column index like (col1, col2, col3), MySQL can use it as any of these: (col1), (col1, col2) or (col1, col2, col3). This means you don't need separate indexes for (col1) and (col1, col2). See: How MySQL Uses Indexes
I've gone from having a few select indexes, to 25 composite indexes for each combination, back to a few selective indexes @KIKOSoftware
Rather than just giving up on indexing the IN criteria, convert them to joins (at least where there is a limited number of values)
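For illustration, the join-instead-of-IN suggestion in the last comment could look roughly like this with the query builder (a sketch; joinSub() turns a derived table of the wanted buyer ids, which are illustrative here, into an inner join so each id is matched by equality):

use Illuminate\Support\Facades\DB;

// Derived table with one row per wanted buyer id.
$wantedBuyers = DB::query()->selectRaw('154 as buyer_id')
    ->unionAll(DB::query()->selectRaw('172 as buyer_id'));

$rows = DB::table('pingtree_transactions as t')
    ->joinSub($wantedBuyers, 'wanted', 'wanted.buyer_id', '=', 't.buyer_id')
    ->where('t.buyer_tier_id', 652)
    ->where('t.result', 'accepted')
    ->whereBetween('t.processing_started_at', ['2025-05-21 23:00:00', '2025-05-23 22:59:59'])
    ->orderByDesc('t.processing_started_at')
    ->limit(26)
    ->get();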
