1

I am not sure on how to make a decent index that will capture category/log_code properly. Maybe I also need to change my query? Appreciate any input!

All SELECTS contain:

SELECT logentry_id, date, log_codes.log_desc FROM log_entries
INNER JOIN log_codes ON log_entries.log_code = log_codes.log_code 
ORDER BY logentry_id DESC

Query can be as above, but usually has a WHERE to specify the category of log_codes to show, and/or partner, and/or customer. Examples of WHERE:

WHERE partner_id = 1

WHERE log_codes.category_overview = 1

WHERE partner_id = 1 AND log_codes.category_overview = 1

WHERE partner_id = 1 AND customer_id = 1 AND log_codes.category_overview = 1

Database structure:

CREATE TABLE IF NOT EXISTS `log_codes` (
  `log_code` smallint(6) NOT NULL,
  `log_desc` varchar(255),
  `category_mail` tinyint(1) NOT NULL,
  `category_overview` tinyint(1) NOT NULL,
  `category_cron` tinyint(1) NOT NULL,
  `category_documents` tinyint(1) NOT NULL,
  `category_error` tinyint(1) NOT NULL,
  PRIMARY KEY (`log_code`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

CREATE TABLE IF NOT EXISTS `log_entries` (
  `logentry_id` int(11) NOT NULL AUTO_INCREMENT,
  `date` datetime NOT NULL,
  `log_code` smallint(6) NOT NULL,
  `partner_id` int(11) NOT NULL,
  `customer_id` int(11) NOT NULL,
  PRIMARY KEY (`logentry_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 ;

EDIT: Added indexes on fields, here is output of SHOW INDEXES:

+-----------+------------+-----------------------+--------------+-----------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table     | Non_unique | Key_name              | Seq_in_index | Column_name           | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-----------+------------+-----------------------+--------------+-----------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| log_codes |          0 | PRIMARY               |            1 | log_code              | A         |          97 |     NULL | NULL   |      | BTREE      |         |               |
| log_codes |          1 | category_mail         |            1 | category_mail         | A         |           1 |     NULL | NULL   |      | BTREE      |         |               |
| log_codes |          1 | category_overview     |            1 | category_overview     | A         |           1 |     NULL | NULL   |      | BTREE      |         |               |
| log_codes |          1 | category_cron         |            1 | category_cron         | A         |           1 |     NULL | NULL   |      | BTREE      |         |               |
| log_codes |          1 | category_documents    |            1 | category_documents    | A         |           1 |     NULL | NULL   |      | BTREE      |         |               |
| log_codes |          1 | category_error        |            1 | category_error        | A         |           1 |     NULL | NULL   |      | BTREE      |         |               |
+-----------+------------+-----------------------+--------------+-----------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

+-------------+------------+--------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table       | Non_unique | Key_name     | Seq_in_index | Column_name  | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------------+------------+--------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| log_entries |          0 | PRIMARY      |            1 | logentry_id  | A         |      163020 |     NULL | NULL   |      | BTREE      |         |               |
| log_entries |          1 | log_code     |            1 | log_code     | A         |          90 |     NULL | NULL   |      | BTREE      |         |               |
| log_entries |          1 | partner_id   |            1 | partner_id   | A         |           6 |     NULL | NULL   | YES  | BTREE      |         |               |
| log_entries |          1 | customer_id  |            1 | customer_id  | A         |       20377 |     NULL | NULL   | YES  | BTREE      |         |               |
+-------------+------------+--------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

EDIT 2: Added composite indexes: (log_code, category_overview) and (log_code, category_overview) on log_codes. (customer_id, partner_id) on log_entries.

Here are some EXPLAIN output (query returns 66818 rows):

EXPLAIN SELECT log_entries.logentry_id, log_entries.date, log_codes.log_code_desc FROM log_entries
INNER JOIN log_codes ON log_entries.log_code = log_codes.log_code
WHERE log_entries.partner_id = 1 AND log_codes.category_overview = 1 ORDER BY logentry_id DESC
+----+-------------+-------------+--------+-------------------------------------+------------+---------+----------------------+--------+-----------------------------+
| id | select_type | table       | type   | possible_keys                       | key        | key_len | ref                  | rows   | Extra                       |
+----+-------------+-------------+--------+-------------------------------------+------------+---------+----------------------+--------+-----------------------------+
|  1 | SIMPLE      | log_entries | ref    | log_code,partner_id                 | partner_id | 2       | const                | 156110 | Using where; Using filesort |
|  1 | SIMPLE      | log_codes   | eq_ref | PRIMARY,code_overview,overview_code | PRIMARY    | 2       | log_entries.log_code |      1 | Using where                 |
+----+-------------+-------------+--------+-------------------------------------+------------+---------+----------------------+--------+-----------------------------+

But I also have some LEFT JOINs that I did not think would affect the index design, but they cause a "Using temporary" problem. Here is EXPLAIN output (query returns 66818 rows):

EXPLAIN SELECT log_entries.logentry_id, log_entries.date, log_codes.log_code_desc FROM log_entries 
INNER JOIN log_codes ON log_entries.log_code = log_codes.log_code 
LEFT JOIN partners ON log_entries.partner_id = partners.partner_id 
LEFT JOIN joined_table1 ON log_entries.t1_id = joined_table1.t1_id 
LEFT JOIN joined_table2 ON log_entries.t2_id = joined_table2.t2_id 
LEFT JOIN joined_table3 ON log_entries.t3_id = joined_table3.t3_id 
LEFT JOIN joined_table4 ON joined_table3.t4_id = joined_table4.t4_id 
LEFT JOIN joined_table5 ON log_entries.t5_id = joined_table5.t5_id 
LEFT JOIN joined_table6 ON log_entries.t6_id = joined_table6.t6_id
WHERE log_entries.partner_id = 1 AND log_codes.category_overview = 1 ORDER BY logentry_id DESC;
+----+-------------+---------------+--------+-------------------------------------+---------------+---------+--------------------------+------+----------------------------------------------+
| id | select_type | table         | type   | possible_keys                       | key           | key_len | ref                      | rows | Extra                                        |
+----+-------------+---------------+--------+-------------------------------------+---------------+---------+--------------------------+------+----------------------------------------------+
|  1 | SIMPLE      | log_codes     | ref    | PRIMARY,code_overview,overview_code | overview_code | 1       | const                    |   54 | Using where; Using temporary; Using filesort |
|  1 | SIMPLE      | log_entries   | ref    | log_code,partner_id                 | log_code      | 2       | log_codes.log_code       | 1811 | Using where                                  |
|  1 | SIMPLE      | partners      | const  | PRIMARY                             | PRIMARY       | 2       | const                    |    1 | Using index                                  |
|  1 | SIMPLE      | joined_table1 | eq_ref | PRIMARY                             | PRIMARY       | 1       | log_entries.t1_id        |    1 | Using index                                  |
|  1 | SIMPLE      | joined_table2 | eq_ref | PRIMARY                             | PRIMARY       | 1       | log_entries.t2_id        |    1 | Using index                                  |
|  1 | SIMPLE      | joined_table3 | eq_ref | PRIMARY                             | PRIMARY       | 3       | log_entries.t3_id        |    1 |                                              |
|  1 | SIMPLE      | joined_table4 | eq_ref | PRIMARY                             | PRIMARY       | 3       | joined_table3.t4_id      |    1 | Using index                                  |
|  1 | SIMPLE      | joined_table5 | eq_ref | PRIMARY                             | PRIMARY       | 4       | log_entries.t5_id        |    1 | Using index                                  |
|  1 | SIMPLE      | joined_table6 | eq_ref | PRIMARY                             | PRIMARY       | 4       | log_entries.t6_id        |    1 | Using index                                  |
+----+-------------+---------------+--------+-------------------------------------+---------------+---------+--------------------------+------+----------------------------------------------+

Don't know if it's a good or bad idea, but a subquery seems to get rid of the "Using temporary". Here is EXPLAIN output of two common scenarios. This query returns 66818 rows:

EXPLAIN SELECT log_entries.logentry_id, log_entries.date, log_codes.log_code_desc FROM log_entries INNER JOIN log_codes ON log_entries.log_code = log_codes.log_code
WHERE log_entries.partner_id = 1
AND log_entries.log_code IN (SELECT log_code FROM log_codes WHERE category_overview = 1) ORDER BY logentry_id DESC;
+----+--------------------+-------------+-----------------+-------------------------------------+------------+---------+----------------------+--------+-----------------------------+
| id | select_type        | table       | type            | possible_keys                       | key        | key_len | ref                  | rows   | Extra                       |
+----+--------------------+-------------+-----------------+-------------------------------------+------------+---------+----------------------+--------+-----------------------------+
|  1 | PRIMARY            | log_entries | ref             | log_code,partner_id                 | partner_id | 2       | const                | 156110 | Using where; Using filesort |
|  1 | PRIMARY            | log_codes   | eq_ref          | PRIMARY,code_overview               | PRIMARY    | 2       | log_entries.log_code |      1 |                             |
|  2 | DEPENDENT SUBQUERY | log_codes   | unique_subquery | PRIMARY,code_overview,overview_code | PRIMARY    | 2       | func                 |      1 | Using where                 |
+----+--------------------+-------------+-----------------+-------------------------------------+------------+---------+----------------------+--------+-----------------------------+

And a overview on customer, query returns 12 rows:

EXPLAIN SELECT log_entries.logentry_id, log_entries.date, log_codes.log_code_desc FROM log_entries INNER JOIN log_codes ON log_entries.log_code = log_codes.log_code
WHERE log_entries.partner_id = 1 AND log_entries.customer_id = 10000
AND log_entries.log_code IN (SELECT log_code FROM log_codes WHERE category_overview = 1) ORDER BY logentry_id DESC;
+----+--------------------+-------------+-----------------+--------------------------------------------------+--------------+---------+----------------------+------+-----------------------------+
| id | select_type        | table       | type            | possible_keys                                    | key          | key_len | ref                  | rows | Extra                       |
+----+--------------------+-------------+-----------------+--------------------------------------------------+--------------+---------+----------------------+------+-----------------------------+
|  1 | PRIMARY            | log_entries | ref             | log_code,partner_id,customer_id,customer_partner | customer_id  | 4       | const                |   27 | Using where; Using filesort |
|  1 | PRIMARY            | log_codes   | eq_ref          | PRIMARY,code_overview                            | PRIMARY      | 2       | log_entries.log_code |    1 |                             |
|  2 | DEPENDENT SUBQUERY | log_codes   | unique_subquery | PRIMARY,code_overview,overview_code              | PRIMARY      | 2       | func                 |    1 | Using where                 |
+----+--------------------+-------------+-----------------+--------------------------------------------------+--------------+---------+----------------------+------+-----------------------------+
2
  • The correct structure of the composite indices will depend on cardinality. If space is not an issue then you will probably be better off with multiple indices to best serve these different queries. Please post the output from SHOW INDEXES after adding a single column index to each of the related fields? Commented Mar 3, 2012 at 21:27
  • Thanks! I added single column indexes to the relevant fields (thanks @Shiplu). Output of SHOW INDEXES added to my question. The category_overview index is used when included when testing a WHERE category_overview=1 query. But on SHOW EXPLAIN, I get "Using where; Using temporary; Using filesort" on the log_codes table, seems to be a major bottleneck. Commented Mar 3, 2012 at 23:52

2 Answers 2

2

There isn't a simple rule for guaranteed success when it comes to indexing - you need to look at a reasonable period of typical calls to work out what will help in terms of performance.

All subsequent comments are therefore to be taken not as absolute rules:

An index is "good" if it quickly gets you to a small subset of the data rather than if it eliminates only half of the data (e.g. there is rarely value in an index on a gender column where there are only M/F as the possible entries). So how unique are the values within e.g. log_code, category_overview and partner_id?

For a given query it is often helpful to have a "covering" index, that is one that includes all the fields that are used by the query - however, if there are too many fields from a single table in a query you instead want an index that includes the fields in the "where" or "join" clause to identify the row and then join back to the table storage to get all the fields required.

So given the information you've provided, a candidate index on log_codes would include log_code and category_overview. Similarly on log_entries for log_code and partner_id. However these would need to be evaluated for how they affect performance.

Bear in mind that any given index may improve the read performance of a single query retrieving data but it will also slow down writes to the table where there is then a requirement to write more information i.e. where the new row fits in the additional index. This is why you need to look at the big picture of activity on the database to determine where indexes are worth it.

Sign up to request clarification or add additional context in comments.

1 Comment

Great reply! log_codes: 100 rows, log_entries: 150000 rows. 5 different partner_ids. 5 differenct category_XXXX columns in log_codes. category_overview: 30 codes, 60000 entries. Default log view: category_overview=1 and partner_id set. Default log view on a customer: customer_id set as well. Forgot to mention ORDER BY logentry_id DESC. Tried an index on log_entries (partner_id, log_code, logentry_id). With or without index on log_codes (log_code, category_overview) i get "Using where; Using temporary; Using filesort" on log_codes, and "Copying To Tmp Table" takes 1900ms, 90% of the query time.
1

Well done for taking the time to update your question with the detail requested. I am sorry if that sounds patronising but it is amazing the number people who are not prepared to take the time to help themselves.

Adding a composite index across (customer_id, partner_id) on the log_entries table should give a significant benefit for the last of your example where clauses.

The output of your SHOW INDEXES for the log_codes table would suggest that it is not currently populated as it shows NULL for all but the PK. Is this the case?

EDIT Sorry. Just read your comment to KAJ's answer detailing table content. It might be worth running that SHOW INDEXES statement again as it looks like MySQL may have been building its stats.

Adding a composite index across (log_code, category_overview) for the log_codes table should help but you will need to check the explain output to see if it is being used.

As a very crude general rule you want to create composite indices starting with the columns with the highest cardinality but this is not always the case. It will depend heavily on data distribution and query structure.

UPDATE I have created a mockup of your dataset and added the following indices. They give significant improvement based on your sample WHERE clauses -

ALTER TABLE `log_codes`
    ADD INDEX `IX_overview_code` (`category_overview`, `log_code`);

ALTER TABLE `log_entries`
    ADD INDEX `IX_partner_code` (`partner_id`, `log_code`),
    ADD INDEX `IX_customer_partner_code` (`customer_id`, `partner_id`, `log_code`);

The last index is quite expensive in terms of disk space and degradation of insert performance but gives very fast SELECT based on your final WHERE clause example. My sample dataset has just over 1M records in the log_entries table with quite even distribution across the partner and customer IDs. Three of your sample WHERE clauses execute in less than a second but the one with category_overview as the only criterion is very slow although still sub-second with only 200k rows.

9 Comments

Yup, ran a SHOW INDEXES again and got cardinality 1 on all the categories now (values always 1 or 0). Also discovered that if I remove the ORDER BY, it does not do "Using temporary" on log_code, and the query executes on 0.0023ms instead of 2s...! Any way of resolving this? Even though it's crude, that general rule will be helpful. I just started learning indexes, and thought that the ones with low cardinality should go first [eg. (partner_id, customer_id)]...
Added a composite index across (log_code, category_overview). It shows up in @possible_keys@, but no keys used in @key@. Looks like a compisite index across (customer_id, partner_id) is never used; it uses (customer_id) when both customer_id and partner_id is specified in WHERE, and it uses (partner_id) when only partner_id is specified in WHERE. Should i remove the single indexes?
If index cardinality is very low (<6, slightly arbitrary as it does depend on the data set) then it will only ever be an insert overhead. That said, a composite index on multiple fields of low cardinality can offer significant benefits in some circumstances. As @KAJ pointed out earlier there are no simple rules. A combination of crude general rules, experience and experimentation will lead to the best outcome.
It probably won't help but can you post the output from PROCEDURE ANALYSE() on these two tables?
Thanks for you patience, nnichols! I have 7 LEFT JOINs that I've omitted from my question to ease readability, as I tought these would not affect the WHERE/ORDER BY. Example LEFT JOIN customer_status_codes ON log_entries.new_customer_status_code = customer_status_codes.customer_status_code. Seems like if I have maximum 4-5 of these, the "Using temporary" on log_codes disappears and the query executes fast. It does not look like it's one specific join that is causing the problem. EXPLAIN shows that PRIMARY key is used and rows=1 in all the joined tables.
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.