I am running an aggregate query that is taking much longer than expected. It reads from a single table with no joins. The where clause has a date range, an in clause, and a null check on a timestamp column. The table has only about 5k rows, yet the query takes 13s.
The query is:
select `site_id`, created_year_month_idx as time_column, count(*) as total
from `patients`
where `created_year_month_idx` between 20080101 and 20090101 and
`site_id` in (1,2,3) and
`patients`.`deleted_at` is null
group by `created_year_month_idx`, `site_id`
When I explain the query, it seems to be doing a whole table scan:
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
| --- | ----------- | -------- | ---------- | ----- | --------------------------------------------- | ------------------------------------- | ------- | --- | ---- | -------- | -------------------------------------------- |
| 1 | SIMPLE | patients | | range | site_id,patients_created_year_month_idx_index | patients_created_year_month_idx_index | 4 | | 1 | 100 | Using where; Using temporary; Using filesort |
The table create statements are:
CREATE TABLE `sites` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(10),
PRIMARY KEY (`id`)
);
CREATE TABLE `patients` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`site_id` int(10) unsigned NOT NULL,
`created_at` timestamp NULL DEFAULT NULL,
`deleted_at` timestamp NULL DEFAULT NULL,
`created_year_month_idx` date GENERATED ALWAYS AS (date_format(`created_at`,'%Y-%m-01')) VIRTUAL,
PRIMARY KEY (`id`),
KEY `site_id` (`site_id`),
KEY `patients_created_year_month_idx_index` (`created_year_month_idx`),
CONSTRAINT `patients_site` FOREIGN KEY (`site_id`) REFERENCES `sites` (`id`)
);
I created a DB Fiddle at https://www.db-fiddle.com/f/4zbjFpMYXEGSviprQcaTm3/0
(incidentally, if you can tell me how to format a markdown table on SO, I'll fix the above)
An index on `(site_id, created_year_month_idx)`, optionally including `deleted_at`, seems sensible. Incidentally, it's often as quick to try these things for yourself as to ask us! But +1 for providing the required info.
`deleted_at` will be null for most records and will be a date for those records marked for deletion. About 10% of records will have a value for `deleted_at` and 90% will be null.
I have an index on `site_id` and also on `created_year_month_idx`. Initially I was concerned that `created_year_month_idx` would slow things down as it is a generated (not stored) column, but I read that creating the index would store the calculated values and therefore not require a table scan. Are you saying I should combine the indices into one index?
`created_at` should never be null.
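For anyone wanting to try the combined index suggested in the comments, here is a minimal sketch. It assumes MySQL 5.7 or later with InnoDB, where a secondary index may include a virtual generated column. The index name `patients_site_month_deleted_idx` is made up for illustration, and whether the optimizer picks the index, or how much it helps, is something to check with `explain` rather than take for granted.

```sql
-- Hypothetical index name; the column order follows the suggestion in the comments,
-- with deleted_at appended as the optional third column.
alter table `patients`
  add index `patients_site_month_deleted_idx` (`site_id`, `created_year_month_idx`, `deleted_at`);

-- Re-run explain on the original query to see which key is now chosen.
explain
select `site_id`, created_year_month_idx as time_column, count(*) as total
from `patients`
where `created_year_month_idx` between 20080101 and 20090101 and
`site_id` in (1,2,3) and
`patients`.`deleted_at` is null
group by `created_year_month_idx`, `site_id`;
```

If the `key` column of the new plan still shows one of the single-column indexes, the combined index is not being used and can be dropped again with `alter table patients drop index patients_site_month_deleted_idx`.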