I have a query similar to:
SELECT
ANY_VALUE(name) AS `name`,
100 * SUM(score) / SUM(sum(score)) OVER (PARTITION BY date(scores.created_at)) AS `average_score`,
ANY_VALUE(DATE_FORMAT(scores.created_at, "%Y-%m-%d")) AS `shift_date`
FROM
`scores`
INNER JOIN `shifts` ON `shifts`.`id` = `scores`.`shift_id`
WHERE
`shifts`.`table_c_id` in(1, 2, 3, 4, 5, 6, 7, 8, 9, 10……)
AND date(`scores`.`created_at`) >= '2020-01-01'
GROUP BY
`name`,
date(scores.created_at)
ORDER BY
`shift_date` ASC;
The IN list can contain up to 2000 IDs, which may not be sequential, and the created_at filter can reach back up to 14 months. At those sizes, the query takes 10-20 seconds to execute.
I'm trying to optimise this. I've tried adding an index on created_at on the scores table, but that had no effect. I also tried changing the date condition in the WHERE clause to:
AND `scores`.`created_at` >= '2020-01-01 00:00:00'
That made no difference either.
Having read up on the topic, I found that some people recommend creating a temporary table, but I can't see what benefit that would bring. I'm also not sure how to do it in a single query (is that even possible?).
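As far as I understand it, the temporary-table suggestion would look something like the sketch below. It is untested; wanted_c_ids is my own placeholder name, the ID list is shortened, and the SELECT is a simplified version of my query. It needs several statements rather than one.

```sql
-- Sketch only: wanted_c_ids is a placeholder name and the ID list is shortened.
CREATE TEMPORARY TABLE wanted_c_ids (
    table_c_id INT UNSIGNED NOT NULL,
    PRIMARY KEY (table_c_id)
);

INSERT INTO wanted_c_ids (table_c_id) VALUES (1), (2), (3);

-- Join against the temporary table instead of using a long IN (...) list.
SELECT
    scores.name AS name,
    100 * SUM(scores.score)
        / SUM(SUM(scores.score)) OVER (PARTITION BY DATE(scores.created_at)) AS average_score,
    DATE(scores.created_at) AS shift_date
FROM scores
INNER JOIN shifts ON shifts.id = scores.shift_id
INNER JOIN wanted_c_ids ON wanted_c_ids.table_c_id = shifts.table_c_id
WHERE scores.created_at >= '2020-01-01 00:00:00'
GROUP BY scores.name, DATE(scores.created_at)
ORDER BY shift_date ASC;

DROP TEMPORARY TABLE wanted_c_ids;
```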
The indexes on the scores table are: shift_id, employee_id, and the composite name,created_at (used for another query). As I said, a created_at index didn't help this one.
The shifts table has indexes on table_c_id and created_at.
Some sites suggest using WITH and CTEs, but again, I'm not sure how this would work or if the performance would actually improve.
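My understanding of the WITH suggestion is something like the following sketch. It is untested; the CTE name wanted_shifts is my own placeholder and the ID list is shortened. It only moves the shifts filter into a named subquery, so I don't know whether the optimiser would treat it any differently from the plain join.

```sql
-- Sketch only: wanted_shifts is a placeholder name and the ID list is shortened.
WITH wanted_shifts AS (
    SELECT id
    FROM shifts
    WHERE table_c_id IN (1, 2, 3)
)
SELECT
    scores.name AS name,
    100 * SUM(scores.score)
        / SUM(SUM(scores.score)) OVER (PARTITION BY DATE(scores.created_at)) AS average_score,
    DATE(scores.created_at) AS shift_date
FROM scores
INNER JOIN wanted_shifts ON wanted_shifts.id = scores.shift_id
WHERE scores.created_at >= '2020-01-01 00:00:00'
GROUP BY scores.name, DATE(scores.created_at)
ORDER BY shift_date ASC;
```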
The schema for scores and shifts is:
DROP TABLE IF EXISTS `scores`;
CREATE TABLE `scores` (
`id` bigint unsigned NOT NULL AUTO_INCREMENT,
`shift_id` int unsigned NOT NULL,
`hash` varchar(40) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL,
`name` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL,
`score` double(8,2) unsigned NOT NULL,
`created_at` timestamp NULL DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `scores_hash_index` (`hash`) USING BTREE,
KEY `scores_shift_id_index` (`shift_id`) USING BTREE,
KEY `scores_name_created_at_index` (`name`,`created_at`)
) ENGINE=InnoDB AUTO_INCREMENT=3140922 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
DROP TABLE IF EXISTS `shifts`;
CREATE TABLE `shifts` (
`id` bigint unsigned NOT NULL AUTO_INCREMENT,
`table_c_id` int unsigned NOT NULL,
`created_at` timestamp NULL DEFAULT NULL,
`updated_at` timestamp NULL DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `shifts_table_c_id_index` (`table_c_id`),
KEY `shifts_created_at_index` (`created_at`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=536392 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
Update
Using a lookup table for names:
names table: id (int unsigned, primary key); name (varchar)
SELECT
names.name AS `name`,
100 * SUM(score) / SUM(sum(score)) OVER (PARTITION BY date(scores.created_at)) AS `average_score`,
ANY_VALUE(DATE_FORMAT(scores.created_at, "%Y-%m-%d")) AS `shift_date`
FROM
`scores`
INNER JOIN `shifts` ON `shifts`.`id` = `scores`.`shift_id`
INNER JOIN `names` ON `names`.id = `scores`.`name_id`
WHERE
`shifts`.`table_c_id` in(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299, 300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312, 313, 314, 315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 325, 326, 327, 328, 329, 330, 331, 332, 333, 334, 335, 336, 337, 338, 339, 340, 341, 342, 343, 344, 345, 346, 347, 348, 349, 350, 351, 352, 353, 354, 355, 356, 357, 358, 359, 360, 361, 362, 363, 364, 365, 366, 367, 368, 369, 370, 371, 372, 373, 374, 375, 376, 377, 378, 379, 380, 381, 382, 383, 384, 385, 386, 387, 388, 389, 390, 391, 392, 393, 394, 395, 396, 397, 398, 399, 400, 401, 402, 403, 404, 405, 406, 407, 408, 409, 410, 411, 412, 413, 414, 415, 416, 417, 418, 419, 420, 421, 422, 423, 424, 425, 426, 427, 428, 429, 430, 431, 432, 433, 434, 435, 436, 437, 438, 439, 440, 441, 442, 443, 444, 445, 446, 447, 448, 449, 450, 451, 452, 453, 454, 455, 456, 457, 458, 459, 460, 461, 462, 463, 464, 465, 466, 467, 468, 469, 470, 471, 472, 473, 474, 475, 476, 477, 478, 479, 480, 481, 482, 483, 484, 485, 486, 487, 488, 489, 490, 491, 492, 493, 494, 495, 496, 497, 498, 499, 500, 501, 502, 503, 
504, 505, 506)
AND `scores`.`created_at` >= '2019-04-03'
GROUP BY
`names`.`name`,
date(scores.created_at)
ORDER BY
`shift_date` ASC;
This has given no benefit. A composite index on the scores table covering shift_id, name_id and created_at hasn't helped either.
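For reference, the index I tried was along these lines. The index name here is a placeholder following the existing naming pattern, and I'm assuming a single composite index in this column order.

```sql
-- Placeholder index name; name_id is the lookup column that references names.id.
ALTER TABLE scores
    ADD INDEX scores_shift_id_name_id_created_at_index (shift_id, name_id, created_at);
```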
SUM(sum(score)) seems wrong.
Use BETWEEN instead of IN.
Replace ANY_VALUE(DATE_FORMAT(scores.created_at, "%Y-%m-%d")) with date(scores.created_at).