I have a table (logs) that has the following columns (there are others, but these are the important ones):
- id (PK, int)
- Timestamp (datetime) (index)
- Duration (int)
Basically this is a record for an event that starts at a time and ends at a time. This table currently has a few hundred thousand rows in it. I expect it to grow to millions. For the purpose of speeding up queries, I have added another column and precomputed values:
- EndTime (datetime) (index)
To calculate EndTime I have added the number of seconds in Duration to the Timestamp field.
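The backfill can be done with a one-off statement along these lines (a sketch, assuming the column names above):

```sql
-- Precompute EndTime from the existing Timestamp and Duration columns.
UPDATE logs
SET EndTime = DATE_ADD(`Timestamp`, INTERVAL Duration SECOND);
```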
Now what I want to do is run a query that counts the number of rows that are active at a given point in time, i.e. rows that start (Timestamp) at or before that point and end (EndTime) at or after it. I then want to run this query for every second over a large timespan (such as a year). I would also like to count the number of rows that start at that exact point in time, and the number that end at it.
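For a single point in time, the "active rows" count reduces to something like this (illustrative only; the timestamp literal is an arbitrary example):

```sql
-- Rows active at one instant: started at or before it, ended at or after it.
SELECT COUNT(*) AS total
FROM logs
WHERE '2010-04-13 09:45:00' BETWEEN `Timestamp` AND EndTime;
```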
I have created the following query:
SELECT
    `dates`.`date`,
    COUNT(*) AS `total`,
    SUM(IF(`dates`.`date` = `logs`.`Timestamp`, 1, 0)) AS `new`,
    SUM(IF(`dates`.`date` = `logs`.`EndTime`, 1, 0)) AS `dropped`
FROM
    `logs`,
    (SELECT DATE_ADD("2010-04-13 09:45:00", INTERVAL `number` SECOND) AS `date`
     FROM `numbers`
     LIMIT 120) AS `dates`
WHERE `dates`.`date` BETWEEN `logs`.`Timestamp` AND `logs`.`EndTime`
GROUP BY `dates`.`date`;
Note that the numbers table is strictly for easily enumerating a date range. It is a table with one column, number, and contains the values 1, 2, 3, 4, 5, etc...
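For reference, a numbers table like that could be built with something along these lines (a sketch; any method that produces a table of sequential integers works):

```sql
CREATE TABLE numbers (
    number INT NOT NULL PRIMARY KEY
);

-- Seed a few rows, then double the row count on each run of the
-- second INSERT until the range is large enough.
INSERT INTO numbers (number) VALUES (1), (2), (3), (4), (5);
INSERT INTO numbers (number)
SELECT number + (SELECT MAX(number) FROM numbers) FROM numbers;
```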
This gives me exactly what I am looking for... a table with 4 columns:
- date
- total (the number of rows active at the current point in time, i.e. starting at or before it and ending at or after it)
- new (rows that start at this point in time)
- dropped (rows that end at this point in time)
The trouble is, this query can take a significant amount of time to execute. To go through 120 seconds (as shown in the query), it takes about 10 seconds. I suspect that this is about as fast as I am going to get it, but I thought I would ask here if anyone had any ideas for improving the performance of this query.
Any suggestions would be most helpful. Thank you for your time.
Edit: I have indexes on Timestamp and EndTime.
The output of EXPLAIN on my query:
"id";"select_type";"table";"type";"possible_keys";"key";"key_len";"ref";"rows";"Extra"
"1";"PRIMARY";"<derived2>";"ALL";NULL;NULL;NULL;NULL;"120";"Using temporary; Using filesort"
"1";"PRIMARY";"logs";"ALL";"Timestamp,EndTime";NULL;NULL;NULL;"296159";"Range checked for each record (index map: 0x6)"
"2";"DERIVED";"numbers";"index";NULL;"PRIMARY";"4";NULL;"35546940";"Using index"
When I run analyze on my logs table, it says status OK.
Have you tried EXPLAINing your query? Have you analyzed the logs table? Try SHOW WARNINGS after an EXPLAIN EXTENDED query.