I have been assigned the task of creating some graphical stats for a website, out of some saved data.
Facts:
- there are 3 databases in use: dbCurrent, dbStats, dbBackup.
  dbCurrent is the main database of the website.
  dbStats holds various tables of statistics and tracking data.
  dbBackup holds the stats/tracking tables for the last five years.
- the data I will use comes from two databases (dbStats, dbBackup)
- the table names are: stats2006, stats2007, stats2008, etc, except the current stats which is just "stats". Each table has data for its year.
- the table structure for each year of data is the same:
  primaryID: integer
  productID: integer
  dateMonitor: integer (Unix timestamp)
  pageName: varchar(20)
- the productID, dateMonitor and pageName fields are also indexed
In other words: what product was viewed, on what date, and from what page.
So my idea was to loop over the tables and get my data from each one. Each of my queries looks like:
Select COUNT(primaryID) as myCounter FROM $tablename WHERE $conditions
where $tablename and $conditions are variables based on each loop. All conditions are similar to:
- dateMonitor between date1 and date2
- pageName='some val'
- productID IN ($comma_separated_values)
- combination of the above
All of these work decently so far (for a single product).
When I try to create a report comparing 'x' products over 'y' years (chosen dynamically by an admin/moderator), the script runs for more than 15 minutes.
I am looking for a way to improve the performance of the script. The logic/structure I use so far is as follows:
Loop through products to find the ids to use (typical format is x,y,z, i.e. comma separated values)
Open Loop through years/months
Execute one sql query for each affected table/database to get the number of affected rows.
Close year loop
Send data to graph script (jquery jqPlot to be exact) to print on screen
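The steps above can be sketched roughly as follows. This is only an illustration, not the actual script: the function names monthRanges and buildCountQuery are hypothetical, the qualified table name dbBackup.stats2007 is an assumption about where that table lives, and month boundaries are taken in PHP's default timezone.

```php
<?php
// Hypothetical helper: first and last Unix timestamp of each month in $year.
function monthRanges(int $year): array
{
    $ranges = [];
    for ($m = 1; $m <= 12; $m++) {
        $start = mktime(0, 0, 0, $m, 1, $year);
        // mktime() rolls month 13 over to January of the next year.
        $end = mktime(0, 0, 0, $m + 1, 1, $year) - 1;
        $ranges[$m] = [$start, $end];
    }
    return $ranges;
}

// Hypothetical helper: builds the COUNT query for one table and one date range.
function buildCountQuery(string $table, array $productIds, int $start, int $end): string
{
    if (count($productIds) === 0) {
        throw new InvalidArgumentException('At least one product id is required');
    }
    // Cast every id to int so the IN list cannot carry injected SQL.
    $ids = implode(',', array_map('intval', $productIds));
    return "SELECT COUNT(*) AS myCounter FROM $table"
         . " WHERE dateMonitor BETWEEN $start AND $end"
         . " AND productID IN ($ids)";
}

// Usage: one query string per month of 2007 (table name is an assumption).
foreach (monthRanges(2007) as $month => [$start, $end]) {
    echo buildCountQuery('dbBackup.stats2007', [3, 7], $start, $end), "\n";
}
```

Each iteration still produces a separate query, which is exactly the pattern that becomes slow once many products and years are combined.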
Any help/idea appreciated.
EDIT: Based on @Narf's suggestion to use UNION ALL, I constructed one single query out of 12 sub-select statements:
SELECT COUNT(*) AS monthlyTotal FROM db1.table1 WHERE dateMonitor>='1167606001' AND dateMonitor<='1170284399' AND pageName='test'
UNION ALL
SELECT COUNT(*) AS monthlyTotal FROM db1.table2 WHERE dateMonitor>='1170284401' AND dateMonitor<='1172703599' AND pageName='test' ...
Each select statement refers to a single month duration. Demo code:
$query_chk1 = ""; // initialise before appending
for ($m = 1; $m <= 12; $m++)
{
    // Timestamps bounding month $m of $myYear
    $startDate = mktime(0, 0, 1, $m, 1, $myYear);
    $daysOfMonth = date("t", mktime(10, 10, 10, $m, 10, $myYear));
    $endDate = mktime(23, 59, 59, $m, $daysOfMonth, $myYear);
    $query_chk1 .= "SELECT COUNT(*) AS monthlyTotal FROM db1.table1 WHERE dateMonitor>='$startDate' AND dateMonitor<='$endDate' AND pageName='test' UNION ALL ";
}
// Drop the trailing "UNION ALL " left by the last iteration
$query_chk1 = substr($query_chk1, 0, -10);
EDIT2: After creating combined indexes (as suggested by @ypercube), I see a slight decrease in execution time.
I now have an average execution time of 11 min (the original time was 15-17 min).
This helped to decrease execution time.
Thank you.
COUNT(*) is faster in MySQL than COUNT(field), and gives the same result as long as field is not nullable.
A condition such as WHERE dateMonitor BETWEEN date1 AND date2 AND pageName='some val' would benefit from a (pageName, dateMonitor) index.
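Because every yearly table has the same structure, the composite index would have to be added to each of them. A minimal sketch of generating those statements is below; the function name compositeIndexStatements, the index name idx_page_date and the database each table is assumed to live in are all illustrative, not taken from the question.

```php
<?php
// Hypothetical helper: emits one ALTER TABLE statement per stats table.
function compositeIndexStatements(array $tables): array
{
    $statements = [];
    foreach ($tables as $table) {
        // Equality column (pageName) first, range column (dateMonitor) second.
        $statements[] = "ALTER TABLE $table ADD INDEX idx_page_date (pageName, dateMonitor)";
    }
    return $statements;
}

// Usage: table locations here are assumptions.
foreach (compositeIndexStatements(['dbStats.stats', 'dbBackup.stats2007']) as $sql) {
    echo $sql, ";\n";
}
```

Putting pageName first lets MySQL match the equality condition and then scan only the dateMonitor range inside it.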