php mysql optimization

Question

I have been assigned the task of creating some graphical stats for a website from saved data.

Facts:

There are 3 databases in use: dbCurrent, dbStats and dbBackup.
dbCurrent is the main database of the website.
dbStats holds various tables of statistics and tracking data.
dbBackup holds the stats/tracking tables of the last five years.
The data I will use come from two databases (dbStats, dbBackup).
The table names are stats2006, stats2007, stats2008, etc., except the current stats table, which is just "stats". Each table holds the data for its year.
The table structure is the same for each year of data:
primaryID: integer
productID: integer
dateMonitor: integer (Unix timestamp)
pageName: varchar(20)
The productID, dateMonitor and pageName fields are also indexed.

In other words: what product was viewed on what date and from what page.

So my idea was to loop over the tables and get my data from each one. Each of my queries looks like:

Select COUNT(primaryID) as myCounter FROM $tablename WHERE $conditions

where $tablename and $conditions are variables based on each loop. All conditions are similar to:

dateMonitor between date1 and date2
pageName='some val'
productID IN ($comma_separated_values)
combination of the above
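As an illustration of how the conditions above could be combined into one $conditions string, here is a minimal sketch. The function name buildConditions() and its parameters are hypothetical, and addslashes() is only a stand-in for the escaping of whatever database driver is in use; prepared statements would be the safer choice.

```php
<?php
// Hypothetical helper: combines the optional filters into one WHERE clause.
// Values are cast or escaped here only for illustration; real code should
// prefer prepared statements.
function buildConditions(?int $from, ?int $to, ?string $pageName, array $productIds): string
{
    $parts = array();
    if ($from !== null && $to !== null) {
        $parts[] = "dateMonitor BETWEEN $from AND $to";
    }
    if ($pageName !== null) {
        // addslashes() is a stand-in for the driver's own escaping
        $parts[] = "pageName='" . addslashes($pageName) . "'";
    }
    if (count($productIds) > 0) {
        // Force every id to an integer before building the IN list
        $ids = implode(',', array_map('intval', $productIds));
        $parts[] = "productID IN ($ids)";
    }
    // '1' keeps the query valid when no filter was chosen
    return count($parts) > 0 ? implode(' AND ', $parts) : '1';
}

echo buildConditions(1167606001, 1170284399, 'test', array(5, 6, 13)), "\n";
```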

All of these are working decently so far (for a single product).

When I try to create a report that compares 'x' products over 'y' years (chosen dynamically by an admin/moderator), the script runs for more than 15 minutes.

I am looking for a way to improve the performance of the script. The logic/structure I use so far is as follows:

Loop through the products to find the ids to use (typical format: x,y,z, i.e. comma-separated values)
Open a loop through the years/months
Execute one SQL query for each affected table/database to get the number of matching rows
Close the year loop
Send the data to the graph script (jQuery jqPlot, to be exact) to print on screen

Any help/idea appreciated.

EDIT: Based on @Narf's suggestion to use UNION ALL, I constructed a single query out of 12 sub-select statements:

SELECT COUNT(*) AS monthlyTotal FROM db1.table1 WHERE dateMonitor>='1167606001' AND dateMonitor<='1170284399' AND pageName='test' 
UNION ALL 
SELECT COUNT(*) AS monthlyTotal FROM db1.table2 WHERE dateMonitor>='1170284401' AND dateMonitor<='1172703599' AND pageName='test' ...

Each select statement refers to a single month duration. Demo code:

$selects = array();
for ($m=1; $m<=12; $m++)
{
    // month boundaries as Unix timestamps (00:00:01 on the 1st to 23:59:59 on the last day)
    $startDate = mktime(0, 0, 1, $m, 1, $myYear);
    $daysOfMonth = date("t", mktime(10, 10, 10, $m, 10, $myYear));
    $endDate = mktime(23, 59, 59, $m, $daysOfMonth, $myYear);

    $selects[] = "SELECT COUNT(*) AS monthlyTotal FROM db1.table1 WHERE dateMonitor>='$startDate' AND dateMonitor<='$endDate' AND pageName='test'";
}

// join the 12 monthly SELECTs into a single query
$query_chk1 = implode(" UNION ALL ", $selects);
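One possible refinement of the demo code, offered as a sketch rather than as the asker's method: each sub-select can also return its month number, so the 12 rows can be matched to months without relying on the order of the result set. buildMonthlyQuery() and the monthNo column are hypothetical names, $table and $pageName are assumed to be trusted values, and the sketch starts each month at 00:00:00 instead of 00:00:01, which is an assumption about the intended range.

```php
<?php
// Sketch: same monthly loop, but each SELECT also returns its month number.
// buildMonthlyQuery() is a hypothetical name; $table and $pageName are
// assumed to be trusted values chosen by the script, not raw user input.
function buildMonthlyQuery(string $table, int $year, string $pageName): string
{
    $selects = array();
    for ($m = 1; $m <= 12; $m++) {
        $startDate = mktime(0, 0, 0, $m, 1, $year);
        // Day 0 of the next month is the last day of this month
        $endDate = mktime(23, 59, 59, $m + 1, 0, $year);
        $selects[] = "SELECT $m AS monthNo, COUNT(*) AS monthlyTotal FROM $table"
            . " WHERE dateMonitor>=$startDate AND dateMonitor<=$endDate"
            . " AND pageName='$pageName'";
    }
    return implode(' UNION ALL ', $selects);
}

date_default_timezone_set('UTC'); // fixed zone so the sample is reproducible
$sql = buildMonthlyQuery('db1.table1', 2007, 'test');
echo substr_count($sql, 'UNION ALL'), "\n"; // 11 separators for 12 SELECTs
```

Each result row then carries both monthNo and monthlyTotal, which can be turned into the data series for the graph script.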

EDIT2: After creating combined indexes (as suggested by @ypercube), I see a decrease in execution time.

I now have an average execution time of 11 min (the original time was 15-17 min).

Thank you.

And a side note: COUNT(*) is faster in MySQL than COUNT(field), and gives the same result as long as the field is not nullable.
– ypercubeᵀᴹ, Commented Aug 31, 2011 at 9:36
@ypercube: the productID, dateMonitor and pageName fields are indexed in each table.
– andrew, Commented Aug 31, 2011 at 9:55
For your queries that involve more than one condition (on more than one field), you'll benefit from compound indexes. For example, WHERE dateMonitor between date1 and date2 AND pageName='some val' would benefit from a (pageName, dateMonitor) index.
– ypercubeᵀᴹ, Commented Aug 31, 2011 at 11:43
Check this page: dev.mysql.com/doc/refman/5.1/en/select-optimization.html
– ypercubeᵀᴹ, Commented Aug 31, 2011 at 12:10
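The compound index from the comments would have to exist on every yearly table. A minimal sketch that generates the statements, assuming the table names from the question; the index name idx_page_date and the function name are made up for illustration:

```php
<?php
// Sketch: emit one ALTER TABLE per yearly table so each gets the compound
// index from the comment above. The index name idx_page_date is made up.
function compoundIndexStatements(array $tables): array
{
    $statements = array();
    foreach ($tables as $table) {
        $statements[] = "ALTER TABLE `$table` ADD INDEX idx_page_date (pageName, dateMonitor)";
    }
    return $statements;
}

foreach (compoundIndexStatements(array('stats2006', 'stats2007', 'stats')) as $sql) {
    echo $sql, ";\n";
}
```

The equality column (pageName) comes first and the range column (dateMonitor) second, matching the order suggested in the comment. Running EXPLAIN on one of the sub-selects should show whether the new index is actually chosen.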

Narf · Accepted Answer · 2011-08-31 10:14:30Z

1

There's not much that you can do, at least since you've indexed all your columns ... here's the best that I can come up with:

SELECT COUNT(*)
FROM `stats`
WHERE `productID` IN(1,2,3)
    AND `dateMonitor` >= <unixtime from>
    AND `dateMonitor` <= <unixtime to>
    AND `pageName`='<value>'

... and why:

As ypercube has commented - using COUNT(*) is faster.
I don't know this for sure, but I believe that using >= and <= instead of BETWEEN for integers should be faster.

Another thing you should try is executing all the queries (if there is more than one) at once. It is easier to show than to explain in words, and you seem to have a good grasp of SQL, so here's an example:

Let's say that we need to search for products with ids of 123, 13, 5 and 6 from May 2006 through April 2008, and pageName 'test':

We calculate the timestamps prior to generating the query and determine exactly which tables we need to search in.

SELECT COUNT(*) AS myCounter FROM stats2006 WHERE productID IN(5,6,13,123) AND dateMonitor >= 1146430800 AND pageName='test'

/* Here we only need to check the timestamp against May 1st 2006, 00:00:00 */

UNION ALL

SELECT COUNT(*) AS myCounter FROM stats2007 WHERE productID IN(5,6,13,123) AND pageName='test'

/* Here we don't need to check the dateMonitor field because the whole year matches our period */

UNION ALL

SELECT COUNT(*) AS myCounter FROM stats2008 WHERE productID IN(5,6,13,123) AND dateMonitor <= 1209589199 AND pageName='test'

/* Here we only need to check the timestamp against April 30th 2008, 23:59:59 */
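The table-selection step described in this answer could be sketched in PHP roughly as follows. This is only an illustration under stated assumptions: buildPeriodQuery() is a hypothetical name, yearly tables are assumed to be called stats<year> with the current year in plain "stats", the database prefix is left out as in the example above, and the inputs are assumed to be validated already.

```php
<?php
// Sketch of the table-selection step. Assumptions: yearly tables are named
// stats<year>, the current year lives in plain "stats", and $currentYear,
// $ids and $page are already validated by the caller.
function buildPeriodQuery(int $from, int $to, array $ids, string $page, int $currentYear): string
{
    $idList = implode(',', array_map('intval', $ids));
    $selects = array();
    for ($year = (int) date('Y', $from); $year <= (int) date('Y', $to); $year++) {
        $table = ($year === $currentYear) ? 'stats' : "stats$year";
        $where = "productID IN($idList)";
        // Add a date bound only where the period starts or ends inside the year
        if ($from > mktime(0, 0, 0, 1, 1, $year)) {
            $where .= " AND dateMonitor >= $from";
        }
        if ($to < mktime(23, 59, 59, 12, 31, $year)) {
            $where .= " AND dateMonitor <= $to";
        }
        $where .= " AND pageName='$page'";
        $selects[] = "SELECT COUNT(*) AS myCounter FROM $table WHERE $where";
    }
    return implode(' UNION ALL ', $selects);
}

date_default_timezone_set('UTC'); // fixed zone so the sample is reproducible
$from = mktime(0, 0, 0, 5, 1, 2006);     // May 1st 2006, 00:00:00
$to   = mktime(23, 59, 59, 4, 30, 2008); // April 30th 2008, 23:59:59
echo buildPeriodQuery($from, $to, array(5, 6, 13, 123), 'test', 2011), "\n";
```

Because the sample fixes the time zone to UTC, the timestamps it prints differ from the ones in the answer, which were calculated in the server's local time zone.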

answered Aug 31, 2011 at 10:14


2 Comments

andrew Over a year ago

Thank you. I didn't know about the UNION ALL syntax. From a quick look at your example, it seems that your SQL will produce 3 results, each one equal to the number of matching rows of its select statement. Am I correct to assume this? I also assume that this approach will fail if I query for a single month? Since the timestamps for each month are different in each year, the COUNT() of each select statement will result in null or 0. Is it safe to use different criteria in each WHERE clause?

Narf Over a year ago

Yes - it will produce 3 results, which is probably not what you need: I only now notice that you probably want the count for each product, but I assumed otherwise because your select only contains a count. It depends on what exactly you want to query. If you want only the data for e.g. August of each year, then yes, you would need different timestamps, but the whole point of the UNION statement is that you can combine the results of different queries, so yes, it is safe to use different criteria in each WHERE clause. You can change everything as long as the produced columns are the same.
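If a single total is wanted instead of one row per table, the per-table counts can be added up either in MySQL or in PHP. A minimal sketch of both options; the function names and the aliases grandTotal and perTable are hypothetical:

```php
<?php
// Sketch: wrap a UNION ALL of per-table counts so MySQL returns one total.
// wrapAsTotal() and the alias names are hypothetical.
function wrapAsTotal(string $unionQuery): string
{
    // A derived table must have an alias in MySQL (here: perTable)
    return "SELECT SUM(myCounter) AS grandTotal FROM ($unionQuery) AS perTable";
}

// Alternative: keep the per-table rows and add them up in PHP
function sumCounters(array $rows): int
{
    return (int) array_sum(array_column($rows, 'myCounter'));
}

echo wrapAsTotal("SELECT COUNT(*) AS myCounter FROM stats2006 UNION ALL SELECT COUNT(*) AS myCounter FROM stats2007"), "\n";
echo sumCounters(array(array('myCounter' => '12'), array('myCounter' => '30'))), "\n";
```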

Ivan · Accepted Answer · 2011-08-31 10:15:30Z

0

When you compare 'x' products over 'y' years, why don't you use GROUP BY? E.g.:

Select productID, COUNT(primaryID) as myCounter FROM $tablename WHERE $conditions GROUP BY productID

This will cut the number of queries and should speed up the process.

answered Aug 31, 2011 at 10:15


1 Comment

andrew Over a year ago

I think GROUP BY will fail. I don't want a count per product, but the total over all of them. For example: count the views of 5 products on a certain page over a certain period versus some other 5 products on another page over the same period. At the current state of the website, we don't care about the separate views, only about the totals. At a later stage, when we want to check which product is more efficient, we will group the views by product/page to compare them.
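For what it's worth, per-product counts from a GROUP BY query can still be folded into totals per product group afterwards. The sketch below assumes both groups share the same page and date filter (otherwise the query would be run once per page); totalsPerGroup() and the group names are hypothetical.

```php
<?php
// Sketch: fold per-product counts (as a GROUP BY productID query would
// return them) into one total per product group. Names are hypothetical.
function totalsPerGroup(array $rows, array $groups): array
{
    // Index the counts by product id for quick lookup
    $byProduct = array();
    foreach ($rows as $row) {
        $byProduct[(int) $row['productID']] = (int) $row['myCounter'];
    }
    $totals = array();
    foreach ($groups as $name => $ids) {
        $totals[$name] = 0;
        foreach ($ids as $id) {
            // Products with no rows in the period simply count as 0
            $totals[$name] += $byProduct[$id] ?? 0;
        }
    }
    return $totals;
}

$rows = array(
    array('productID' => 5, 'myCounter' => 10),
    array('productID' => 6, 'myCounter' => 4),
    array('productID' => 13, 'myCounter' => 7),
);
print_r(totalsPerGroup($rows, array('groupA' => array(5, 6), 'groupB' => array(13, 123))));
```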
