
I've been tuning query performance for a system with pagination, trying to make the data selection as fast as possible, but I've come across something I don't quite understand. To my knowledge, when a LIMIT with an OFFSET is used, MySQL has to read every row up to the offset and then discard it, so in theory a query with an offset of 10,000 should be much slower than one without. That is normally true, as in this case:

select SQL_NO_CACHE * from `customers` where `NetworkID`='\func uuid()' 
    order by `DateTimeAdded` desc limit 0, 100;
/* finishes in 2.497 seconds */

select SQL_NO_CACHE * from `customers` where `NetworkID`='\func uuid()' 
    order by `DateTimeAdded` desc limit 10000, 100;
/* finishes in 2.702 seconds */

But if I use an inner join to join the table to itself, with a subquery that selects only the UserID column to do the sorting and limiting, the query is consistently faster with the offset of 10,000 than without one, which completely stumps me. For example:

select SQL_NO_CACHE * from `customers` 
    inner join (select `UserID` from `customers` where `NetworkID`='\func uuid()' 
        order by `DateTimeAdded` desc limit 100) 
    as `Results` using(`UserID`)
/* finishes in 1.133 seconds */

select SQL_NO_CACHE * from `customers` 
    inner join (select `UserID` from `customers` where `NetworkID`='\func uuid()' 
        order by `DateTimeAdded` desc limit 10000, 100) 
    as `Results` using(`UserID`)
/* finishes in 1.120 seconds */

Why is the query using the offset always faster than the query without the offset?


EXPLAINs:

I have posted the EXPLAIN output for these queries in a Google Docs spreadsheet here.

Note: The tests above were run from PHP, looping 20 times for each query

Note 2: `customers` is a view, not a base table

  • Try different offsets and see if you get the same trend; it might be that this specific offset happens to produce a very simple join. Commented Jun 4, 2015 at 14:21
  • I have; even if I do this with 30,000, it's still consistently faster than the query without the offset. Commented Jun 4, 2015 at 14:22
  • Optimize the table and see if it is the same (neutralize all unknown factors first). Commented Jun 4, 2015 at 14:23
  • @Brian Leishman: I wonder whether you always run those 2 queries in the same order for the tests. Commented Jun 4, 2015 at 14:23
  • I also thought of that @a1ex07; if I switch the order of the queries, the one with the offset still wins. Commented Jun 4, 2015 at 14:24

1 Answer


Case 1: The optimizer can use an index on the ORDER BY. LIMIT 10 will be faster than LIMIT 10000,10 because it can stop reading rows sooner.

Case 2: The optimizer cannot (or chooses not to) use an index for the ORDER BY. In this case, the entire set of rows (after the WHERE) is collected, that set is sorted, and only then are the OFFSET and LIMIT applied. Here the value of the OFFSET makes little difference; most of the time is consumed fetching the rows, filtering them, and sorting them.

INDEX(x,y)
SELECT ... WHERE x=2               ORDER BY y LIMIT ... -- case 1
SELECT ... WHERE x=2 AND deleted=0 ORDER BY y LIMIT ... -- case 2

INDEX(NetworkID, DateTimeAdded)         -- composite
SELECT ... WHERE NetworkID='...' ORDER BY DateTimeAdded DESC ... -- Case 1

INDEX(NetworkID), INDEX(DateTimeAdded)  -- separate
SELECT ... WHERE NetworkID='...' ORDER BY DateTimeAdded DESC ... -- Case 3

Case 3 might behave like Case 1 because it might use INDEX(DateTimeAdded). Or, if the optimizer chooses to use the other index, it becomes a slow Case 2. Either way, it is not as good as the composite index, which can handle both the WHERE and the ORDER BY.
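To see which case a given query actually lands in, and to move toward Case 1, something like the following sketch could be used. Since customers is a view (Note 2), the index has to be added on the underlying base table; customers_base below is only a placeholder for whatever that table is really called.

-- Sketch: composite index covering both the WHERE and the ORDER BY.
-- `customers_base` is a placeholder for the base table behind the view.
ALTER TABLE customers_base
    ADD INDEX idx_network_added (NetworkID, DateTimeAdded);

-- Check the plan afterwards: "Using filesort" in the Extra column means the
-- whole result set is sorted before OFFSET/LIMIT are applied (Case 2); if the
-- ORDER BY is satisfied by the index, no filesort appears (Case 1).
EXPLAIN
SELECT * FROM customers
WHERE NetworkID = '...'
ORDER BY DateTimeAdded DESC
LIMIT 10000, 100;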

If you can manage to get to Case 1, I recommend you also "remember where you left off" to make Pagination even more efficient. See my Pagination blog.
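A rough sketch of that "remember where you left off" approach, assuming pages are walked newest-first and the application remembers the DateTimeAdded and UserID of the last row it displayed (the UserID tie-breaker is an assumption, not something from the question):

-- First page: no offset at all.
SELECT * FROM customers
WHERE NetworkID = '...'
ORDER BY DateTimeAdded DESC, UserID DESC
LIMIT 100;

-- Next page: seek past the last row shown instead of using OFFSET.
-- :last_added and :last_userid are the values remembered from the previous page.
SELECT * FROM customers
WHERE NetworkID = '...'
  AND (DateTimeAdded < :last_added
       OR (DateTimeAdded = :last_added AND UserID < :last_userid))
ORDER BY DateTimeAdded DESC, UserID DESC
LIMIT 100;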

More on creating INDEXes.


3 Comments

Yes, I've read a ton about pagination and I've come across the "remember where you left off" strategy, but it wouldn't really help in my situation. Reason 1: it requires sequential numeric data, which I simply can't rely on because I merge data from multiple servers. Reason 2: I show a filtered subset of the data that the user can order in any way, so the rows are never in numerical order.
Have I at least explained why the OFFSET does not impact the speed? (Namely, that most of the time happened before applying the OFFSET.)
Yes, I do believe you've explained why my queries seemed odd, and thank you for that!
