1

I try to understand how query optimization works. I executed the following queries:

Query #1

SELECT * 
FROM Product 
WHERE price < 50000 
ORDER BY id DESC

Query #2

SELECT * 
FROM Product 
WHERE price > 50000 
ORDER BY id DESC

So in the execution plans for each query I noticed that optimizer uses clustered index scan for the 1st query, however for the second query, first sort then nested loops are used.

What is the logic behind those execution plans?

4
  • 1
    The two queries are the same, so I would expect them to have the same execution plan. Further, "nested loops" typically refers to joins and neither query has any joins (assuming Product is a table and not a view). Commented Mar 7, 2020 at 13:08
  • With identical queries, it may be stats were updated between executions, resulting in different plans. Commented Mar 7, 2020 at 13:10
  • I corrected mistake in the query Commented Mar 7, 2020 at 13:11
  • They are not the same. They are 2 sides of a comparison and statistics may say that the first one has a lot less returnes data than the ssecond. Commented Mar 7, 2020 at 15:40

2 Answers 2

1

What is the logic behind those execution plans?

It is trying to find the fastest way to execute a query using indices AND THE STATISTICS THEREOF.

In above example. If you have 10 billion prices (assuming 1 to 10 billion) the first one asks for the first 50k, the second one asks for 10 billion MINUS 50k return values. Big differences.

Every index keeps a statistics of value distributions and SQL server uses that to estimate how expensive a particular way of operation will be. It then chooses the most efficient one.

Sign up to request clarification or add additional context in comments.

Comments

1

Assuming clustered index on id and secondary index on price, let's think about how a query of this type can be executed:

  1. Either: Retrieve the correct subset of rows first (using range scan on price index), and then sort the result on id DESC.
  2. Or: Scan the entire table in the reverse clustered index order (same as id DESC), and filter-out unsuitable rows as you do. The scan produces a correctly ordered result, and the order is not disturbed by removing some rows, so no separate sort is needed.

Which of these two the optimizer chooses depends on the relative cost of retrieving a subset of rows then sorting vs. retrieving all rows and filtering, and TomTom already made some good points about that.


Let's now consider why nested loops are needed in the case (1)...

The index on price can only be used to retrieve price (because it is explicitly indexed) and id (implicitly included in every secondary index).

So if you ask for any additional fields (as you did through *), then they must be looked-up from the clustered index. In other words, constructing the full row requires not just one, but two physical retrievals (first from secondary then clustered index), hence nested loops.

The execution then goes like this:

  1. [outer loop iteration] Seek the first row (satisfying the condition) in the price index. The id is retrieved together with price.
  2. [inner loop iteration] Fetch the row with that id from the clustered index.
  3. [outer loop iteration] Move to the next row in the index on price, and get its id.
  4. [inner loop iteration] Fetch the row with that id from the clustered index.
  5. Etc...

To avoid the nested loop, try:

SELECT id, price
FROM Product 
WHERE price > 50000 
ORDER BY id DESC

Since the index on price covers both price (explicitly) and id (implicitly), it alone can satisfy the query, and there is no need for the clustered index lookup.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.