SQL Server query performance on clustered index with composite fields

Question

I have a many-to-many link table, CategoryProduct, with two columns that will hold millions of rows:

CREATE TABLE [dbo].[CategoryProduct](
    [Category_ID] [int] NOT NULL,
    [Product_ID] [int] NOT NULL,
    -- Clustered primary key: rows are kept in Category_ID, then Product_ID order
    CONSTRAINT [PK_dbo.CategoryProduct] PRIMARY KEY CLUSTERED
    (
        [Category_ID] ASC,
        [Product_ID] ASC
    ) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

Given the clustered index, I expected the rows to be physically stored in this order:

CategoryID    ProductID
1             2
1             3
2             1
2             3

However, a SELECT returns:

CategoryID    ProductID
2             1
1             2
1             3
2             3

Why is the data grouped by ProductID? Does this reflect the actual physical order of the data? How can I store the data grouped by CategoryID so that a query like the one below can be optimised into a consecutive read once a matching CategoryID is found?

SELECT Product_ID FROM dbo.CategoryProduct WHERE Category_ID = @value;

Do you just want to order by category ID, or something else? If you just want to order by category ID, why not use the ORDER BY keyword at the end of your SELECT statement?
– Kas, Commented Nov 10, 2013 at 9:06
@DoobyInc I need to group the records together in category order, which can improve the performance of the query in the question.
– mortdale, Commented Nov 10, 2013 at 12:59
Please script out the CREATE TABLE including the clustered index definition and add it to your question. Also show the execution plan you get for select ProductID from CategoryProduct where CategoryID = value
– Martin Smith, Commented Nov 10, 2013 at 13:41

ARA · Accepted Answer · 2013-11-13 09:40:50Z


When SQL Server fetches data with a table scan or a clustered index scan (if the table has a clustered index), it may follow the chain of leaf pages, depending on search arguments, lock hints and other factors, or it may follow the Index Allocation Map (IAM), whose page order in most cases differs from the key order because of the page splits that have occurred.

A clustered index is not a guarantee of speed: SQL Server works out a way to retrieve the data for each query, even a simple one (the query optimizer is a very complex system).

Nor is it a way to get data in a specific order: the only way to guarantee the order of a result set is to specify an ORDER BY clause in your query (this is part of the ANSI SQL standard).
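A minimal sketch of that point, using a temporary table `#CategoryProduct` (a hypothetical stand-in for the question's table, so the script runs on its own) and the four sample rows from the question. The ORDER BY is what guarantees the grouping by Category_ID; because it matches the clustered key, SQL Server can usually satisfy it with an ordered scan of the index rather than a separate Sort.

```sql
-- Hypothetical stand-in for dbo.CategoryProduct with the same clustered key.
CREATE TABLE #CategoryProduct (
    Category_ID int NOT NULL,
    Product_ID  int NOT NULL,
    PRIMARY KEY CLUSTERED (Category_ID, Product_ID)
);

-- The sample rows from the question, inserted in no particular order.
INSERT INTO #CategoryProduct (Category_ID, Product_ID)
VALUES (2, 1), (1, 2), (2, 3), (1, 3);

-- Only ORDER BY guarantees that rows come back grouped by Category_ID.
SELECT Category_ID, Product_ID
FROM #CategoryProduct
ORDER BY Category_ID, Product_ID;
```

With the ORDER BY in place, the rows are returned as (1, 2), (1, 3), (2, 1), (2, 3) regardless of which access path the optimizer picks.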

If you want to improve performance, you should study the query plan of your request. There are several ways to get it; the simplest is to click the "Include Actual Execution Plan" button on the SQL Server Management Studio toolbar (Ctrl+M) before running the query.
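As a script-based alternative to the toolbar button, `SET STATISTICS XML ON` returns the actual execution plan as XML alongside the results. This sketch assumes the `dbo.CategoryProduct` table from the question exists; `@CategoryID` is a hypothetical parameter. For the query in the question, a plan that does what the asker wants would typically show a Clustered Index Seek with a seek predicate on Category_ID rather than a scan.

```sql
-- Assumes dbo.CategoryProduct from the question exists.
-- Return the actual execution plan (as XML) along with the query results.
SET STATISTICS XML ON;

DECLARE @CategoryID int = 1;  -- hypothetical category to look up

SELECT Product_ID
FROM dbo.CategoryProduct
WHERE Category_ID = @CategoryID;

SET STATISTICS XML OFF;
```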

Follow-up: with a clustered index, data is physically stored in the order of the clustered key, until the index becomes fragmented. The ONLY way to get rows back in a specific order from a SELECT is to add an ORDER BY clause to it; creating indexes does not do that.
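To see how far the clustered index has drifted from key order, here is a sketch using the `sys.dm_db_index_physical_stats` DMV, again assuming the question's `dbo.CategoryProduct` (index_id 1 is the clustered index). A rebuild restores the physical order, but, as the answer says, it does not remove the need for ORDER BY. The rebuild is left commented out because on a multi-million-row table it is an expensive operation.

```sql
-- Assumes dbo.CategoryProduct from the question exists.
-- avg_fragmentation_in_percent: share of leaf pages that are out of logical order.
SELECT index_id,
       avg_fragmentation_in_percent,
       page_count
FROM sys.dm_db_index_physical_stats(
         DB_ID(),                              -- current database
         OBJECT_ID(N'dbo.CategoryProduct'),    -- the link table
         1,                                    -- clustered index
         NULL,                                 -- all partitions
         'LIMITED');                           -- fastest scan mode

-- Rebuilding re-creates the clustered index in key order (uncomment to run).
-- ALTER INDEX [PK_dbo.CategoryProduct] ON dbo.CategoryProduct REBUILD;
```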

edited Nov 13, 2013 at 9:40

answered Nov 10, 2013 at 13:57

ARA



mortdale Over a year ago

The reason I want the data stored grouped by CategoryID is to get a consecutive read once a matching CategoryID is found. Suppose CategoryID 1 has two products, ProductID 1 and 10000000: if the data is grouped by CategoryID, I only need to read those two adjacent records to get the result.

ARA Over a year ago

@mortdale, you are perfectly right to cluster your data on CategoryId, ProductId. The point is that creating a clustered index on those fields does not guarantee you get the rows in that order unless you add an ORDER BY clause to your SELECT, even if they are "stored in that order".

mortdale Over a year ago

Does that mean the data is physically stored according to the clustered index, and it's just my SELECT statement that doesn't reflect the actual order of the data?

ARA Over a year ago

Yes, data is stored in clustered index order until fragmentation breaks that physical order.
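One way to check mortdale's consecutive-read expectation is `SET STATISTICS IO ON`, which reports logical reads per table in the Messages output. This is a sketch assuming the question's `dbo.CategoryProduct`; `@CategoryID` is hypothetical. With a seek on the clustered key, the read count should depend on the index depth and the number of pages holding that category's rows, not on how far apart the Product_ID values are (1 and 10000000 in the comment above). The ORDER BY matches the clustered key, so it usually adds no Sort operator.

```sql
-- Assumes dbo.CategoryProduct from the question exists.
-- Report logical/physical reads for each statement in the Messages output.
SET STATISTICS IO ON;

DECLARE @CategoryID int = 1;  -- hypothetical category to look up

-- Seek on the leading key column, then read that category's rows in key order.
SELECT Product_ID
FROM dbo.CategoryProduct
WHERE Category_ID = @CategoryID
ORDER BY Product_ID;

SET STATISTICS IO OFF;
```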

Szymon · Accepted Answer · 2013-11-10 11:02:34Z


You should not rely on the clustered key to order your results. The data is stored on disk in clustered key order, but that does not guarantee the rows are returned in any particular order. If you need ordered data, use an ORDER BY clause.

Your query will be fine in terms of its use of the index. The order of the returned data is not the way to verify that, anyway: run the query, check the execution plan and confirm that the index is actually used.
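Besides reading the plan, the `sys.dm_db_index_usage_stats` DMV gives a rough, cumulative view of whether each index on the table is being sought or scanned. A sketch assuming the question's `dbo.CategoryProduct`: the counters only cover activity since the SQL Server instance last started, and indexes with no recorded activity show NULLs because of the LEFT JOIN. It also lists every index defined on the table, so any index besides the clustered one shows up too.

```sql
-- Assumes dbo.CategoryProduct from the question exists.
-- Seek/scan/lookup counts per index since the instance last started.
SELECT i.name       AS index_name,
       i.type_desc,
       s.user_seeks,
       s.user_scans,
       s.user_lookups
FROM sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS s
       ON s.object_id   = i.object_id
      AND s.index_id    = i.index_id
      AND s.database_id = DB_ID()   -- usage stats are server-wide; keep this database
WHERE i.object_id = OBJECT_ID(N'dbo.CategoryProduct');
```

If user_seeks on the clustered primary key rises after running the query, that suggests the lookup is being answered by a seek rather than a scan.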

answered Nov 10, 2013 at 11:02

Szymon



mortdale Over a year ago

My question is about how to improve performance by using the clustered index. However, the clustered index I created didn't give the structure I wanted.

Szymon Over a year ago

In what way didn't it?
