
I have a many-to-many link table, CategoryProduct, with two columns; it will hold millions of records:

CREATE TABLE [dbo].[CategoryProduct](
[Category_ID] [int] NOT NULL,
[Product_ID] [int] NOT NULL,
CONSTRAINT [PK_dbo.CategoryProduct] PRIMARY KEY CLUSTERED 
(
    [Category_ID] ASC,
    [Product_ID] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

Based on the clustered index, I expected the physical records to be stored in this order:

CategoryID    ProductID
1             2
1             3
2             1
2             3

However, a SELECT returns the rows in this order:

CategoryID    ProductID
2             1
1             2
1             3
2             3

Why is the data grouped by ProductID? Does this reflect the actual order of the data? How can I store the data grouped by CategoryID, so that a query like the one below can be optimised with a consecutive read once a matching CategoryID is hit?

select Product_ID from CategoryProduct where Category_ID = @CategoryID
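A minimal script for experimenting with this, assuming the table above has been created in a scratch database and is otherwise empty (the four rows are the sample data from the question). Whether the plain SELECT comes back in clustered-key order depends on the plan the optimizer picks for the real table, including any other indexes on it, so the result is not guaranteed either way.

```sql
-- Sample rows from the question (multi-row VALUES needs SQL Server 2008 or later)
INSERT INTO dbo.CategoryProduct (Category_ID, Product_ID)
VALUES (1, 2), (1, 3), (2, 1), (2, 3);

-- No ORDER BY: the rows may come back in any order, depending on
-- which index and access method the optimizer chooses
SELECT Category_ID, Product_ID
FROM dbo.CategoryProduct;
```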
  • Why don't you sort by category ID? Commented Nov 10, 2013 at 8:39
  • Do you just want to order by category ID, or something else? If you just want to order by category ID, why not use the ORDER BY keyword at the end of your SELECT statement? Commented Nov 10, 2013 at 9:06
  • @DoobyInc I need to group the records together in category order, which can improve the performance of the query in the question. Commented Nov 10, 2013 at 12:59
  • Please script out the CREATE TABLE, including the clustered index definition, and add it to your question. Also show the execution plan you get for select ProductID from CategoryProduct where CategoryID = value Commented Nov 10, 2013 at 13:41

2 Answers


When SQL Server fetches data with a table scan or a clustered index scan (if your table is clustered), it may follow the chain of leaf pages, depending on search arguments, lock hints and other factors, or it may read pages in the order recorded in the Index Allocation Map (IAM), which in most cases is not the key order because of the page splits that have occurred.

Using a clustered index is not a guarantee of speed: SQL Server works out a way to retrieve the data for each request separately, even for simple requests (the SQL query optimizer is a very complex system).

Nor is it a way to get data in a specific order: the only way to get data in a specific order is to specify an ORDER BY clause in your query (this is part of the ANSI SQL standard).
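For example, to get the rows grouped by category (using the table definition from the question), the order has to be requested explicitly:

```sql
-- The ORDER BY, not the clustered index, is what guarantees this order
SELECT Category_ID, Product_ID
FROM dbo.CategoryProduct
ORDER BY Category_ID, Product_ID;
```

Because this ORDER BY matches the clustered key, the optimizer can usually satisfy it with an ordered clustered index scan and no extra Sort operator, so asking for the order typically costs little.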

If you want to improve performance, study the query plan of your request. There are several ways to get it; the simplest is to click the "Include Actual Execution Plan" button in the SQL Server Management Studio toolbar before executing your request.
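The same information can also be captured from a script. A sketch using SET STATISTICS XML, which returns the actual execution plan as an extra XML result set after the query's own results (the category value 1 is only a placeholder):

```sql
SET STATISTICS XML ON;

-- The query from the question; 1 is a placeholder category
SELECT Product_ID
FROM dbo.CategoryProduct
WHERE Category_ID = 1;

SET STATISTICS XML OFF;
```

In the returned plan, a Clustered Index Seek on PK_dbo.CategoryProduct means only the rows for that category were read; a scan operator means the whole index was read.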

Follow-up: with a clustered index, data is physically stored in the order of the clustered key until the index becomes fragmented. The ONLY way to get data in a specific order from a SELECT is to add an ORDER BY clause to the SELECT, not to create indexes.
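To see how far page splits have moved the clustered index away from its key order, sys.dm_db_index_physical_stats reports fragmentation per index. A sketch, assuming the table lives in the current database and that a rebuild (which takes locks and log space) is acceptable:

```sql
-- index_id = 1 is the clustered index; 'LIMITED' is the cheapest scan mode
SELECT index_id,
       index_type_desc,
       avg_fragmentation_in_percent,
       page_count
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID(N'dbo.CategoryProduct'), 1, NULL, 'LIMITED');

-- Rebuilding rewrites the leaf pages in clustered-key order,
-- as contiguously as the available space allows
ALTER INDEX [PK_dbo.CategoryProduct] ON dbo.CategoryProduct REBUILD;
```

Even right after a rebuild, a SELECT without ORDER BY is still not guaranteed to return rows in this order.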


4 Comments

The reason I want to store the data grouped by CategoryID is to get a consecutive read once a matching CategoryID is hit. Assuming CategoryID 1 has two products, with ProductID 1 and 10000000, if the data is stored grouped by CategoryID, I only need to read the first two records to get the result.
@mortdale, you are perfectly right to cluster your data on CategoryId, ProductId. The point is that creating a clustered index on those fields does not guarantee that you get the rows in that order unless you add an ORDER BY clause to your SELECT, even if they are "stored in that order".
Does that mean the data is physically stored according to the clustered index, and it's just my SELECT statement that doesn't reflect the actual order of the data?
Yes, data is stored in clustered index order until fragmentation breaks this physical order.
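A sketch for checking the consecutive-read idea from these comments, assuming category 1 exists: SET STATISTICS IO ON makes SQL Server print the logical and physical page reads for each table to the Messages tab.

```sql
SET STATISTICS IO ON;

-- With the clustered key (Category_ID, Product_ID), all rows for one
-- category sit next to each other in the leaf level, so a seek can
-- read them as one contiguous range
SELECT Product_ID
FROM dbo.CategoryProduct
WHERE Category_ID = 1
ORDER BY Product_ID;

SET STATISTICS IO OFF;
```

If the plan is a Clustered Index Seek, the logical reads should stay small (roughly one page per index level plus the leaf pages holding that category's rows), however many rows the other categories have. Within one Category_ID the rows are stored in Product_ID order, so the ORDER BY can usually be satisfied without a Sort operator.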

You should not rely on the clustered key for the ordering of the data. The data is stored on disk in the order of the clustered key, but that does not guarantee it will be returned in any particular order. If you need your data ordered, you need to use an ORDER BY clause.

Your query will be fine in terms of its usage of the index. The order of the returned data is not the way to verify that, anyway: execute your query, check the execution plan and verify that the index is indeed used.
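To tell which index a plan is reading, it helps to know which indexes exist on the table, because the optimizer may answer a SELECT from any index that covers it. A sketch using the catalog views; nothing here is specific to this table except its name:

```sql
-- One row per key or included column of each index on dbo.CategoryProduct;
-- key_ordinal is 0 for included columns
SELECT i.name     AS index_name,
       i.type_desc,
       ic.key_ordinal,
       c.name     AS column_name
FROM sys.indexes AS i
JOIN sys.index_columns AS ic
  ON ic.object_id = i.object_id AND ic.index_id = i.index_id
JOIN sys.columns AS c
  ON c.object_id = ic.object_id AND c.column_id = ic.column_id
WHERE i.object_id = OBJECT_ID(N'dbo.CategoryProduct')
ORDER BY i.index_id, ic.key_ordinal;
```

If the execution plan for the query names an index other than PK_dbo.CategoryProduct, that other index is the one being read.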

2 Comments

My question is about how to improve performance by using the clustered index. However, the clustered index I created didn't give the structure I wanted.
In what way didn't it?
