2

I am using MS SQL server 2005

I have a table with 3 columns where I store user-message mapping like:

msg_for msg_from msg_id 
bob     bob      1 
bob     john     1 
bob     steve    1 
bob     bob      2 
bob     john     2 
bob     bob      3 
bob     john     3 
bob     steve    3

The PK is on 3 columns and msg_id is FK to messages table that stores the messages

The above is the physical storage I see according to the PK on 3 columns

Now my query MUST return the messages for a given user having latest msg at top (order by msg_id DESC)

bob john  3
bob steve 3
bob john  2
bob steve 2
bob john  1
bob steve 1

This mapping table has millions of rows. I see 95% of the cost is to SORT the result.

Is it possible to have the PK or some other way store data physically like this (avoid SORT)?

msg_for msg_from msg_id
bob     bob      3
bob     john     3
bob     steve    3
bob     bob      2
bob     john     2
bob     bob      1
bob     john     1
bob     steve    1

Thanks

2
  • Are you saying msg_for and msg_from columns are of varchar type? Commented Sep 3, 2010 at 19:52
  • msg_for and msg_from columns are varchar Commented Sep 3, 2010 at 19:56

4 Answers 4

4

Yes.

When you set up the Primary Key (or any index) you can define this

ALTER TABLE dbo.[Messages] ADD CONSTRAINT [PK_Messages] PRIMARY KEY CLUSTERED 
(
    msg_for ASC, msg_from ASC, msg_id DESC
)

SQL Server can scan in either direction so it only makes sense if you want to control the sort order combination for multiple columns.

Edit: You say in the comments that the problem query is

select top 10 msg_id 
from message_user 
where msg_for = @user_name 
and msg_from <> @user_name 
order by msg_id DESC

The issue here isn't one of Ascending, Descending.

To give an analogy. Phone books are listed in surname, forename order but if you needed to know the lexicographically last 10 forenames in the directory you would need to scan the whole book. This would be unavoidable regardless of whether or not within each section forenames were listed in ascending or descending order.

Similarly the composite index keys would need to be msg_for, msg_id, msg_from to satisfy this query optimally not msg_for, msg_from, msg_id With this latter order it will still need to scan the whole section of the index satisfying the msg_for = @user_name criteria as it cannot know if there will be a later msg_id still to come belonging to a later msg_from Additionally regardless in which direction msg_id is sorted in their individual sub sections a sequential scan of the msg_for = @user_name part of the index will still require a sort as it they are fragmented by being in subsections according to msg_from.

Sign up to request clarification or add additional context in comments.

10 Comments

Are there any cons of this alteration, if any?
If you are going to alter the clustered PK on a large table you will need a maintenance window to do this in. You'd have to consider adverse effects on other queries that rely on the existing order. Can you post the actual query with this 'sort' operator?
This is the query that causes 95% cost on sorting: select top 10 msg_id from message_user where msg_for = @user_name and msg_from <> @user_name order by msg_id DESC Other queries: All of my queries on this table MUST return rows with msg_id in DESC order
@Projapati - The issue here isn't one of Ascending, Descending. It is that the composite index keys would need to be msg_for, msg_id, msg_from to satisfy this query optimally not msg_for, msg_from, msg_id
Even so, the fact that the key columns are of varchar type has a large impact on performance. Even if the index would have the columns in the right order it's very resource intensive for the DBMS engine to keep it sorted with every table change because it needs to sort strings.
|
3

The only way to guarantee order in the result set is by using an ORDER BY.

In SQL Server, a clustered index can help... assuming the optimizer sees the index as being useful.

Comments

1

Well no wonder sorting takes forever. Varchar/string types are usually types that are very heavy when it comes to sorting, whether it's SQL or any programming language for that matter. Whenever possible use integral types for such things.

I suggest you use integral values to identify members. Have a Members table (MemberId INT, MemberName VARCHAR, etc), then a Messages table (MessageId INT, MessageBody VARCHAR, etc) and then have a join table, say Correspondence with (SenderMemberId INT, RecipientMemberId INT, MessageId INT). Sorting on integral values will be way faster this way.

I think you can easily refactor your data to fit on such a new structure.

2 Comments

That will force me to join another table to see who is the user for that given id. I want to avoid join.
Joins are not really something to run away from. Of course having too many of them can have downsides but in the majority of cases they help a lot. Storing names as keys is an awful waste of storage space, not to mention redundancy (what if a member wishes to change his name?) Do a benchmark, see how the performance of the two methods compare.
0

Depending on your DBMS, you can use a clustered index to achieve that.

3 Comments

As I said I have a PK on 3 columns. So I have the clusted index already
But the clusted index stores rows in msg_id in 1,2,3..not 3,2,1
When creating the index, you can in most DBMS add a DESC after a column to note descending instead of ascending order, and you can state the column order as well.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.