
My database has one very large table with over 2 billion rows with 3 columns. Id(uniqueidentity), Type(int, between 0-10. 0 = most used. 10 = least used), Data(Binary data between 1-10MB)

What are some ways I can optimize this database? (primarily select queries)

*Note: I might add a few more columns to this table later (eg: location, date...)

  • What version and edition are you using? Some ideas would be enterprise edition only. Commented Dec 8, 2010 at 23:49
  • Can you provide some kinds of examples on how you query this data? By type? By ID? Commented Dec 9, 2010 at 0:11
  • Select * from DataSource where Id = ... Commented Dec 9, 2010 at 0:19
  • Type is a number between 0-10 (most - least) that represents the likelihood of selecting that row. Commented Dec 9, 2010 at 0:22
  • If you only need to show Id and Type (e.g. in a list), avoid using SELECT * ... - that will always select everything, including your 10 MB of data. Use SELECT Id, Type FROM ... instead; that alone should speed up those kinds of queries (e.g. for a list) by orders of magnitude! Commented Dec 9, 2010 at 6:43
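The projection advice in the last comment can be sketched as follows. The table and index names are assumed for illustration; the question only gives the column names:

```sql
-- Selecting only the narrow columns avoids reading the 1-10 MB Data value,
-- which SQL Server stores off-row in LOB pages for varbinary(max):
SELECT Id, Type
FROM dbo.DataSource
WHERE Id = @Id;

-- A narrow nonclustered index can satisfy this query entirely,
-- without ever touching the pages that hold the Data column:
CREATE NONCLUSTERED INDEX IX_DataSource_Id_Type
    ON dbo.DataSource (Id)
    INCLUDE (Type);
```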

2 Answers


Assuming that the id column is the clustered index key, and assuming that by uniqueidentity you mean uniqueidentifier:

  • do you need the uniqueidentifier type? Why?
  • What other alternatives have you considered?
  • Do you populate the data using sequential GUIDs or not?

GUIDs are a notoriously poor choice for clustered keys. See GUIDs as PRIMARY KEYs and/or the clustering key for a more detailed discussion:

But, a GUID that is not sequential - like one whose values are generated on the client (using .NET) or by the NEWID() function (in SQL Server) - can be a horribly bad choice, primarily because of the fragmentation it creates in the base table, but also because of its size. It is unnecessarily wide (4 times wider than an int-based identity, which can give you 2 billion (really, 4 billion) unique rows). And if you need more than that, you can always go with a bigint (8-byte int) and get 2^63-1 rows.

Also read Disk space is cheap...That's not the point! as a follow up.
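If the key must remain a GUID, a sequential one avoids the random page splits that NEWID() values cause. The sketch below assumes table and constraint names not given in the question; note that NEWSEQUENTIALID() can only be used in a DEFAULT constraint:

```sql
CREATE TABLE dbo.DataSource
(
    Id   uniqueidentifier NOT NULL
         CONSTRAINT DF_DataSource_Id DEFAULT NEWSEQUENTIALID(),
    Type int NOT NULL,
    Data varbinary(max) NOT NULL,
    CONSTRAINT PK_DataSource PRIMARY KEY CLUSTERED (Id)
);

-- Alternatively, an integer surrogate key is a quarter (or half) the width:
-- Id bigint IDENTITY(1,1) gives up to 2^63 - 1 rows.
```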

Other than this, you need to do your homework and post the required details for such a question: exact table and index definitions, and the prevalent data access patterns (by key, by range, filters, sort order, joins, etc.).

Have you done any work to identify problems so far? If not, start with Waits and Queues, a proven methodology to identify performance bottlenecks. Once you measure and find places that need improvement, we can advise how to improve.
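A common starting point for the Waits and Queues methodology is to look at the top waits on the instance. This query uses the real sys.dm_os_wait_stats DMV; the list of benign idle waits to exclude is abbreviated here for illustration:

```sql
SELECT TOP (10)
    wait_type,
    wait_time_ms,
    waiting_tasks_count
FROM sys.dm_os_wait_stats
WHERE wait_type NOT IN ('SLEEP_TASK', 'LAZYWRITER_SLEEP',
                        'BROKER_TASK_STOP', 'SQLTRACE_BUFFER_FLUSH')
ORDER BY wait_time_ms DESC;
```

Whichever wait types dominate point to the bottleneck class (I/O, locking, memory, CPU) worth investigating first.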


2 Comments

+1. Like the Tripp link: "Disk space is cheap...That's not the point!"
+1 gotta love Kim Tripp's insights! GUIDs as clustering keys should be prohibited by SQL Server itself.
  • Add index(es). Decide which column(s) make the most appropriate clustered index key.

  • Decide whether storing 10 MB of binary data in each (otherwise small) row is a good use of a database.
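One way to act on the second bullet is to keep the wide binary value out of the hot table entirely (vertical partitioning). The table names below are assumptions for illustration:

```sql
-- Blob moved to its own table, keyed by the same Id:
CREATE TABLE dbo.DataSourceBlob
(
    Id   uniqueidentifier NOT NULL
         CONSTRAINT PK_DataSourceBlob PRIMARY KEY,
    Data varbinary(max) NOT NULL
);

-- Queries that only need Id/Type never touch the blob pages;
-- fetch Data with a join only when it is actually required:
SELECT d.Id, d.Type, b.Data
FROM dbo.DataSource AS d
JOIN dbo.DataSourceBlob AS b ON b.Id = d.Id
WHERE d.Id = @Id;
```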

[Updated in response to Remus's comment]

1 Comment

There are very few scenarios where partitioning can benefit performance, and they almost always revolve around switch-in/switch-out data transfer for ETL or for retention/archiving. In general, partitioning will hurt performance. If you think about partition elimination: anything partitioning can do, an index can do better. Choosing a proper clustered index will run circles around partitioning any time from a performance POV. My 2c.
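The "an index can do better" point can be illustrated with a sketch. Assuming (hypothetically, since the question's clustered key is Id) that queries filter on Type, a clustered index leading on that column gives the same pruning as partition elimination, at finer grain:

```sql
-- Sketch only: this conflicts with clustering on Id and is shown
-- purely to contrast indexing with partitioning.
CREATE CLUSTERED INDEX CIX_DataSource_Type_Id
    ON dbo.DataSource (Type, Id);

-- This becomes a narrow range seek on the Type = 0 rows, rather than
-- a scan of one (still very large) partition:
SELECT Id
FROM dbo.DataSource
WHERE Type = 0;
```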
