Using SQL Server 2008 R2

I'd like to create a table with the following columns:

[id] INT IDENTITY(1,1) NOT NULL,
[user_id] INT NOT NULL,
[date] DATE NOT NULL,
[timestamp] DATETIME NOT NULL,
[xml_data] XML NOT NULL

with the primary key on the identity column and a non-clustered index on user_id and date that covers xml_data and timestamp.

However, I notice that I can't add xml_data to the INCLUDE clause of the index. Sad face, since that's going to result in a key lookup (an RID lookup only if the table were a heap) when a user searches on user_id and date.

What's the best way to store XML data that will be queried?

I figure my choices are

  1. Stick with xml and have well-formatted data but take the query hit
  2. Use a VARCHAR(MAX) with unknown pros/cons
  3. Use a VARBINARY(MAX) with unknown pros/cons

Note: I doubt I'll be able to restrict the length of the string to even something like 8000.

2 Answers

If you have XML - store it as XML, for two main reasons:

  • it's optimized for XML storage - it's not stored as just plain text, it's actually tokenized and stored more efficiently than plain text

  • you can actually query the XML when it's stored as type XML

But: you cannot just index an XML column like that. The key of any index in SQL Server can be a maximum of 900 bytes long, while an XML column can be up to 2 GB in size, so an XML column can never be an index key column.

If you want to index your XML column, have a look at XML Indexes in SQL Server 2005 - it's a separate type of index designed to handle queries into XML very efficiently.
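As a rough sketch of what that could look like on the table from the question: a primary XML index needs a clustered primary key on the base table, and secondary XML indexes build on the primary one. The table name `dbo.UserData` and all constraint and index names below are assumptions made up for illustration.

```sql
-- Hypothetical table name; columns are taken from the question.
CREATE TABLE dbo.UserData (
    [id]        INT IDENTITY(1,1) NOT NULL,
    [user_id]   INT NOT NULL,
    [date]      DATE NOT NULL,
    [timestamp] DATETIME NOT NULL,
    [xml_data]  XML NOT NULL,
    -- A clustered primary key is required before an XML index can be created.
    CONSTRAINT PK_UserData PRIMARY KEY CLUSTERED ([id])
);
GO

-- Primary XML index: a shredded, persisted representation of the XML column.
CREATE PRIMARY XML INDEX PXML_UserData_xml_data
    ON dbo.UserData (xml_data);
GO

-- Optional secondary XML index, here tuned for path-based predicates.
CREATE XML INDEX SXML_UserData_xml_data_path
    ON dbo.UserData (xml_data)
    USING XML INDEX PXML_UserData_xml_data
    FOR PATH;
GO
```

XML indexes can take up considerably more space than the XML itself, so it is worth measuring whether your queries actually benefit before keeping them.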

Another way to speed up your XML queries could be to "surface" certain pieces of your XML that you query on often onto the parent table, by means of a stored function that extracts that piece of information from the XML and exposes it as a computed persisted column on the parent table. Once it's stored there, you can query it just like any other column, and you can index it, too! It only works for single pieces of information, however (e.g. the OrderNumber from your order, since you only ever have one of those); it can't be applied to collections of data.
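A minimal sketch of that approach, assuming a hypothetical table `dbo.UserData` with the columns from the question and XML shaped like `<Order><OrderNumber>123</OrderNumber>...</Order>`. The function name, column name, index name and the XPath are all invented for the example; adjust them to your actual document structure.

```sql
-- Scalar function that pulls one value out of the XML.
-- SCHEMABINDING is used so the function can be treated as deterministic,
-- which a PERSISTED computed column requires.
CREATE FUNCTION dbo.fn_GetOrderNumber (@xml XML)
RETURNS INT
WITH SCHEMABINDING
AS
BEGIN
    RETURN @xml.value('(/Order/OrderNumber)[1]', 'INT');
END;
GO

-- Surface the value as a persisted computed column on the parent table.
ALTER TABLE dbo.UserData
    ADD order_number AS dbo.fn_GetOrderNumber(xml_data) PERSISTED;
GO

-- The surfaced column can be indexed like any ordinary column.
CREATE NONCLUSTERED INDEX IX_UserData_order_number
    ON dbo.UserData (order_number);
GO
```

The value is computed when a row is inserted or its XML is updated, so reads that filter on `order_number` never have to open the XML.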

2 Comments

Very detailed answer, thank you much. So, basically, I'll have to eat the performance hit on the SELECT?
@Norla: just don't SELECT the XML column if you don't need it. Also: since it's stored in an optimized fashion - it'll actually be faster than storing as VARCHAR(MAX)
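To illustrate the point about not selecting the XML column, here is a sketch assuming a hypothetical table `dbo.UserData` with the columns from the question and a clustered primary key on `id`. The index name and the sample parameter values are made up.

```sql
-- Nonclustered index on the search columns, covering [timestamp].
-- The clustered key (id) is carried in every nonclustered index automatically.
CREATE NONCLUSTERED INDEX IX_UserData_user_date
    ON dbo.UserData ([user_id], [date])
    INCLUDE ([timestamp]);
GO

DECLARE @user_id INT = 42, @date DATE = '2011-06-01';

-- Answered entirely from the index: no lookup into the base table.
SELECT [id], [timestamp]
FROM dbo.UserData
WHERE [user_id] = @user_id
  AND [date] = @date;

-- Fetch the XML only for the specific row that is actually needed.
DECLARE @id INT = 1001;

SELECT [xml_data]
FROM dbo.UserData
WHERE [id] = @id;
```

Only queries that really need `xml_data` pay for the lookup, and then only for the rows they return.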

You can use XQuery to query XML columns. See here.
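For example, a sketch using the `value()`, `exist()` and `nodes()` methods of the xml data type, assuming a hypothetical table `dbo.UserData` with the columns from the question and XML shaped like `<Order><OrderNumber>123</OrderNumber><Item sku="ABC"/></Order>` (the element and attribute names are invented for illustration):

```sql
DECLARE @user_id INT = 42, @date DATE = '2011-06-01';

-- value() extracts a single scalar; exist() tests whether a path matches.
SELECT [id],
       [timestamp],
       xml_data.value('(/Order/OrderNumber)[1]', 'INT') AS order_number
FROM dbo.UserData
WHERE [user_id] = @user_id
  AND [date] = @date
  AND xml_data.exist('/Order/Item[@sku="ABC"]') = 1;

-- nodes() shreds repeating elements into rows, one row per <Item>.
SELECT d.[id],
       i.item.value('@sku', 'VARCHAR(50)') AS sku
FROM dbo.UserData AS d
CROSS APPLY d.xml_data.nodes('/Order/Item') AS i(item)
WHERE d.[user_id] = @user_id
  AND d.[date] = @date;
```

Filtering on the relational columns first keeps the number of XML documents that have to be parsed small.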
