1

Recent versions of Microsoft SQL Server allow creating a clustered columnstore index on a table that has computed columns, as long as they are not persisted computed columns. [1]

I would like to get the performance benefits of a clustered columnstore index, in particular to use segment elimination. But this isn't working on the computed column. To reproduce, create a table filled with some integers, and a computed column which happens to be always 0.

drop table if exists my_table
create table my_table (x int not null)
alter table my_table add is_odd as convert(bit, x % 2)
create clustered columnstore index cs on my_table

insert into my_table (x)
select value 
from generate_series(1, 10000000)
where value % 2 = 0

select distinct is_odd from my_table -- gives 0

Every row has is_odd=0, so every rowgroup of the columnstore would have 0 as both the min and max value of this column (if indeed the column is physically present in the columnstore). Segment elimination does work on the ordinary column:

set statistics io on
select 0 from my_table where x < 0

Table 'my_table'. Segment reads 0, segment skipped 5.

But when querying the computed column, it seems not to notice, and ends up scanning the whole table:

set statistics io on
select 0 from my_table where is_odd = 1

Table 'my_table'. Segment reads 5, segment skipped 0.

The query plan shows it scanning the whole table, then computing the is_odd value for each row and filtering it. It hasn't read is_odd directly from the columnstore. Is there any way I can get the columnstore to include this column and do segment elimination?

(By contrast if I create a rowstore index, clustered or nonclustered, having is_odd as a key column, then queries can seek directly and don't have to scan the whole table.)
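For reference, the rowstore comparison can be sketched as below. The index name ix_is_odd is made up for illustration, and this assumes a SQL Server version that allows a nonclustered rowstore index alongside the clustered columnstore index on the same table.

```sql
-- Hypothetical index name; a nonclustered rowstore index keyed on the
-- nonpersisted computed column. The key values are materialized in the index.
create index ix_is_odd on my_table (is_odd)

-- Same predicate as before; check the plan for an index seek on ix_is_odd.
set statistics io on
select 0 from my_table where is_odd = 1
```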

2 Answers

1

My conclusion is that computed columns aren't physically stored in the columnstore, even when it's a clustered columnstore that "includes all columns". This differs from rowstore indexes, which will physically include a key column, even when it's a nonpersisted computed column in the underlying table. So you cannot expect a query speedup from segment elimination, run-length compression, and the other columnstore goodness for computed columns. They have to be recalculated each time.
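One way to check this is to list the segments recorded for the table in the sys.column_store_segments catalog view. The following is a minimal sketch that assumes the table lives in the dbo schema; it does not map column_id back to column names, it only shows which column ids have segments and what min/max metadata each segment carries.

```sql
-- List every columnstore segment stored for my_table (assumed to be dbo.my_table).
-- If the computed column were physically stored, it would appear here with its
-- own column_id and min/max metadata.
select s.column_id,
       s.segment_id,
       s.row_count,
       s.min_data_id,
       s.max_data_id
from sys.column_store_segments as s
join sys.partitions as p
    on p.hobt_id = s.hobt_id
where p.object_id = object_id('dbo.my_table')
order by s.column_id, s.segment_id
```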

I would be interested to see some official docs from Microsoft confirming this, and whether some future version may improve matters.

2 Comments

You can see that sys.column_store_segments for your table only contains the "real" column, which should explain why it cannot eliminate segments
0

One workaround is to use an indexed view. You have to make a rowstore clustered index, but then you can create a nonclustered columnstore index.

drop view if exists my_view
go

create view my_view
with schemabinding as
select x, is_odd
from dbo.my_table
go

create unique clustered index idx on my_view (x)

create columnstore index cs on my_view (x, is_odd)
go

set statistics io on
select 0 from my_view with (noexpand) where is_odd = 1
go

Table 'my_view'. Segment reads 0, segment skipped 8.

Indeed, the is_odd column could be computed in the view definition rather than being a computed column from the table.
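A sketch of that variant, with the expression moved into the view, might look like this. The names my_view2, idx2 and cs2 are made up for illustration, and it assumes x is unique, as it is in the test data above.

```sql
drop view if exists my_view2
go

-- The is_odd expression is computed in the view itself, so the base table
-- does not need a computed column at all.
create view my_view2
with schemabinding as
select x, convert(bit, x % 2) as is_odd
from dbo.my_table
go

-- An indexed view needs a unique clustered rowstore index first.
create unique clustered index idx2 on my_view2 (x)

-- The nonclustered columnstore index then stores is_odd physically.
create columnstore index cs2 on my_view2 (x, is_odd)
go

set statistics io on
select 0 from my_view2 with (noexpand) where is_odd = 1
go
```

As with my_view, the noexpand hint is what makes the query read the view's own indexes.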

It's not ideal, since you have the overhead of maintaining the indexed view with its two indexes in addition to the underlying table (which will probably still need indexing for efficient updates and other queries). I have found that bulk copy operations can run much slower when an indexed view exists. I think this is because they have to become logged operations. (By contrast, a clustered columnstore index on a table keeps bulk copy fast.)

Even on the pricey "Enterprise" edition of MSSQL, the noexpand hint is still needed; otherwise the optimizer expands the view definition and scans the underlying table. And the indexed view doesn't magically get used if you query the table directly. So this would not be a transparent speedup for existing queries.

4 Comments

Does having this workaround beat the normal boring old hat BTree-index on the computed column?
It can be much faster, yes. For the same reasons that a columnstore index often outperforms even a perfectly tuned rowstore index. But it's awkward to query (while an index is used automatically) and slow to update (since you have the B-tree based indexed view to maintain, and then a columnstore on top of that). So "it depends".
IMHO you should generally always use NOEXPAND against indexed views anyway, even in Enterprise Edition, see sqlperformance.com/2014/01/sql-plan/… and sqlperformance.com/2015/12/sql-performance/noexpand-hints (and caveats dba.stackexchange.com/questions/131748/…).
I agree. Omitting the noexpand hint is just a test to see if the planner is able to substitute the indexed view into a query that was against the underlying tables. (Since it works by expanding out the view definition and then later trying to see if any indexed views match.) In this case I couldn't observe any query where the "Enterprise" indexed view magic happened.
