We have queries of the form (table and column names are placeholders):

select sum(acol)
from sometable
where xpath_exists('/Root/KeyValue[Key="val"]/Value//text()', xmlcol)

What index can be built to speed up the WHERE clause?

A btree index created with

create index idx_01 on sometable using btree (xpath_exists('/Root/KeyValue[Key="val"]/Value//text()', xmlcol))

does not seem to be used at all.
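
For reference, a minimal self-contained sketch of the setup (sometable and its columns are placeholders, not the real schema):

-- Placeholder table standing in for the real one
create table sometable (
    acol   numeric,
    xmlcol xml
);

-- Expression index on the boolean result of xpath_exists()
create index idx_01 on sometable
    using btree (xpath_exists('/Root/KeyValue[Key="val"]/Value//text()', xmlcol));

-- The query the index is meant to serve; the plan shows whether the index is picked
explain analyze
select sum(acol)
from sometable
where xpath_exists('/Root/KeyValue[Key="val"]/Value//text()', xmlcol);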

EDIT

With enable_seqscan set to off, the query using xpath_exists is much faster (by an order of magnitude), and the plan clearly shows that the corresponding index (the btree index built on xpath_exists) is being used.

Any clue why PostgreSQL would not use the index and instead attempts a much slower sequential scan?

Since I do not want to disable sequential scans globally, I am back to square one and happily welcome suggestions.
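
For what it is worth, the workaround does not have to be global; a sketch of scoping it to a single transaction (sometable is the same placeholder as above, and this remains a workaround rather than a fix):

begin;
set local enable_seqscan = off;  -- reverts automatically at commit or rollback
select sum(acol)
from sometable
where xpath_exists('/Root/KeyValue[Key="val"]/Value//text()', xmlcol);
commit;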

EDIT 2 - Explain plans

See below. The cost of the first plan (seqscan off) is slightly higher, but the processing time is much faster.

b2box=# set enable_seqscan=off;
SET
b2box=# explain analyze
Select count(*) 
from B2HEAD.item
where cluster = 'B2BOX' and (  ( xpath_exists('/MessageInfo[FinalRecipient="ABigBank"]//text()', content) )  )  offset 0 limit 1;
                                                                           QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=22766.63..22766.64 rows=1 width=0) (actual time=606.042..606.042 rows=1 loops=1)
   ->  Aggregate  (cost=22766.63..22766.64 rows=1 width=0) (actual time=606.039..606.039 rows=1 loops=1)
         ->  Bitmap Heap Scan on item  (cost=1058.65..22701.38 rows=26102 width=0) (actual time=3.290..603.823 rows=4085 loops=1)
               Filter: (xpath_exists('/MessageInfo[FinalRecipient="ABigBank"]//text()'::text, content, '{}'::text[]) AND ((cluster)::text = 'B2BOX'::text))
               ->  Bitmap Index Scan on item_counter_01  (cost=0.00..1052.13 rows=56515 width=0) (actual time=2.283..2.283 rows=4085 loops=1)
                     Index Cond: (xpath_exists('/MessageInfo[FinalRecipient="ABigBank"]//text()'::text, content, '{}'::text[]) = true)
 Total runtime: 606.136 ms
(7 rows)

plan on explain.depesz.com

b2box=# set enable_seqscan=on;
SET
b2box=# explain analyze
Select count(*) 
from B2HEAD.item
where cluster = 'B2BOX' and (  ( xpath_exists('/MessageInfo[FinalRecipient="ABigBank"]//text()', content) )  )  offset 0 limit 1;
                                                                           QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=22555.71..22555.72 rows=1 width=0) (actual time=10864.163..10864.163 rows=1 loops=1)
   ->  Aggregate  (cost=22555.71..22555.72 rows=1 width=0) (actual time=10864.160..10864.160 rows=1 loops=1)
         ->  Seq Scan on item  (cost=0.00..22490.45 rows=26102 width=0) (actual time=33.574..10861.672 rows=4085 loops=1)
               Filter: (xpath_exists('/MessageInfo[FinalRecipient="ABigBank"]//text()'::text, content, '{}'::text[]) AND ((cluster)::text = 'B2BOX'::text))
               Rows Removed by Filter: 108945
 Total runtime: 10864.242 ms
(6 rows)

plan on explain.depesz.com

2 Comments
  • Please post the explain plans. Commented Apr 18, 2013 at 11:14
  • @JakubKania - Please see edit above Commented Apr 18, 2013 at 13:04

1 Answer

Planner cost parameters

Cost of first plan (seqscan off) is slightly higher but processing time much faster

This tells me that your random_page_cost and seq_page_cost are probably wrong. You're likely on storage with fast random I/O - either because most of the database is cached in RAM or because you're using SSD, SAN with cache, or other storage where random I/O is inherently fast.
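
To see what the planner is currently working with, a quick check of the two parameters mentioned above:

SELECT name, setting, source
FROM pg_settings
WHERE name IN ('seq_page_cost', 'random_page_cost');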

Try:

SET random_page_cost = 1;
SET seq_page_cost = 1.1;

to greatly reduce the difference between the cost parameters, then re-run the query. If that does the job, consider changing those parameters in postgresql.conf.
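
If the improved plan holds up and a global edit is not convenient, the settings can also be persisted per database (a sketch; the database name b2box is taken from the psql prompt above and may need adjusting):

ALTER DATABASE b2box SET random_page_cost = 1;
ALTER DATABASE b2box SET seq_page_cost = 1.1;
-- Applies to new sessions connecting to that database; editing postgresql.conf
-- and reloading applies the change cluster-wide instead.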

Your row-count estimates are reasonable, so it doesn't look like a planner mis-estimation problem or a problem with bad table statistics.

Incorrect query

Your query is also incorrect. OFFSET 0 LIMIT 1 without an ORDER BY will produce unpredictable results unless you're guaranteed to have exactly one match, in which case the OFFSET ... LIMIT ... clauses are unnecessary and can be removed entirely.
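
For the query shown in the question, count(*) already returns exactly one row, so the fix is simply to drop the trailing clauses:

SELECT count(*)
FROM B2HEAD.item
WHERE cluster = 'B2BOX'
  AND xpath_exists('/MessageInfo[FinalRecipient="ABigBank"]//text()', content);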

You're usually much better off phrasing such queries as SELECT max(...) or SELECT min(...) where possible; PostgreSQL will tend to be able to use an index to just pluck off the desired value without doing an expensive table scan or an index scan and sort.
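
For instance, a hypothetical illustration of that pattern (the events table and ts column are made up for the example, not taken from the question):

-- Instead of:  SELECT ts FROM events ORDER BY ts DESC OFFSET 0 LIMIT 1;
-- prefer:
SELECT max(ts) FROM events;
-- With a btree index on events(ts), PostgreSQL can read a single entry from
-- the end of the index instead of scanning the table and sorting.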

Tips

BTW, for future questions the PostgreSQL wiki has some good information in the performance category and a guide to asking Slow query questions.

1 Comment

Great answer! Not only do your suggested parameter adjustments work fine, but you are also right in suspecting that we are using SSDs with plenty of RAM and that... we are starting with PostgreSQL support. The queries are programmatically generated; we will improve the generator to incorporate your comments on OFFSET and LIMIT. Finally, many thanks for the useful links.
