I run the following two queries quite frequently on a table that essentially gathers logging information. Both select distinct values from a huge number of rows (about 19 million), but with fewer than 10 different values among them.
I've run EXPLAIN on both "distinct" queries issued by the page:
marchena=> explain select distinct auditrecor0_.bundle_id as col_0_0_ from audit_records auditrecor0_;
QUERY PLAN
----------------------------------------------------------------------------------------------
HashAggregate (cost=1070734.05..1070734.11 rows=6 width=21)
-> Seq Scan on audit_records auditrecor0_ (cost=0.00..1023050.24 rows=19073524 width=21)
(2 rows)
marchena=> explain select distinct auditrecor0_.server_name as col_0_0_ from audit_records auditrecor0_;
QUERY PLAN
----------------------------------------------------------------------------------------------
HashAggregate (cost=1070735.34..1070735.39 rows=5 width=13)
-> Seq Scan on audit_records auditrecor0_ (cost=0.00..1023051.47 rows=19073547 width=13)
(2 rows)
Both do sequential scans of the table. However, if I turn off enable_seqscan (despite the name, this does not forbid sequential scans outright; it only makes the planner avoid them when another plan, such as an index scan, is available), the query uses the index but is even slower:
marchena=> set enable_seqscan = off;
SET
marchena=> explain select distinct auditrecor0_.bundle_id as col_0_0_ from audit_records auditrecor0_;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Unique (cost=0.00..19613740.62 rows=6 width=21)
-> Index Scan using audit_bundle_idx on audit_records auditrecor0_ (cost=0.00..19566056.69 rows=19073570 width=21)
(2 rows)
marchena=> explain select distinct auditrecor0_.server_name as col_0_0_ from audit_records auditrecor0_;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Unique (cost=0.00..45851449.96 rows=5 width=13)
-> Index Scan using audit_server_idx on audit_records auditrecor0_ (cost=0.00..45803766.04 rows=19073570 width=13)
(2 rows)
Both the bundle_id and server_name columns have btree indices on them. Should I be using a different type of index to make selecting distinct values fast?
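For context, one workaround sometimes suggested for this kind of low-cardinality DISTINCT is to emulate a "loose index scan" with a recursive CTE, so that each distinct value is found with a single probe of the existing btree index instead of a walk over all rows. The following is only a sketch under assumptions: it needs PostgreSQL 8.4 or later (for WITH RECURSIVE), it relies on the existing audit_bundle_idx index, the CTE name distinct_bundles is made up for illustration, and it has not been timed against this table. It also skips NULL values of bundle_id.

```sql
-- Sketch: emulate a loose index scan over audit_records.bundle_id.
-- distinct_bundles is a hypothetical CTE name, not part of the schema.
WITH RECURSIVE distinct_bundles AS (
    -- Seed: the smallest bundle_id, found with one index probe.
    (SELECT bundle_id
       FROM audit_records
      ORDER BY bundle_id
      LIMIT 1)
    UNION ALL
    -- Step: the next bundle_id strictly greater than the previous one.
    -- The subquery yields NULL once no larger value exists, which ends
    -- the recursion through the WHERE clause below.
    SELECT (SELECT bundle_id
              FROM audit_records
             WHERE bundle_id > d.bundle_id
             ORDER BY bundle_id
             LIMIT 1)
      FROM distinct_bundles d
     WHERE d.bundle_id IS NOT NULL
)
SELECT bundle_id AS col_0_0_
  FROM distinct_bundles
 WHERE bundle_id IS NOT NULL;
```

The same shape would apply to server_name with audit_server_idx. With only 5 or 6 distinct values, this would amount to a handful of index lookups, assuming the planner actually chooses the index for each ORDER BY ... LIMIT 1 step.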