PostgreSQL: Optimize retrieving distinct values from a large table

Question

I have one de-normalized table with 40+ columns (~1.5 million rows, 1 GB).

CREATE TABLE tbl1 (
  ...
  division_id integer,
  division_name varchar(10),
  ...
);

I need to speed up this query:

SELECT DISTINCT division_name, division_id
FROM tbl1
ORDER BY division_name;

The query returns only ~250 rows, but it is very slow because of the size of the table.

I have tried creating an index:

CREATE INDEX idx1 ON tbl1 (division_name, division_id);

But the current execution plan is still:

explain analyze SELECT Distinct division_name, division_id FROM tbl1 ORDER BY 1;

                           QUERY PLAN
-----------------------------------------------------------------
 Sort  (cost=143135.77..143197.64 rows=24748 width=74) (actual time=1925.697..1925.723 rows=294 loops=1)
   Sort Key: division_name
   Sort Method: quicksort  Memory: 74kB
   ->  HashAggregate  (cost=141082.30..141329.78 rows=24748 width=74) (actual time=1923.853..1923.974 rows=294 loops=1)
         Group Key: division_name, division_id
         ->  Seq Scan on tbl1  (cost=0.00..132866.20 rows=1643220 width=74) (actual time=0.069..703.008 rows=1643220 loops=1)
 Planning time: 0.311 ms
 Execution time: 1925.883 ms

Any suggestions as to why the index is not used, or how I can speed up the query some other way?

The server is PostgreSQL 9.6.

P.S. Yes, the table has 40+ columns and is de-normalized, but I know all the pros and cons of that decision.

Update1

@a_horse_with_no_name suggested using VACUUM ANALYZE instead of ANALYZE to update the table statistics. Now the query plan is:

QUERY PLAN                                                                                
------------------------
 Unique  (cost=0.55..115753.43 rows=25208 width=74) (actual time=0.165..921.426 rows=294 loops=1)
   ->  Index Only Scan using idx1 on tbl1  (cost=0.55..107538.21 rows=1643044 width=74) (actual time=0.162..593.322 rows=1643220 loops=1)
         Heap Fetches: 0

Much better!

Obvious question, but you did not mention it: you gathered stats after index creation, right?
– Vao Tsun, Commented Nov 6, 2017 at 9:20
Good suggestion, but "analyze tbl1;" does not help; I get the same query plan.
– potapuff, Commented Nov 6, 2017 at 9:25
Hm. Well, you can turn off seqscan and look at the plan that uses the index, just to check whether it is cheaper.
– Vao Tsun, Commented Nov 6, 2017 at 9:53
Hmm, that should use an index scan. Try a vacuum analyze on the table.
– user330315, Commented Nov 6, 2017 at 10:13
You may have better luck if you reverse the column order in your index to be division_id, division_name. Indexes are (by default unless explicitly defined otherwise) sorted ascending by each column, left-to-right. Integers are by nature easier to sort by, and a sorted data set is easier to get distinct values from.
– Scoots, Commented Nov 6, 2017 at 10:58
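
The suggestion in the comments to switch off sequential scans can be tried roughly as follows. This is only a diagnostic sketch, assuming the table `tbl1` and index `idx1` from the question: `enable_seqscan = off` does not forbid sequential scans, it only makes the planner treat them as very expensive, and `SET LOCAL` keeps the change confined to the enclosing transaction.

```sql
BEGIN;

-- Discourage sequential scans for this transaction only.
SET LOCAL enable_seqscan = off;

-- Compare the estimated cost and actual time with the original plan.
EXPLAIN (ANALYZE)
SELECT DISTINCT division_name, division_id
FROM tbl1
ORDER BY division_name;

-- Ends the transaction, which also discards the SET LOCAL setting.
ROLLBACK;
```

If the index plan shown this way turns out slower than the sequential scan, the planner's original choice was reasonable for the table's current state.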

Laurenz Albe · Accepted Answer · 2017-11-06 10:06:33Z

4

The index will probably only help if PostgreSQL chooses an “index only scan”, which means that it does not have to look at the table data at all.

Normally PostgreSQL has to check the table data (“heap”) to see if a row is visible for the current transaction, because visibility information is not stored in the index.

If, however, the table does not change much and has recently been VACUUMed, PostgreSQL knows that most of the pages consist only of items visible for everyone (there is a “visibility map” to keep track of that information), and then it might be cheaper to scan the index.
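
As a rough way to see how much of the table the visibility map currently covers, the `pg_class` catalog stores the number of pages marked all-visible (`relallvisible`) next to the total page count (`relpages`). Both values are estimates that are refreshed by VACUUM and ANALYZE, so treat the result as an approximation. The table name `tbl1` comes from the question; the column alias is only illustrative.

```sql
-- Share of heap pages that are marked all-visible for tbl1.
SELECT relname,
       relpages,
       relallvisible,
       round(100.0 * relallvisible / nullif(relpages, 0), 1) AS pct_all_visible
FROM pg_class
WHERE oid = 'tbl1'::regclass;
```

A percentage close to 100 suggests that an index only scan would rarely need to visit the heap.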

Try running VACUUM on the table and see if that causes an index only scan to be used.
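
A minimal sketch of that check, assuming the table `tbl1` and the query from the question. Adding `ANALYZE` to the `VACUUM` is optional; it refreshes the planner statistics in the same pass.

```sql
-- Update the visibility map (and, with ANALYZE, the planner statistics).
VACUUM (ANALYZE) tbl1;

-- Re-check the plan: look for "Index Only Scan" and a low "Heap Fetches" count.
EXPLAIN (ANALYZE, BUFFERS)
SELECT DISTINCT division_name, division_id
FROM tbl1
ORDER BY division_name;
```

If the table is modified heavily afterwards, pages lose their all-visible status again until the next (auto)vacuum, so the plan may change back.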

Other than that, there is no way to speed up such a query.


1 Comment

potapuff Over a year ago

Yes, vacuum analyze helps.
