I have one de-normalized table with 40+ columns (~ 1.5 million rows, 1 Gb).
CREATE TABLE tbl1 (
...
division_id integer,
division_name varchar(10),
...
);
I need to speed up query
SELECT DISTINCT division_name, division_id
FROM table
ORDER BY division_name;
Query return only ~250 rows, but very slow cause size of table.
I have tried to create index:
create index idx1 on tbl1 (division_name, division_id)
But current execution plan:
explain analyze SELECT Distinct division_name, division_id FROM tbl1 ORDER BY 1;
QUERY PLAN
-----------------------------------------------------------------
Sort (cost=143135.77..143197.64 rows=24748 width=74) (actual time=1925.697..1925.723 rows=294 loops=1)
Sort Key: division_name
Sort Method: quicksort Memory: 74kB
-> HashAggregate (cost=141082.30..141329.78 rows=24748 width=74) (actual time=1923.853..1923.974 rows=294 loops=1)
Group Key: division_name, division_id
-> Seq Scan on tbl1 (cost=0.00..132866.20 rows=1643220 width=74) (actual time=0.069..703.008 rows=1643220 loops=1)
Planning time: 0.311 ms
Execution time: 1925.883 ms
Any suggestion why index does not work or how I can speed up query in other way?
Server Postgresql 9.6.
p.s. Yes, table has 40+ columns and de-normalized, but I know all pros and cons for with decision.
Update1
@a_horse_with_no_name suggest to use vacuum analyze instead of analyze to update table statistic. Now query plain is:
QUERY PLAN
------------------------
Unique (cost=0.55..115753.43 rows=25208 width=74) (actual time=0.165..921.426 rows=294 loops=1)
-> Index Only Scan using idx1 on tbl1 (cost=0.55..107538.21 rows=1643044 width=74) (actual time=0.162..593.322 rows=1643220 loops=1)
Heap Fetches: 0
Much better!
vacuum analzyeon the table.division_id, division_name. Indexes are (by default unless explicitly defined otherwise) sorted ascending by each column, left-to-right. Integers are by nature easier to sort by, and a sorted data set is easier to get distinct values from.