I've been trying for a while now, pretty unsuccessfully, to figure out how to get the query planner to act a little smarter. I've messed around with work_mem and friends, run VACUUM ANALYZE plenty, and tried altering the query with an ORDER BY. I've included 3 runs of the same query with different offsets. I'm under the impression that this query is not nearly as performant as it could be. Any thoughts?
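For reference, the ORDER BY variant was roughly like this (a sketch -- the idea was to order on the indexed column so the planner could in principle walk the index in order):

SELECT * FROM npis
 WHERE provider_last_name_legal_name = 'THOMPSON'
 ORDER BY provider_last_name_legal_name
 OFFSET 250 LIMIT 10;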
Just in case it doesn't jump out at you -- the only change between the following queries is the offset.
bloomapi=# explain analyze SELECT * FROM npis WHERE provider_last_name_legal_name = 'THOMPSON' offset 250 limit 10;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=965.13..998.97 rows=10 width=2589) (actual time=568.458..577.507 rows=10 loops=1)
-> Bitmap Heap Scan on npis (cost=119.15..20382.11 rows=5988 width=2589) (actual time=58.140..577.027 rows=260 loops=1)
Recheck Cond: ((provider_last_name_legal_name)::text = 'THOMPSON'::text)
-> Bitmap Index Scan on npis_temp_provider_last_name_legal_name_idx1 (cost=0.00..117.65 rows=5988 width=0) (actual time=36.819..36.819 rows=5423 loops=1)
Index Cond: ((provider_last_name_legal_name)::text = 'THOMPSON'::text)
Total runtime: 578.301 ms
(6 rows)
bloomapi=# explain analyze SELECT * FROM npis WHERE provider_last_name_legal_name = 'THOMPSON' offset 100 limit 10;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=395.81..435.40 rows=10 width=2589) (actual time=0.397..0.440 rows=10 loops=1)
-> Index Scan using npis_temp_provider_last_name_legal_name_idx1 on npis (cost=0.00..23701.38 rows=5988 width=2589) (actual time=0.063..0.293 rows=110 loops=1)
Index Cond: ((provider_last_name_legal_name)::text = 'THOMPSON'::text)
Total runtime: 0.952 ms
(4 rows)
bloomapi=# explain analyze SELECT * FROM npis WHERE provider_last_name_legal_name = 'THOMPSON' offset 4100 limit 10;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=13993.25..14027.09 rows=10 width=2589) (actual time=9356.723..9400.021 rows=10 loops=1)
-> Bitmap Heap Scan on npis (cost=119.15..20382.11 rows=5988 width=2589) (actual time=2.968..9393.327 rows=4110 loops=1)
Recheck Cond: ((provider_last_name_legal_name)::text = 'THOMPSON'::text)
-> Bitmap Index Scan on npis_temp_provider_last_name_legal_name_idx1 (cost=0.00..117.65 rows=5988 width=0) (actual time=1.943..1.943 rows=5423 loops=1)
Index Cond: ((provider_last_name_legal_name)::text = 'THOMPSON'::text)
Total runtime: 9400.426 ms
(6 rows)
Some relevant notes:
- I cleared the shared memory on the system before running the first query, so some of the actual time of the first query is probably impacted by index loading
- the data is wide and sparse -- 329 columns, many of which are empty character varying columns of roughly 30 characters
- the data is virtually read-only -- being updated with another 15k rows once a week.
- these same queries actually performed better under the default settings shipped with the Ubuntu PPA (I don't have those query plans at the moment but could dig them up if nothing obvious jumps out otherwise). The parameters that have been changed from the defaults: shared_buffers = 256MB, effective_cache_size = 512MB, checkpoint_segments = 64, checkpoint_completion_target = 0.9, default_statistics_target = 500 (one way to double-check the effective values is sketched after these notes)
- the actual data is about 4 million rows / 1.29GB for the table by itself; provider_last_name_legal_name is btree-indexed, and the index is 95MB. About 3/4 of the rows have a non-null value in this column, and the whole table has about 488k distinct values for it (the queries behind these figures are sketched below)
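In case it helps, the settings, sizes, and column statistics above can be double-checked with standard catalog queries roughly like these (the object names are the ones from the plans above):

SELECT name, setting, unit
  FROM pg_settings
 WHERE name IN ('shared_buffers', 'effective_cache_size', 'work_mem',
                'random_page_cost', 'default_statistics_target');

SELECT pg_size_pretty(pg_relation_size('npis')) AS table_size,
       pg_size_pretty(pg_relation_size('npis_temp_provider_last_name_legal_name_idx1')) AS index_size;

SELECT null_frac, n_distinct
  FROM pg_stats
 WHERE tablename = 'npis'
   AND attname = 'provider_last_name_legal_name';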
Comments: Have you tried setting random_page_cost to a lower value (~1.5)? BTW: what is your setting for work_mem? Plus: effective_cache_size = 512MB seems to be rather low; your 1.3GB table should (almost) fit in core, at least the index. [...] provider_last_name_legal_name? Finally: 1.2GB size / 4M rows := 300 bytes/row, which seems a bit high. BTW: how do you fit 329 columns into 300 bytes?
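A quick way to test the random_page_cost suggestion for a single session, without touching postgresql.conf, would be something along these lines:

SET random_page_cost = 1.5;
EXPLAIN ANALYZE SELECT * FROM npis
 WHERE provider_last_name_legal_name = 'THOMPSON'
 OFFSET 4100 LIMIT 10;
RESET random_page_cost;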