
I'm running PostgreSQL 9.4 on CentOS 6.7. One of the tables contains many millions of records; this is the DDL:

CREATE TABLE domain.examples (
  id SERIAL,
  sentence VARCHAR,
  product_id BIGINT,
  site_id INTEGER,
  time_stamp BIGINT,
  category_id INTEGER,
  CONSTRAINT examples_pkey PRIMARY KEY(id)
) 
WITH (oids = false);

CREATE INDEX examples_categories ON domain.examples
  USING btree (category_id);

CREATE INDEX examples_site_idx ON domain.examples
  USING btree (site_id);

The application that consumes the data does so using pagination, so we're fetching batches of 1000 records. However, even when fetching by an indexed column, the fetch time is very slow:

explain analyze
select *
from domain.examples e
where e.category_id = 105154
order by id asc 
limit 1000;

Limit  (cost=0.57..331453.23 rows=1000 width=280) (actual time=2248261.276..2248296.600 rows=1000 loops=1)
  ->  Index Scan using examples_pkey on examples e  (cost=0.57..486638470.34 rows=1468199 width=280) (actual time=2248261.269..2248293.705 rows=1000 loops=1)
        Filter: (category_id = 105154)
        Rows Removed by Filter: 173306740
Planning time: 70.821 ms
Execution time: 2248328.457 ms

What's causing the slow query? And how can it be improved?

Thanks!

  • Are all those _id columns supposed to be foreign keys? They don't seem to be declared as such. How big is the stuff in sentence? It's possible your caches were cold, or the server's disk was overloaded. Try it again. Commented Feb 8, 2017 at 21:32
  • They are. Would declaring them as such improve performance? The fetch is only from that table; no join is involved. The sentence values are quite short strings, and querying again and again gives the same poor performance. Commented Feb 8, 2017 at 21:55
  • Do you have valid statistics? -->> VACUUM ANALYZE domain.examples; BTW, is e.category_id a low-cardinality column? Commented Feb 8, 2017 at 22:09
  • @Seffy Yes, all foreign keys are automatically indexed. Performance aside, you should always declare your foreign keys so the database can enforce referential integrity, so the people and things reading your schema know how everything is connected, and so you can take advantage of those relationships to make sure you're not leaving orphans or to clean up relationships with things like cascade on delete (a sketch follows these comments). Commented Feb 8, 2017 at 22:14
  • BTW: for this particular query, a composite index could be beneficial: CREATE UNIQUE INDEX examples_categories ON domain.examples USING btree (category_id, id); Commented Feb 8, 2017 at 22:56
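
For reference, here is a minimal sketch of how those foreign keys could be declared, assuming parent tables domain.products, domain.sites and domain.categories exist, each with an id primary key (these table names are assumptions; adjust them to the actual schema):

-- Hypothetical parent tables; adjust names to the actual schema.
ALTER TABLE domain.examples
  ADD CONSTRAINT examples_product_fkey  FOREIGN KEY (product_id)  REFERENCES domain.products (id),
  ADD CONSTRAINT examples_site_fkey     FOREIGN KEY (site_id)     REFERENCES domain.sites (id),
  ADD CONSTRAINT examples_category_fkey FOREIGN KEY (category_id) REFERENCES domain.categories (id);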

2 Answers


You can create an index on both fields, category_id and id:

CREATE INDEX examples_site_idx2 ON domain.examples
  USING btree (category_id, id);

I tried explain analyze with your query on 3,000,000 rows.

With old index:

                                                                  QUERY PLAN                                                                  
----------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.43..9234.56 rows=1000 width=60) (actual time=0.655..597.193 rows=322 loops=1)
   ->  Index Scan using examples_pkey on examples e  (cost=0.43..138512.43 rows=15000 width=60) (actual time=0.654..597.142 rows=322 loops=1)
         Filter: (category_id = 105154)
         Rows Removed by Filter: 2999678
 Planning time: 2.295 ms
 Execution time: 597.257 ms
(6 rows)

With new index:

                                                                   QUERY PLAN                                                                    
-------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.43..2585.13 rows=1000 width=60) (actual time=0.027..28.814 rows=322 loops=1)
   ->  Index Scan using examples_site_idx2 on examples e  (cost=0.43..38770.93 rows=15000 width=60) (actual time=0.026..28.777 rows=322 loops=1)
         Index Cond: (category_id = 105154)
 Planning time: 1.471 ms
 Execution time: 28.860 ms
(5 rows)

1 Comment

Thanks! Works like a charm :-)

This is not the plan you want: PostgreSQL is scanning the entire examples_pkey index and filtering out records with the condition category_id = 105154. You could try getting better statistics on the table with ANALYZE, or playing with the system GUCs (which I REALLY do not recommend), to get the planner to pick the right index.
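
A minimal sketch of refreshing the statistics, using the table from the question (the pg_stats query is just one way to inspect what the planner currently assumes about category_id):

-- Refresh the planner statistics for this table
ANALYZE domain.examples;

-- Inspect the statistics the planner holds for category_id
SELECT attname, n_distinct, null_frac
FROM pg_stats
WHERE schemaname = 'domain'
  AND tablename = 'examples'
  AND attname = 'category_id';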

Or, if the number of rows with category_id = 105154 isn't too high, I would recommend using a CTE first so the planner is forced to use the examples_categories index:

with favorite_category as (
    select *
    from domain.examples e
    where e.category_id = 105154)
select *
from favorite_category
order by id asc
limit 1000;

This will fetch all records with category_id = 105154 and do an in-memory sort by id, provided the size of that fetch fits within your working memory (run show work_mem; to see what that is; the default is 4MB).
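
For example, to check and adjust that budget for the current session only (the 64MB figure below is purely illustrative, not a recommendation):

-- Check the current per-sort memory budget
show work_mem;

-- Raise it for this session only (illustrative value)
set work_mem = '64MB';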

3 Comments

For Postgres, a CTE is in general not a solution for a performance problem.
@wildplasser I agree! But in this case I think it makes sense if the number of rows with category_id = 105154 is small. Also, I'm curious, could you provide an example to support your claim? I'm not saying I disagree!
In that case, a query without a CTE could benefit even more (a CTE is a barrier for the optimiser); see the sketch below. But in this case it would probably not make a difference.
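
For illustration, a plain-subquery version of the same query is sketched below; unlike a CTE on this PostgreSQL version, a subquery is flattened by the planner, so this is effectively the original query and the planner stays free to pick whichever index it estimates to be cheapest:

select *
from (
    select *
    from domain.examples e
    where e.category_id = 105154
) sub
order by sub.id asc
limit 1000;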
