
I have around 5 million rows in a Postgres table. I'd like to know how many rows match start_time >= NOW(), but despite having an index on start_time the query is extremely slow (on the order of several hours).

EXPLAIN SELECT COUNT(*) FROM core_event WHERE start_time >= NOW();
 Aggregate  (cost=449217.81..449217.82 rows=1 width=0)
   ->  Index Scan using core_event_start_time on core_event  (cost=0.00..447750.83 rows=586791 width=0)
         Index Cond: (start_time >= now())

Here's the schema information for the table:

 id          | integer                  | not null default nextval('core_event_id_seq'::regclass)
 source      | character varying(100)   | not null
 external_id | character varying(100)   | 
 title       | character varying(250)   | not null
 location    | geometry                 | not null
 start_time  | timestamp with time zone | 
 stop_time   | timestamp with time zone | 
 thumb       | character varying(300)   | 
 image       | character varying(100)   | 
 image_thumb | character varying(100)   | 
 address     | character varying(300)   | 
 description | text                     | 
 venue_name  | character varying(100)   | 
 website     | character varying(300)   | 
 city_id     | integer                  | 
 category    | character varying(100)   | 
 phone       | character varying(50)    | 
 place_id    | integer                  | 
 image_url   | character varying(300)   | 
 event_type  | character varying(200)   | 
 hidden      | boolean                  | not null
 views       | integer                  | not null
 added       | timestamp with time zone | 

I have indexes on the following fields:

city_id
external_id (unique)
location
location_id
place_id
start_time

Is there any easy way for me to speed up the query (eg. a partial index), or am I going to have to resort to partitioning the data by date?

  • wiki.postgresql.org/wiki/Slow_Counting Commented Jan 12, 2012 at 17:09
  • Does that really apply? As I read the wiki link, @StefanNch's link is relevant only when the count is not restricted to a field used in an index. Commented Jan 12, 2012 at 17:16
  • Are your statistics up to date? Commented Jan 12, 2012 at 17:16
  • @StefanNch, I'm applying a where clause, so that shouldn't be too relevant. Commented Jan 12, 2012 at 17:31
  • @AndersUP I think they are up to date. I'm running ANALYZE core_event now, and I'll see if that has any impact. Commented Jan 12, 2012 at 17:32

2 Answers


Try adding a partial index like the following:

CREATE INDEX core_event_start_time_recent_idx ON core_event (start_time)
WHERE start_time >= '2011-01-12 00:00'::timestamptz;

This will create a comparatively small index. Index creation will take some time, but queries like this one will be much faster thereafter.

SELECT count(*) FROM core_event WHERE start_time >= now();

The effectiveness of this index for queries against now() will degrade slowly over time, depending on how many new rows come in. Occasionally update (= drop and recreate) the index with a more recent timestamp, during off hours.
You can automate this with a plpgsql function called from a cron job or pgAgent.
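A sketch of such a maintenance function (the function name is illustrative and assumes the partial index defined above; dynamic SQL is needed because index predicates can't be parameterized):

```sql
-- Illustrative maintenance function: drops and recreates the partial
-- index with a fresh cutoff. Call it from cron or pgAgent at off hours.
CREATE OR REPLACE FUNCTION rebuild_recent_start_time_idx() RETURNS void AS
$$
BEGIN
    DROP INDEX IF EXISTS core_event_start_time_recent_idx;
    EXECUTE 'CREATE INDEX core_event_start_time_recent_idx
             ON core_event (start_time)
             WHERE start_time >= '
            || quote_literal(date_trunc('day', now())::text)
            || '::timestamptz';
END;
$$ LANGUAGE plpgsql;
```

Note that rebuilding the index blocks writes to the table for the duration, which is another reason to schedule it during quiet periods.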


You might try and see if running CLUSTER on the table improves things (if it doesn't conflict with other requirements in your db):

CLUSTER core_event USING core_event_start_time;

Yes, cluster on the full index, not the partial one. This will take a while and needs an exclusive lock, because it effectively rewrites the table. It also effectively vacuums the table fully. Read about it in the manual.
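To gauge whether CLUSTER is likely to pay off, you can check how closely the table's physical row order already tracks start_time (a sketch; a correlation near 1.0 or -1.0 means an index scan touches few heap pages, so clustering adds little):

```sql
-- pg_stats exposes the planner's physical-order correlation estimate
SELECT attname, correlation
FROM   pg_stats
WHERE  tablename = 'core_event'
AND    attname   = 'start_time';
```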

You may also want to increase the statistics target for core_event.start_time:

ALTER TABLE core_event ALTER COLUMN start_time SET STATISTICS 1000; -- example value

The default is just 100. Then:

ANALYZE core_event;

Of course, all the usual performance advice applies, too.


6 Comments

I added the index, but from EXPLAIN ANALYZE it doesn't look like it's using it. It's doing a Bitmap Index Scan on core_event_start_time
@BenDowling: That's odd. I have a couple of indexes just like that and I verified it works for me (on versions 8.4, 9.0 and 9.1). It should work for any timestamp after the one in the where clause of the index, provided most rows have a start_time older than that timestamp (otherwise the index won't be any faster than a sequential table scan). In any case it would be chosen over a full index on start_time over all rows. Something does not add up here. Are you showing the full picture? How many rows (approx.) are before and after now() in your table?
I'm using 8.4, and have around 4 million rows total, and 700,000 >= NOW(). I've just run ANALYZE core_event, and re-run explain and it's still not using the partial index.
@BenDowling: With that many rows > now(), I can imagine the query planner choosing a sequential scan, because most of the data pages have to be read anyway. But the full index core_event_start_time should never be chosen over the partial index core_event_start_time_recent_idx when a matching condition is given.
It's clustered around start_time, could that be causing it to use that index instead?

Do most of these columns get populated for each row? If so, the amount of disk that postgresql has to look at to test rows for liveness even after checking the index will be fairly large. Try for example creating a separate table that only has id and start_time:

create table core_event_start_time as select id, start_time from core_event;
alter table core_event_start_time add primary key(id);
alter table core_event_start_time add foreign key(id) references core_event(id);
create index on core_event_start_time(start_time);

Now see how long it takes to count IDs in core_event_start_time only. Of course, this approach will take up more buffer cache at the expense of space for your actual core_event table...
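The count against the narrow table would then use the same predicate but visit far fewer pages:

```sql
SELECT count(*) FROM core_event_start_time WHERE start_time >= now();
```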

If it helps, you can add a trigger onto core_event to keep the auxiliary table updated.
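A sketch of such a trigger (the function and trigger names are illustrative):

```sql
-- Keeps the auxiliary table in sync with core_event on every write
CREATE OR REPLACE FUNCTION sync_core_event_start_time() RETURNS trigger AS
$$
BEGIN
    IF TG_OP = 'INSERT' THEN
        INSERT INTO core_event_start_time (id, start_time)
        VALUES (NEW.id, NEW.start_time);
    ELSIF TG_OP = 'UPDATE' THEN
        UPDATE core_event_start_time
        SET    start_time = NEW.start_time
        WHERE  id = NEW.id;
    ELSIF TG_OP = 'DELETE' THEN
        DELETE FROM core_event_start_time WHERE id = OLD.id;
    END IF;
    RETURN NULL;  -- return value is ignored for AFTER triggers
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER core_event_start_time_sync
AFTER INSERT OR UPDATE OR DELETE ON core_event
FOR EACH ROW EXECUTE PROCEDURE sync_core_event_start_time();
```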

(PostgreSQL 9.2 will introduce "index-only scans", which may help with this sort of situation, but that's for the future.)

