Postgres query takes longer than expected, even w/ index on fields

Question

I'm working on optimizing a Postgres table that stores information from a log file.

Here is the query:

SELECT c_ip as ip
     , x_ctx as file_name
     , date_time
     , live
     , c_user_agent as user_agent 
FROM events 
WHERE x_event = 'play' 
  AND date = '2012-12-01' 
  AND username = 'testing'

There are b-tree indexes on x_event, date, and username. In this table, there are around 25 million rows. Right now the query takes about 20-25 (correction, more like 40) seconds, and returns 143,000 rows.

Is that time expected? I would have thought it would be faster because of the indexes. Perhaps because of the sheer amount of data it has to go thru?

EDIT: Here is the EXPLAIN ANALYZE:

Bitmap Heap Scan on events  (cost=251347.32..373829.74 rows=35190 width=56) (actual time=5768.409..6124.313 rows=143061 loops=1)
  Recheck Cond: ((date = '2012-12-01'::date) AND (username = 'testing'::text) AND (x_event = 'play'::text))
  ->  BitmapAnd  (cost=251347.32..251347.32 rows=35190 width=0) (actual time=5762.083..5762.083 rows=0 loops=1)
        ->  Bitmap Index Scan on index_events_fresh_date  (cost=0.00..10247.04 rows=554137 width=0) (actual time=57.568..57.568 rows=572221 loops=1)
              Index Cond: (date = '2012-12-01'::date)
        ->  Bitmap Index Scan on index_events_fresh_username  (cost=0.00..116960.55 rows=6328206 width=0) (actual time=3184.053..3184.053 rows=6245831 loops=1)
              Index Cond: (username = 'testing'::text)
        ->  Bitmap Index Scan on index_events_fresh_x_event  (cost=0.00..124112.84 rows=6328206 width=0) (actual time=2478.919..2478.919 rows=6245841 loops=1)
              Index Cond: (x_event = 'play'::text)
Total runtime: 6148.313 ms

I have several questions about that:

Am I correct that there are 554137 rows in the date index? There are less than 50 date's that should be in there.
How do I know what index it is using of the three listed?
The total runtime listed was around 6 seconds, but when I run the query w/o EXPLAIN ANALYZE, it takes around 40 seconds.

If you look at your explain analyze, it shows the query taking 6,000ms or so. My guess is that the query runs that quickly and the other 30 seconds or so are transfer time for your data to move from the pg server to your client. Try declaring this query in a cursor and I best it returns in 6 or so seconds and then retrieving rows takes most of the time. Just a guess tho. — Scott Marlowe
– Scott Marlowe, Commented Dec 22, 2012 at 5:07
If you look at the expected vs actual rows, the three subclauses all expect about 6M rows to be needed, all three are almost correct. The product (.25 ^3) 1/64 (350K ~= 25M/64) should be expected in the outer query, but the estimated value (35K) is about 1/10 of this, which implies that the statistiscs don't help in this case. The actual number of returned rows. (1.4M) is even larger, which completes the disaster. Setting statistic targets to a higher value could help the planner to choose another plan. YMMV. BTW: I am missing the estimates for rowsize and/or number of affected pages. — wildplasser
– wildplasser, Commented Dec 22, 2012 at 13:06
@ScottMarlowe it takes a lot of time for me also connecting to localhost. what can I do to increase the response? — Dejell
– Dejell, Commented Feb 22, 2017 at 12:11
Check to see if you've got reverse lookups on on the pgsql server. Those can take a bit of time and if the server can't do a reverse lookup it will then have to timeout before the connection is made. Make sure pg_hba.conf is working on JUST IPs not hostnames. — Scott Marlowe
– Scott Marlowe, Commented Mar 13, 2017 at 21:44

Eelke · Accepted Answer · 2012-12-22 09:01:20Z

1

First as Scott Marlowe says the query only takes 6s to run the rest is transfer time. It seems slower without explain analyze because the result is much larger then the ten lines of the explain analyze output and thus takes longer to transfer. If you would turn on logging of queries and you ran this query you would probably find in the log that the query without explain analyze runs even faster (explain analyze slows things down). BTW pgadmin is quite slow itself if that is what you are using.

As for the number of rows in the date index pg is right. Even if you only have 50 distinct values all rows will be in the index. Ofcourse the btree part itself will only contain the 50 distinct values but under each leaf value it will have a list of all rows for that value. There is of course the special case of an index with a where clause which would only contain the rows matching the where clause but I do not expect you are using that right?

It is using all indexes listed in the output of explain analyze. In this case it converts each index into a bitmap having bits sets for each row that matches the criteria for that index scan. These three bitmaps can then very quickly be combined to a bitmap containing the result of the combined criteria.

answered Dec 22, 2012 at 9:01

Eelke

22.3k5 gold badges57 silver badges77 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

A.H. Over a year ago

Regarding the 554137 number: The explanation of "50 distinct values" vs. "all rows within one bucket" is right. But the actual number 554137 is the number of rows PostgreSQL guesses it must fetch from the index. This is the number the query planner relies on to optimize the query plans. The next rows=572221 number the number of rows which where really fetched from the index. Since these two numbers are in the same ballpark the planner had guessed properly.

Clodoaldo Neto · Accepted Answer · 2012-12-22 11:11:46Z

1

If 5.7 seconds is not good enough you can try a multi column index:

create index index_name on events(user_name, date, x_event)

I placed user_name first as I guess it is the column with the highest cardinality.

edited Dec 22, 2012 at 11:11

answered Dec 22, 2012 at 11:03

Clodoaldo Neto

127k30 gold badges251 silver badges274 bronze badges

Collectives™ on Stack Overflow

Postgres query takes longer than expected, even w/ index on fields

2 Answers 2

1 Comment

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

1 Comment

Comments

Your Answer

Sign up or log in

Post as a guest

Related