SELECT count(e_id) AS count,
       e_id
FROM   test
WHERE  created_at BETWEEN '2021-12-01 00:00:00' AND '2021-12-08 00:00:00'
       AND std IN ( '1' )
       AND section IN ( 'Sample' )
GROUP  BY e_id
ORDER  BY count DESC
LIMIT  4 

The table has around 1 million records. The query itself executes in less than 40 ms, but the GROUP BY step is where the computation takes a hit, and the estimated query cost is high.
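For reference, a plan like the one below is produced by prefixing the query with EXPLAIN ANALYZE:

EXPLAIN ANALYZE
SELECT count(e_id) AS count,
       e_id
FROM   test
WHERE  created_at BETWEEN '2021-12-01 00:00:00' AND '2021-12-08 00:00:00'
       AND std IN ( '1' )
       AND section IN ( 'Sample' )
GROUP  BY e_id
ORDER  BY count DESC
LIMIT  4;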

Limit  (cost=26133.76..26133.77 rows=4 width=45) (actual time=52.300..52.303 rows=3 loops=1)
  ->  Sort  (cost=26133.76..26134.77 rows=403 width=45) (actual time=52.299..52.301 rows=3 loops=1)
        Sort Key: (count(e_id)) DESC
        Sort Method: quicksort  Memory: 25kB
        ->  GroupAggregate  (cost=26120.66..26127.72 rows=403 width=45) (actual time=52.287..52.289 rows=3 loops=1)
              Group Key: e_id
              ->  Sort  (cost=26120.66..26121.67 rows=404 width=37) (actual time=52.281..52.283 rows=5 loops=1)
                    Sort Key: e_id
                    Sort Method: quicksort  Memory: 25kB
                    ->  Bitmap Heap Scan on test  (cost=239.19..26103.17 rows=404 width=37) (actual time=49.339..52.261 rows=5 loops=1)
                          Recheck Cond: ((section)::text = 'test'::text)
"                          Filter: ((created_at >= '2021-12-01 00:00:00'::timestamp without time zone) AND (created_at <= '2021-12-08 00:00:00'::timestamp without time zone) AND ((std)::text = ANY ('{1,2}'::text[])))"
                          Rows Removed by Filter: 38329
                          Heap Blocks: exact=33997
                          ->  Bitmap Index Scan on index_test_on_section  (cost=0.00..239.09 rows=7270 width=0) (actual time=6.815..6.815 rows=38334 loops=1)
                                Index Cond: ((section)::text = 'test'::text)

How can I optimize the GROUP BY and count so that CPU usage does not spike?

2 Answers


The best index for this query is

CREATE INDEX ON test (section, created_at, std) INCLUDE (e_id);

Then VACUUM the table and try again.
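The INCLUDE (e_id) part makes an index-only scan possible, and the VACUUM matters because it updates the table's visibility map, which PostgreSQL consults before it can skip heap fetches. A minimal sketch of that follow-up step, assuming the index above is in place:

VACUUM (ANALYZE) test;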


1 Comment

Creating an index on the table in PostgreSQL will not solve the speed problem... PostgreSQL has long been known as one of the slower RDBMSs due to a design choice in its storage engine. Only a materialized view kept synchronous with the table state will solve this problem... Have a look at the paper I wrote about bad performance: mssqlserver.fr/…
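Note that stock PostgreSQL materialized views are not kept synchronous; they are refreshed on demand. A rough sketch of that approach (the view name and the day-level bucketing are illustrative assumptions, not from the original question, and the refresh has to be scheduled or triggered):

CREATE MATERIALIZED VIEW test_daily_counts AS
SELECT section,
       std,
       created_at::date AS day,  -- day-level buckets; an assumption, not in the original query
       e_id,
       count(*) AS cnt
FROM   test
GROUP  BY section, std, created_at::date, e_id;

-- PostgreSQL does not auto-refresh materialized views:
REFRESH MATERIALIZED VIEW test_daily_counts;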

Unless you have shown us the wrong plan, the slow step is not the GROUP BY but rather the Bitmap Heap Scan.

Your index on "section" returns 38334 rows, of which all but 5 are filtered out. We can't tell whether they are filtered out mostly by the "std" criterion or by the "created_at" one. You need a more specific multicolumn index. The one I think is most likely to be effective is on (section, std, created_at).
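A sketch of that index as a statement (no explicit name given, so PostgreSQL generates one):

CREATE INDEX ON test (section, std, created_at);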

