SELECT count(e_id) AS count,
       e_id
FROM   test
WHERE  created_at BETWEEN '2021-12-01 00:00:00' AND '2021-12-08 00:00:00'
       AND std IN ( '1' )
       AND section IN ( 'Sample' )
GROUP  BY e_id
ORDER  BY count DESC
LIMIT  4 

The table has around 1 million records. The query itself executes in less than 40 ms, but the GROUP BY step is where the computation takes a hit, and the estimated query cost is high.
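For reference, a plan like the one below is produced by prefixing the query with EXPLAIN ANALYZE:

EXPLAIN ANALYZE
SELECT count(e_id) AS count,
       e_id
FROM   test
WHERE  created_at BETWEEN '2021-12-01 00:00:00' AND '2021-12-08 00:00:00'
       AND std IN ( '1' )
       AND section IN ( 'Sample' )
GROUP  BY e_id
ORDER  BY count DESC
LIMIT  4;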

Limit  (cost=26133.76..26133.77 rows=4 width=45) (actual time=52.300..52.303 rows=3 loops=1)
  ->  Sort  (cost=26133.76..26134.77 rows=403 width=45) (actual time=52.299..52.301 rows=3 loops=1)
        Sort Key: (count(e_id)) DESC
        Sort Method: quicksort  Memory: 25kB
        ->  GroupAggregate  (cost=26120.66..26127.72 rows=403 width=45) (actual time=52.287..52.289 rows=3 loops=1)
              Group Key: e_id
              ->  Sort  (cost=26120.66..26121.67 rows=404 width=37) (actual time=52.281..52.283 rows=5 loops=1)
                    Sort Key: e_id
                    Sort Method: quicksort  Memory: 25kB
                    ->  Bitmap Heap Scan on test  (cost=239.19..26103.17 rows=404 width=37) (actual time=49.339..52.261 rows=5 loops=1)
                          Recheck Cond: ((section)::text = 'test'::text)
"                          Filter: ((created_at >= '2021-12-01 00:00:00'::timestamp without time zone) AND (created_at <= '2021-12-08 00:00:00'::timestamp without time zone) AND ((std)::text = ANY ('{1,2}'::text[])))"
                          Rows Removed by Filter: 38329
                          Heap Blocks: exact=33997
                          ->  Bitmap Index Scan on index_test_on_section  (cost=0.00..239.09 rows=7270 width=0) (actual time=6.815..6.815 rows=38334 loops=1)
                                Index Cond: ((section)::text = 'test'::text)

How can I optimize the GROUP BY and count so that CPU usage does not spike?

2 Answers


The best index for this query is

CREATE INDEX ON test (section, created_at, std) INCLUDE (e_id);

Then VACUUM the table and try again.
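The INCLUDE (e_id) part makes an index-only scan possible, and the VACUUM matters because it updates the table's visibility map, which PostgreSQL consults before it can skip heap fetches. A minimal sketch of that follow-up step, assuming the index above is in place:

VACUUM (ANALYZE) test;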


1 Comment

Creating an index on the table in PostgreSQL will not solve the speed problem... PostgreSQL has long been known as one of the slower RDBMSs due to a design choice in its storage engine. Only a materialized view kept synchronous with the table state will solve this problem... Have a look at the paper I wrote about bad performance: mssqlserver.fr/…
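Note that stock PostgreSQL materialized views are not kept synchronous; they are refreshed on demand. A rough sketch of that approach (the view name and the day-level bucketing are illustrative assumptions, not from the original question, and the refresh has to be scheduled or triggered):

CREATE MATERIALIZED VIEW test_daily_counts AS
SELECT section,
       std,
       created_at::date AS day,  -- day-level buckets; an assumption, not in the original query
       e_id,
       count(*) AS cnt
FROM   test
GROUP  BY section, std, created_at::date, e_id;

-- PostgreSQL does not auto-refresh materialized views:
REFRESH MATERIALIZED VIEW test_daily_counts;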

Unless you have shown us the wrong plan, the slow step is not the GROUP BY but rather the Bitmap Heap Scan.

Your index on "section" returns 38334 rows, of which all but 5 are filtered out. We can't tell whether they are filtered out mostly by the "std" criterion or by the "created_at" one. You need a more specific multicolumn index. The one I think is most likely to be effective is on (section, std, created_at).
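A sketch of that index as a statement (no explicit name given, so PostgreSQL generates one):

CREATE INDEX ON test (section, std, created_at);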

