
I have the following partitioned table:

    Column     |            Type             |       Modifiers        | Storage | Stats target | Description 
---------------+-----------------------------+------------------------+---------+--------------+-------------
 time          | timestamp without time zone | not null               | plain   |              | 
 connection_id | integer                     | not null               | plain   |              | 
 is_authorized | boolean                     | not null default false | plain   |              | 
 is_active     | boolean                     | not null default true  | plain   |              | 
Indexes:
    "active_connection_time_idx" btree ("time")
Child tables: metrics.active_connection_2022_02_26t00,
              metrics.active_connection_2022_02_27t00,
              metrics.active_connection_2022_02_28t00,
              metrics.active_connection_2022_03_01t00,
              metrics.active_connection_2022_03_02t00,
              metrics.active_connection_2022_04_21t00

Every partition has an index on the time column. I need to execute the following query:

SELECT c.connection_id,
       (array_agg(is_authorized ORDER BY time DESC))[1],
       bool_or(is_active)
FROM metrics.active_connection c
WHERE c.time BETWEEN '2022-01-26 00:00:00' AND '2022-04-15 23:59:59'
GROUP BY c.connection_id;

I get the following plan (fast sequential scan but slow external sort):

                                                                           QUERY PLAN                                                                             
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=1878772.55..1999873.62 rows=200 width=6) (actual time=11516.621..22951.961 rows=30631 loops=1)
   Group Key: c.connection_id
   ->  Sort  (cost=1878772.55..1909047.19 rows=12109857 width=14) (actual time=11388.096..15601.938 rows=12109856 loops=1)
         Sort Key: c.connection_id
         Sort Method: external merge  Disk: 319520kB
         ->  Append  (cost=0.00..247108.84 rows=12109857 width=14) (actual time=0.022..5346.587 rows=12109856 loops=1)
               ->  Seq Scan on active_connection c  (cost=0.00..0.00 rows=1 width=14) (actual time=0.004..0.004 rows=0 loops=1)
                     Filter: (("time" >= '2022-01-26 00:00:00'::timestamp without time zone) AND ("time" <= '2022-04-15 23:59:59'::timestamp without time zone))
               ->  Seq Scan on active_connection_2022_02_26t00 c_1  (cost=0.00..21728.74 rows=1064849 width=14) (actual time=0.017..307.754 rows=1064849 loops=1)
                     Filter: (("time" >= '2022-01-26 00:00:00'::timestamp without time zone) AND ("time" <= '2022-04-15 23:59:59'::timestamp without time zone))
              ......
               ->  Seq Scan on active_connection_2022_03_02t00 c_5  (cost=0.00..20964.04 rows=1027336 width=14) (actual time=0.018..268.314 rows=1027336 loops=1)
                     Filter: (("time" >= '2022-01-26 00:00:00'::timestamp without time zone) AND ("time" <= '2022-04-15 23:59:59'::timestamp without time zone))

If I add an index on the connection_id column, I get a different plan (slow index scan but fast in-memory sort):


                                                                                                   QUERY PLAN                                                                                                   
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=2.23..1071044.89 rows=200 width=6) (actual time=203.337..49643.802 rows=30631 loops=1)
   Group Key: c.connection_id
   ->  Merge Append  (cost=2.23..980218.46 rows=12109857 width=14) (actual time=184.137..38926.435 rows=12109856 loops=1)
         Sort Key: c.connection_id
         ->  Sort  (cost=0.01..0.02 rows=1 width=14) (actual time=0.036..0.037 rows=0 loops=1)
               Sort Key: c.connection_id
               Sort Method: quicksort  Memory: 25kB
               ->  Seq Scan on active_connection c  (cost=0.00..0.00 rows=1 width=14) (actual time=0.004..0.004 rows=0 loops=1)
                     Filter: (("time" >= '2022-01-26 00:00:00'::timestamp without time zone) AND ("time" <= '2022-04-15 23:59:59'::timestamp without time zone))
         ->  Index Scan using active_connection_2022_02_26t00_conn_id on active_connection_2022_02_26t00 c_1  (cost=0.43..56013.08 rows=1064849 width=14) (actual time=6.386..1729.893 rows=1064849 loops=1)
               Filter: (("time" >= '2022-01-26 00:00:00'::timestamp without time zone) AND ("time" <= '2022-04-15 23:59:59'::timestamp without time zone))
         ....
         ->  Index Scan using active_connection_2022_03_02t00_conn_id on active_connection_2022_03_02t00 c_5  (cost=0.42..54039.14 rows=1027336 width=14) (actual time=0.062..2142.939 rows=1027336 loops=1)
               Filter: (("time" >= '2022-01-26 00:00:00'::timestamp without time zone) AND ("time" <= '2022-04-15 23:59:59'::timestamp without time zone))
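
For reference, the per-partition index that appears in the plan above corresponds to a statement along these lines. This is only a sketch: the index and table names are taken from the plan and the child-table list, while the exact DDL (a default btree index on a single column) is an assumption.

```sql
-- Assumed form of the index on one child partition; the other
-- partitions would each need an equivalent statement.
CREATE INDEX active_connection_2022_02_26t00_conn_id
    ON metrics.active_connection_2022_02_26t00 (connection_id);
```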

Is it possible to somehow get both a fast sort and a fast sequential scan?
