I'm working with a relatively large Postgres db containing several terabytes of data. It consists of data from 5 years back. As a result all the indexes are growing and therefore reducing the query performance as well as consuming lots of storage. Current index definition (original_index) is follows:
CREATE INDEX original_index ON table1
USING btree (key10 DESC)
I'm trying to restrict the index to only 1 year back from now and index definition (constrained_index) for that is as follows: (I removed the original_index before creating constrained_index)
CREATE INDEX constrained_index ON table1
USING btree (key10 DESC)
WHERE (key10 >= '2023-06-01 00:00:00+00'::timestamp with time zone)
I was using a specific query for both the original_index and constrained_index to see before and after execution time.
When I have only original_index, query used it for the query planner. But when I have only constrained_index, query planner does not use constrained_index. Instead it used a sequential scan on the table.
After creating constrained_index I tried running vacuum verbose analyze table1; to update the stats of the db.
It's highly appreciated if someone can explain why it is not selecting index2.
Postgres Version: 12.5
Table definitions:
CREATE TABLE table1 (
key1 integer,
key2 character varying(128),
key3 character varying(64),
key4 character varying(128),
key5 character varying(10),
key6 timestamp with time zone,
key7 timestamp with time zone,
key8 integer NOT NULL GENERATED BY DEFAULT AS IDENTITY
key9 boolean NOT NULL,
key10 timestamp with time zone
);
CREATE TABLE table2 (
key2 integer NOT NULL GENERATED BY DEFAULT AS IDENTITY
key3 timestamp with time zone NOT NULL,
key1 integer
);
CREATE TABLE table3 (
key1 character varying(128) NOT NULL,
key2 text
);
CREATE TABLE table4 (
key1 character varying(64) NOT NULL,
key2 text
);
Query:
SELECT table1.key1 AS a,
table1.key2 AS b,
table1.key3 AS c,
table1.key4 AS d,
table2.key1 AS e,
COUNT(DISTINCT table1.key8) FILTER (WHERE table1.key5 = 'A') AS f,
COUNT(DISTINCT table1.key8) FILTER (WHERE table1.key5 = 'B') AS g,
COUNT(DISTINCT table1.key8) FILTER (WHERE table1.key5 = 'C') AS h,
COUNT(DISTINCT table1.key8) FILTER (WHERE table1.key5 = 'D') AS i,
COUNT(DISTINCT table1.key8) FILTER (WHERE table1.key5 IS NULL) AS j,
SUM((table1.key6 - table1.key7)) AS k,
COUNT(DISTINCT table1.key8) FILTER (WHERE table1.key9) AS l,
COUNT(DISTINCT table1.key8) AS m
FROM table1
LEFT OUTER JOIN table3 ON (table1.key4 = table3.key1)
LEFT OUTER JOIN table2 ON (table1.key1 = table2.key2)
LEFT OUTER JOIN table4 ON (table1.key3 = table4.key1)
WHERE table1.key10 BETWEEN '2023-10-31 23:00:00+00:00' AND '2023-11-02 23:00:00+00:00'
GROUP BY 1,
2,
3,
4,
5,
table2.key3,
table4.key2,
table3.key2
ORDER BY table2.key3 DESC,
table1.key2 ASC,
table4.key2 DESC,
table3.key2 DESC,
table2.key1 ASC
LIMIT 20
Output of EXPLAIN ANALYZE:
Limit (cost=18341692.59..18341693.89 rows=20 width=159) (actual time=32149.681..32192.245 rows=20 loops=1)
-> GroupAggregate (cost=18341692.59..18353658.57 rows=184092 width=159) (actual time=31749.816..31792.379 rows=20 loops=1)
Group Key: table2.key3, table1.key2, table4.key2, table3.key2, table2.key1, table1.key1, table1.key3, table1.key4
-> Sort (cost=18341692.59..18342152.82 rows=184092 width=115) (actual time=31749.763..31792.253 rows=80 loops=1)
Sort Key: table2.key3 DESC, table1.key2, table4.key2 DESC, table3.key2 DESC, table2.key1, table1.key1, table1.key3, table1.key4
Sort Method: external merge Disk: 23424kB
-> Gather (cost=83595.20..18314267.69 rows=184092 width=115) (actual time=31417.800..31681.660 rows=186540 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Hash Left Join (cost=82595.20..18294858.49 rows=76705 width=115) (actual time=31392.569..31545.627 rows=62180 loops=3)
Hash Cond: ((table1.key4)::text = (table3.key1)::text)
-> Hash Left Join (cost=81113.79..18293175.70 rows=76705 width=90) (actual time=31379.926..31524.528 rows=62180 loops=3)
Hash Cond: ((table1.key3)::text = (table4.key1)::text)
-> Parallel Hash Left Join (cost=78481.53..18288053.08 rows=76705 width=72) (actual time=31361.889..31481.346 rows=62180 loops=3)
Hash Cond: (table1.key1 = table2.key2)
-> Parallel Seq Scan on table1 (cost=0.00..18200395.20 rows=76705 width=60) (actual time=2.888..30824.027 rows=62180 loops=3)
Filter: ((key10 >= '2023-10-31 23:00:00+00'::timestamp with time zone) AND (key10 "<= '2023-11-02 23:00:00+00'::timestamp with time zone))
Rows Removed by Filter: 43882417
-> Parallel Hash (cost=52397.57..52397.57 rows=1500557 width=16) (actual time=497.607..497.608 rows=1200445 loops=3)
Buckets: 131072 Batches: 64 Memory Usage: 3712kB
-> Parallel Seq Scan on table2 (cost=0.00..52397.57 rows=1500557 width=16) (actual time=225.666..349.851 rows=1200445 loops=3)
-> Hash (cost=1244.45..1244.45 rows=71745 width=25) (actual time=17.720..17.721 rows=72031 loops=3)
Buckets: 65536 Batches: 2 Memory Usage: 2568kB
-> Seq Scan on table4 (cost=0.00..1244.45 rows=71745 width=25) (actual time=0.019..6.313 rows=72031 loops=3)
-> Hash (cost=893.96..893.96 rows=46996 width=40) (actual time=12.367..12.368 rows=47030 loops=3)
Buckets: 65536 Batches: 1 Memory Usage: 3865kB
-> Seq Scan on table3 (cost=0.00..893.96 rows=46996 width=40) (actual time=0.017..4.731 rows=47030 loops=3)
Planning Time: 2.249 ms
JIT:
Functions: 90
Options: Inlining true, Optimization true, Expressions true, Deforming true
Timing: Generation 8.216 ms, Inlining 84.266 ms, Optimization 596.840 ms, Emission 394.586 ms, Total 1083.909 ms
Execution Time: 32199.853 ms
d need to seeEXPLAIN (ANALYZE, BUFFERS, SETTINGS)` output for the queries. Also, you shouldn't make a secret of your PostgreSQL version.<key1>::text = (<key2>)::textis also interesting, it looks like you do some casting to text. That could destroy the usage of indexes, since it looks like that at least <key1> is a timestamp. Not TEXT. Next time share some information, like the DDL for the tables and indexes, the query itself and a compleet query plan that matches the DDL. We can now only guess what is going on.WHERE (<key11> >= ...), which obviously does not match the condition on key10. Typo or not? And is the main table named "document" or "table1". Please get your question straight. And cut the noise."<key1>"has no benefit over justkey1etc.