I'm trying to optimize a rather slow query running on AWS RDS Aurora PostgreSQL v16.1. The following query takes approx. 25 seconds:
SELECT t."Key", t."ActiveFrom", t."Content", t."Created", t."Postcode", t."HouseKeys", t."InactiveFrom", t."LastUpdated"
FROM "Tours" AS t
WHERE t."HouseKeys" && '{12345678,09876543,<<900 more unique keys>>}'
AND t."Postcode" = ANY ('{863482,248624,047628}')
AND t."ActiveFrom" <= '2024-09-30 09:20:23.503431+00' AND (t."InactiveFrom" IS NULL OR t."InactiveFrom" > '2024-09-30 09:20:23.503431+00')
This query is generated by Entity Framework Core v8 (Microsoft's ORM), which is used in the corresponding business application code. I captured this slow query by enabling the PostgreSQL feature auto_explain via the following DB config parameters:
shared_preload_libraries = auto_explain,pg_stat_statements
auto_explain.log_analyze = ON
auto_explain.log_min_duration = 15000    # in ms, i.e. 15 s
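Just as a sanity check (nothing Aurora-specific assumed here), the values that are actually in effect can be inspected from a client session:
SHOW shared_preload_libraries;
SHOW auto_explain.log_min_duration;
SHOW auto_explain.log_analyze;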
This gave me the following log message:
2024-09-30 09:20:48 UTC:10.118.38.109(55654):MyUser@MyDatabase:[3823]:LOG: duration: 25309.354 ms plan:
Query Text: SELECT t."Key", t."ActiveFrom", t."Content", t."Created", t."Postcode", t."HouseKeys", t."InactiveFrom", t."LastUpdated"
FROM "Tours" AS t
WHERE t."HouseKeys" && $1 AND t."Postcode" = ANY ($2) AND t."ActiveFrom" <= $3 AND (t."InactiveFrom" IS NULL OR t."InactiveFrom" > $3)
Query Parameters: $1 = '{12345678,09876543,<<900 more unique keys>>}', $2 = '{863482,248624,047628}', $3 = '2024-09-30 09:20:23.503431+00'
Bitmap Heap Scan on "Tours" t (cost=21.19..1111.50 rows=807 width=848) (actual time=1365.628..24999.948 rows=8 loops=1)
Recheck Cond: (("Postcode")::text = ANY ('{863482,248624,047628}'::text[]))
Filter: (("HouseKeys" && '{12345678,09876543,<<900 more unique keys>>}'::text[]) AND ("ActiveFrom" <= '2024-09-30 09:20:23.503431+00'::timestamp with time zone) AND (("InactiveFrom" IS NULL) OR ("InactiveFrom" > '2024-09-30 09:20:23.503431+00'::timestamp with time zone)))
Rows Removed by Filter: 1077
Heap Blocks: exact=373
-> Bitmap Index Scan on "IX_Tours_Postcode" (cost=0.00..20.99 rows=1085 width=0) (actual time=0.068..0.069 rows=1085 loops=1)
Index Cond: (("Postcode")::text = ANY ('{863482,248624,047628}'::text[]))
After finding this answer, I ensured that JIT is disabled:
# SHOW jit
off
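For manual reproduction outside of auto_explain, I assume the same measurement can be taken by inlining the parameter values and prefixing the query with EXPLAIN (sketch only):
EXPLAIN (ANALYZE, BUFFERS)
SELECT t."Key", t."ActiveFrom", t."Content", t."Created", t."Postcode", t."HouseKeys", t."InactiveFrom", t."LastUpdated"
FROM "Tours" AS t
WHERE t."HouseKeys" && '{12345678,09876543,<<900 more unique keys>>}'
  AND t."Postcode" = ANY ('{863482,248624,047628}')
  AND t."ActiveFrom" <= '2024-09-30 09:20:23.503431+00'
  AND (t."InactiveFrom" IS NULL OR t."InactiveFrom" > '2024-09-30 09:20:23.503431+00');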
This is the DDL of the table, which contains only 5000 rows:
CREATE TABLE IF NOT EXISTS public."Tours"
(
"Key" character varying(50) COLLATE pg_catalog."default" NOT NULL,
"Content" jsonb NOT NULL,
"LastUpdated" timestamp with time zone NOT NULL,
"Created" timestamp with time zone NOT NULL,
"HouseKeys" text[] COLLATE pg_catalog."default" NOT NULL,
"ActiveFrom" timestamp with time zone NOT NULL DEFAULT '-infinity'::timestamp with time zone,
"InactiveFrom" timestamp with time zone,
"Postcode" character varying(6) COLLATE pg_catalog."default" NOT NULL DEFAULT ''::text,
CONSTRAINT "PK_Tours" PRIMARY KEY ("Key", "ActiveFrom")
);
CREATE INDEX IF NOT EXISTS "IX_Tours_Postcode"
ON public."Tours" USING btree
("Postcode" COLLATE pg_catalog."default" ASC NULLS LAST)
TABLESPACE pg_default;
CREATE INDEX IF NOT EXISTS "IX_Tours_HouseKeys"
ON public."Tours" USING gin
("HouseKeys" COLLATE pg_catalog."default")
TABLESPACE pg_default;
Can anyone please help me with tuning this query? I wonder whether adding a combined index for all queried fields (HouseKeys,Postcode,ActiveFrom,InactiveFrom) would be a good idea. If so, which index type would be a good choice?
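For reference, what I have in mind is something like the following untested sketch (the index name is made up; it assumes the btree_gin extension can be installed, which is needed so that a scalar column like "Postcode" gets a GIN operator class):
CREATE EXTENSION IF NOT EXISTS btree_gin;

-- untested sketch, not part of the current schema
CREATE INDEX IF NOT EXISTS "IX_Tours_Postcode_HouseKeys"
    ON public."Tours" USING gin
    ("Postcode", "HouseKeys");
The idea would be that one index can serve both the "Postcode" equality and the "HouseKeys" && overlap condition, while the two timestamp conditions would still be applied as a filter.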
Update 01.10.2024 - I: After setting the DB config parameter auto_explain.log_buffers to ON, the log output looks like this:
2024-10-01 07:04:18 UTC:10.118.38.132(60402):MyUser@MyDatabase:[983]:LOG: duration: 47855.615 ms plan:
Query Text: SELECT t."Key", t."ActiveFrom", t."Postcode", t."HouseKeys", t."InactiveFrom", t."LastUpdated"
FROM "Tours" AS t
WHERE t."HouseKeys" && $1 AND t."Postcode" = ANY ($2) AND t."ActiveFrom" <= $3 AND (t."InactiveFrom" IS NULL OR t."InactiveFrom" > $3)
Query Parameters: $1 = '{12345678,09876543,<<900 more unique keys>>}', $2 = '{863482,248624,047628}', $3 = '2024-10-01 07:03:30.609959+00'
Bitmap Heap Scan on "Tours" t (cost=21.19..1111.50 rows=807 width=848) (actual time=47855.610..47855.612 rows=0 loops=1)
Recheck Cond: (("Postcode")::text = ANY ('{863482,248624,047628}'::text[]))
Filter: (("HouseKeys" && '{12345678,09876543,<<900 more unique keys>>}'::text[]) AND ("ActiveFrom" <= '2024-10-01 07:03:30.609959+00'::timestamp with time zone) AND (("InactiveFrom" IS NULL) OR ("InactiveFrom" > '2024-10-01 07:03:30.609959+00'::timestamp with time zone)))
Rows Removed by Filter: 1085
Heap Blocks: exact=373
Buffers: shared hit=1206
-> Bitmap Index Scan on "IX_Tours_Postcode" (cost=0.00..20.99 rows=1085 width=0) (actual time=0.067..0.067 rows=1085 loops=1)
Index Cond: (("Postcode")::text = ANY ('{863482,248624,047628}'::text[]))
Buffers: shared hit=6
Update 01.10.2024 - II:
Furthermore, I observed that when running the query "in isolation" (i.e. with no other load on the database), the execution time goes down to 2 s. However, in the situations where the slow queries occur, 20 of these queries run in parallel.
So I made the following small test on my local machine:
- Run PostgreSQL in Docker
- Back up the DB and restore it locally
- Run docker stats for the PostgreSQL container
- Run the query
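To also mimic the situation where 20 of these queries run in parallel, I assume something like pgbench could be used (hypothetical sketch; tour_query.sql would contain the SELECT from above with the parameter values inlined):
# 20 concurrent clients replaying the query for 60 seconds
pgbench -n -f tour_query.sql -c 20 -T 60 MyDatabase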
I noticed that the CPU usage went through the roof when using the && operator to check whether the arrays overlap. I also see a very high CPU usage in the AWS RDS Aurora PostgreSQL metrics during the execution time.
So I wonder whether I'm actually overloading the database by using the && operator and if there are any alternatives 🤔
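One alternative I could imagine (untested sketch; for non-null keys it should be equivalent to the && overlap check, but I have not verified whether it actually behaves any better) is to unnest the parameter array and probe "HouseKeys" per element:
SELECT t."Key", t."ActiveFrom", t."Content", t."Created", t."Postcode", t."HouseKeys", t."InactiveFrom", t."LastUpdated"
FROM "Tours" AS t
WHERE t."Postcode" = ANY ('{863482,248624,047628}')
  AND t."ActiveFrom" <= '2024-09-30 09:20:23.503431+00'
  AND (t."InactiveFrom" IS NULL OR t."InactiveFrom" > '2024-09-30 09:20:23.503431+00')
  AND EXISTS (
    SELECT 1
    FROM unnest('{12345678,09876543,<<900 more unique keys>>}'::text[]) AS k(key)
    WHERE k.key = ANY (t."HouseKeys")
  );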