
We are running a PostgreSQL 9.6.11 database on Amazon RDS. The execution time of one of our queries is 6633.645 ms, which seems very slow. What changes can I make to improve the execution time of this query?

The query selects three columns, filtering on account_id (three values) and a date range, and groups by the same three columns it selects.

SELECT
    platform,
    publisher_platform,
    adset_id
FROM "adsets"
WHERE
    (("adsets"."account_id" IN ('1595321963838425', '1320001405', 'urn:li:sponsoredAccount:507697540')) AND
    ("adsets"."date" >= '2019-05-06 00:00:00.000000+0000') AND ("adsets"."date" <= '2019-05-13 23:59:59.999999+0000'))
GROUP BY
    "adsets"."platform",
    "adsets"."publisher_platform",
    "adsets"."adset_id"
ORDER BY
    "adsets"."platform",
    "adsets"."publisher_platform",
    "adsets"."adset_id";

The query runs against a table called adsets. The table has the following columns:

account_id         | text
campaign_id        | text
adset_id           | text
name               | text
date               | timestamp without time zone
publisher_platform | text

and 15 other columns, which are a mix of integer and text fields.

We have added the following indexes:

  1. "adsets_composite_unique_key" UNIQUE CONSTRAINT, btree (platform, account_id, campaign_id, adset_id, date, publisher_platform)
  2. "adsets_account_id_date_idx" btree (account_id DESC, date DESC) CLUSTER
  3. "adsets_account_id_index" btree (account_id)
  4. "adsets_adset_id_index" btree (adset_id)
  5. "adsets_campaign_id_index" btree (campaign_id)
  6. "adsets_name_index" btree (name)
  7. "adsets_platform_platform_id_publisher_platform" btree (account_id, platform, publisher_platform, adset_id)
  8. "idx_account_date_adsets" btree (account_id, date)
  9. "platform_pub_index" btree (platform, publisher_platform, adset_id).

work_mem on the Postgres instance has been set to 125MB.
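For reference, the setting can be checked and overridden per session; a minimal sketch (the value shown is simply the one from our configuration):

SHOW work_mem;            -- display the effective value for the current session
SET work_mem = '125MB';   -- session-level override; sorts larger than this spill to disk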

EXPLAIN (ANALYZE) shows:

 Group  (cost=33447.55..33532.22 rows=8437 width=29) (actual time=6625.170..6633.062 rows=2807 loops=1)
   Group Key: platform, publisher_platform, adset_id
   ->  Sort  (cost=33447.55..33468.72 rows=8467 width=29) (actual time=6625.168..6629.271 rows=22331 loops=1)
         Sort Key: platform, publisher_platform, adset_id
         Sort Method: quicksort  Memory: 2513kB
         ->  Bitmap Heap Scan on adsets  (cost=433.63..32895.18 rows=8467 width=29) (actual time=40.003..6471.898 rows=22331 loops=1)
               Recheck Cond: ((account_id = ANY ('{1595321963838425,1320001405,urn:li:sponsoredAccount:507697540}'::text[])) AND (date >= '2019-05-06 00:00:00'::timestamp without time zone) AND (date <= '2019-05-13 23:59:59.999999'::timestamp without time zone))
               Heap Blocks: exact=52907
               ->  Bitmap Index Scan on idx_account_date_adsets  (cost=0.00..431.51 rows=8467 width=0) (actual time=27.335..27.335 rows=75102 loops=1)
                     Index Cond: ((account_id = ANY ('{1595321963838425,1320001405,urn:li:sponsoredAccount:507697540}'::text[])) AND (date >= '2019-05-06 00:00:00'::timestamp without time zone) AND (date <= '2019-05-13 23:59:59.999999'::timestamp without time zone))
 Planning time: 5.380 ms
 Execution time: 6633.645 ms
(12 rows)


2 Answers


First of all, you are using GROUP BY without actually selecting any aggregates, so you might as well just do SELECT DISTINCT in your query. That aside, here is the B-tree index you probably should be using:

CREATE INDEX idx ON adsets (account_id, date, platform, publisher_platform,
    adset_id);

The problem with your current index is that, while it does cover the columns you are selecting, it does not include the columns that appear in the WHERE clause. This means that Postgres might choose not to use the index at all and instead scan the entire table.

Note that my suggestion still does nothing to deal with the SELECT DISTINCT portion of the query, but at least it should speed up everything that comes before that part of the query.
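If the table is large and takes writes in production, the same index can be built without blocking them; a minimal sketch (the index name here is illustrative):

-- Builds the index without taking a write lock on adsets.
-- Note: CREATE INDEX CONCURRENTLY cannot run inside a transaction block.
CREATE INDEX CONCURRENTLY idx_adsets_account_date_cover
    ON adsets (account_id, date, platform, publisher_platform, adset_id);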

Here is your updated query:

SELECT DISTINCT
    platform,
    publisher_platform,
    adset_id
FROM adsets
WHERE
    account_id IN ('1595321963838425', '1320001405',
                   'urn:li:sponsoredAccount:507697540') AND
    date >= '2019-05-06' AND date < '2019-05-14';
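One deliberate change: the half-open date range (date >= '2019-05-06' AND date < '2019-05-14') is equivalent to your original <= '2019-05-13 23:59:59.999999' bound for a microsecond-precision timestamp column, and it avoids fiddly fractional-second endpoints.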

2 Comments

This index is only good if you target an index-only scan, right? I think it doesn't matter much whether you use DISTINCT or GROUP BY. You forgot the ORDER BY in your answer.
Thanks Tim. The new index has improved the execution time, and in the explain output I can see the new index being used. I am going to be selecting by aggregation. The problem I am also seeing is that execution is slower when I run the query the first time but faster the second time. I suspect the result is cached, which is why it is faster the second time. Is there anything I can do to improve the execution time when the query is run the first time?

Your problem is the many “false positives” that are found during the bitmap index scan phase and removed during the heap scan phase. Since there is no additional filter, I guess that the extra rows must be removed because they are not visible.

See if running VACUUM adsets improves the query performance.
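To check whether dead rows are the likely cause before vacuuming, a quick sketch against the standard statistics view (only the table name comes from the question):

-- Compare live vs. dead tuples and see when the table was last vacuumed.
SELECT n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'adsets';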

1 Comment

I missed that index from my list and have updated it now.
