
We are running a PostgreSQL 9.6.11 database on Amazon RDS. The execution time of one of our queries is 6633.645 ms, which seems very slow. What changes can I make to improve the execution time of this query?

The query selects three columns, filtering on account_id (three values) and a date range, and groups by the same three columns it selects.

SELECT
    platform,
    publisher_platform,
    adset_id
FROM "adsets"
WHERE
    (("adsets"."account_id" IN ('1595321963838425', '1320001405', 'urn:li:sponsoredAccount:507697540')) AND
    ("adsets"."date" >= '2019-05-06 00:00:00.000000+0000') AND ("adsets"."date" <= '2019-05-13 23:59:59.999999+0000'))
GROUP BY
    "adsets"."platform",
    "adsets"."publisher_platform",
    "adsets"."adset_id"
ORDER BY
    "adsets"."platform",
    "adsets"."publisher_platform",
    "adsets"."adset_id";

The query runs against a table called adsets. The table has the following columns:

account_id         | text
campaign_id        | text
adset_id           | text
name               | text
date               | timestamp without time zone
publisher_platform | text

and 15 other columns, which are a mix of integer and text fields.

We have added the following indexes:

  1. "adsets_composite_unique_key" UNIQUE CONSTRAINT, btree (platform, account_id, campaign_id, adset_id, date, publisher_platform)
  2. "adsets_account_id_date_idx" btree (account_id DESC, date DESC) CLUSTER
  3. "adsets_account_id_index" btree (account_id)
  4. "adsets_adset_id_index" btree (adset_id)
  5. "adsets_campaign_id_index" btree (campaign_id)
  6. "adsets_name_index" btree (name)
  7. "adsets_platform_platform_id_publisher_platform" btree (account_id, platform, publisher_platform, adset_id)
  8. "idx_account_date_adsets" btree (account_id, date)
  9. "platform_pub_index" btree (platform, publisher_platform, adset_id).

work_mem on the Postgres instance has been set to 125MB.
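For reference, the setting can be checked and overridden per session; a minimal sketch (the value shown is simply the one from our configuration):

SHOW work_mem;            -- display the effective value for the current session
SET work_mem = '125MB';   -- session-level override; sorts larger than this spill to disk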

EXPLAIN (ANALYZE) shows:

 Group  (cost=33447.55..33532.22 rows=8437 width=29) (actual time=6625.170..6633.062 rows=2807 loops=1)
   Group Key: platform, publisher_platform, adset_id
   ->  Sort  (cost=33447.55..33468.72 rows=8467 width=29) (actual time=6625.168..6629.271 rows=22331 loops=1)
         Sort Key: platform, publisher_platform, adset_id
         Sort Method: quicksort  Memory: 2513kB
         ->  Bitmap Heap Scan on adsets  (cost=433.63..32895.18 rows=8467 width=29) (actual time=40.003..6471.898 rows=22331 loops=1)
               Recheck Cond: ((account_id = ANY ('{1595321963838425,1320001405,urn:li:sponsoredAccount:507697540}'::text[])) AND (date >= '2019-05-06 00:00:00'::timestamp without time zone) AND (date <= '2019-05-13 23:59:59.999999'::timestamp without time zone))
               Heap Blocks: exact=52907
               ->  Bitmap Index Scan on idx_account_date_adsets  (cost=0.00..431.51 rows=8467 width=0) (actual time=27.335..27.335 rows=75102 loops=1)
                     Index Cond: ((account_id = ANY ('{1595321963838425,1320001405,urn:li:sponsoredAccount:507697540}'::text[])) AND (date >= '2019-05-06 00:00:00'::timestamp without time zone) AND (date <= '2019-05-13 23:59:59.999999'::timestamp without time zone))
 Planning time: 5.380 ms
 Execution time: 6633.645 ms
(12 rows)


2 Answers


First of all, you are using GROUP BY without actually selecting any aggregates, so you might as well just do SELECT DISTINCT in your query. That aside, here is the B-tree index you probably should be using:

CREATE INDEX idx ON adsets (account_id, date, platform, publisher_platform,
    adset_id);

The problem with your current index is that, while it does cover the columns you are selecting, it does not include the columns that appear in the WHERE clause. This means that Postgres might choose not to use the index at all and instead scan the entire table.

Note that my suggestion still does nothing to deal with the SELECT DISTINCT portion of the query, but at least it should speed up everything that comes before that part of the query.
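If the table is large and takes writes in production, the same index can be built without blocking them; a minimal sketch (the index name here is illustrative):

-- Builds the index without taking a write lock on adsets.
-- Note: CREATE INDEX CONCURRENTLY cannot run inside a transaction block.
CREATE INDEX CONCURRENTLY idx_adsets_account_date_cover
    ON adsets (account_id, date, platform, publisher_platform, adset_id);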

Here is your updated query:

SELECT DISTINCT
    platform,
    publisher_platform,
    adset_id
FROM adsets
WHERE
    account_id IN ('1595321963838425', '1320001405',
                   'urn:li:sponsoredAccount:507697540') AND
    date >= '2019-05-06' AND date < '2019-05-14';
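One deliberate change: the half-open date range (date >= '2019-05-06' AND date < '2019-05-14') is equivalent to your original <= '2019-05-13 23:59:59.999999' bound for a microsecond-precision timestamp column, and it avoids fiddly fractional-second endpoints.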

2 Comments

This index is only good if you target an index-only scan, right? I think it doesn't matter much whether you use DISTINCT or GROUP BY. You forgot the ORDER BY in your answer.
Thanks Tim. The new index has improved the execution time, and in the explain output I can see the new index being used. I am going to be selecting by aggregation. The problem I am also seeing is that execution is slower when I run the query the first time but faster the second time. I suspect the result is cached, which is why it is faster the second time. Is there anything I can do to improve the execution time when the query is run the first time?

Your problem is the many “false positives” that are found during the bitmap index scan phase and removed during the heap scan phase. Since there is no additional filter, I guess that the extra rows must be removed because they are not visible.

See if running VACUUM adsets improves the query performance.
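To check whether dead rows are the likely cause before vacuuming, a quick sketch against the standard statistics view (only the table name comes from the question):

-- Compare live vs. dead tuples and see when the table was last vacuumed.
SELECT n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'adsets';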

1 Comment

I missed that index from my list and have updated it now.
