I have 3 tables:
create table cart (
    id       bigserial primary key,
    buyer_id bigint unique not null
);
create table contact_person (
    id           bigserial primary key,
    cart_id      bigint references cart (id) not null unique,
    phone_number jsonb,
    first_name   varchar,
    middle_name  varchar,
    last_name    varchar
);
create table cart_items (
    id      bigserial primary key,
    item_id bigint not null,
    cart_id bigint references cart (id) not null,
    count   int not null,
    unique (item_id, cart_id)
);
cart and contact_person are related 1:1, and cart and cart_items are related 1:N.
I want to aggregate all cart_items fields by cart id. There are two options:
1) Aggregate before join:
select c.id as id,
       c.buyer_id as buyer_id,
       cp.id as contact_id,
       cp.phone_number,
       cp.first_name,
       cp.middle_name,
       cp.last_name,
       ci.ids, ci.item_ids, ci.counts
from cart c
         inner join contact_person cp on c.id = cp.cart_id
         left join (select cart_id, array_agg(id) as ids, array_agg(item_id) as item_ids, array_agg(count) as counts
                    from cart_items ci
                    group by cart_id) ci on ci.cart_id = c.id
where c.buyer_id = :buyerId;
2) Aggregate after join:
select c.id as id,
       c.buyer_id as buyer_id,
       cp.id as contact_id,
       cp.phone_number,
       cp.first_name,
       cp.middle_name,
       cp.last_name,
       array_agg(ci.id) as ids,
       array_agg(ci.item_id) as item_ids,
       array_agg(ci.count) as counts
from cart c
         inner join contact_person cp on c.id = cp.cart_id
         left join cart_items ci on ci.cart_id = c.id
where c.buyer_id = :buyerId
group by c.id, cp.id;
EXPLAIN shows that the query that aggregates after the join has a much lower estimated cost (41.66 versus 141.16). The two plans are very different, but I cannot explain why aggregating before the join is so much more expensive.
1) aggregate before:
Nested Loop (cost=108.97..141.16 rows=1 width=248)
  -> Merge Left Join (cost=108.82..132.96 rows=1 width=112)
        Merge Cond: (c.id = ci.cart_id)
        -> Sort (cost=8.18..8.19 rows=1 width=16)
              Sort Key: c.id
              -> Index Scan using cart_buyer_id_key on cart c (cost=0.15..8.17 rows=1 width=16)
                    Index Cond: (buyer_id = 1)
        -> GroupAggregate (cost=100.64..122.26 rows=200 width=104)
              Group Key: ci.cart_id
              -> Sort (cost=100.64..104.26 rows=1450 width=28)
                    Sort Key: ci.cart_id
                    -> Seq Scan on cart_items ci (cost=0.00..24.50 rows=1450 width=28)
  -> Index Scan using contact_person_cart_id_key on contact_person cp (cost=0.15..8.17 rows=1 width=144)
        Index Cond: (cart_id = c.id)
2) aggregate after:
GroupAggregate (cost=41.62..41.66 rows=1 width=248)
  Group Key: c.id, cp.id
  -> Sort (cost=41.62..41.63 rows=1 width=172)
        Sort Key: c.id, cp.id
        -> Nested Loop Left Join (cost=15.33..41.61 rows=1 width=172)
              -> Nested Loop (cost=0.30..16.37 rows=1 width=152)
                    -> Index Scan using cart_buyer_id_key on cart c (cost=0.15..8.17 rows=1 width=16)
                          Index Cond: (buyer_id = 1)
                    -> Index Scan using contact_person_cart_id_key on contact_person cp (cost=0.15..8.17 rows=1 width=144)
                          Index Cond: (cart_id = c.id)
              -> Bitmap Heap Scan on cart_items ci (cost=15.03..25.17 rows=7 width=28)
                    Recheck Cond: (cart_id = c.id)
                    -> Bitmap Index Scan on cart_items_item_id_cart_id_key (cost=0.00..15.03 rows=7 width=0)
                          Index Cond: (cart_id = c.id)
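In the "aggregate before" plan, the GroupAggregate node sits on top of a Seq Scan of the whole cart_items table, so the subquery appears to be aggregated for every cart before the join narrows the result to one. A minimal sketch of a variant that correlates the subquery with the cart row through LATERAL, which may let the planner aggregate only the matching rows. This is an assumption to test, not a confirmed fix; the alias i is introduced only for this illustration:

```sql
-- Sketch: same output columns as option 1, but the aggregation is
-- correlated with the outer cart row instead of grouping the whole table.
select c.id as id,
       c.buyer_id as buyer_id,
       cp.id as contact_id,
       cp.phone_number,
       cp.first_name,
       cp.middle_name,
       cp.last_name,
       ci.ids, ci.item_ids, ci.counts
from cart c
         inner join contact_person cp on c.id = cp.cart_id
         -- An aggregate without GROUP BY always returns exactly one row,
         -- so "on true" is enough; the arrays are NULL for an empty cart.
         left join lateral (select array_agg(i.id)      as ids,
                                   array_agg(i.item_id) as item_ids,
                                   array_agg(i.count)   as counts
                            from cart_items i
                            where i.cart_id = c.id) ci on true
where c.buyer_id = :buyerId;
```

Comparing the EXPLAIN output of this variant with the two plans above should show whether the full-table aggregation is what drives the higher estimate.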
I tried adding an index on cart_items.cart_id. It sped up both queries, in the first case as well as in the second. How can the difference between the two plans be explained?
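For reference, a minimal sketch of such an index. The name cart_items_cart_id_idx is an assumption chosen for illustration and does not come from the schema above:

```sql
-- Hypothetical index name; only the indexed column matters here.
-- The existing unique constraint covers (item_id, cart_id), so cart_id is
-- the second column of that index and a lookup by cart_id alone cannot use
-- its leading column.
create index cart_items_cart_id_idx on cart_items (cart_id);
```

This matches what the "aggregate after" plan shows: the Bitmap Index Scan on cart_items_item_id_cart_id_key has an estimated cost of 15.03 just to find 7 rows, which is consistent with the index being searched without a condition on its leading column.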
The join is cart_items.cart_id --> cart.id (this probably causes the need for a sort step). Note: both queries are relatively small, and cost-based planning does not work well for small numbers.
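Since the figures above are only planner estimates, a sketch of how the two queries could be compared on measured run time instead. The row counts in the plans (rows=1450 for the scan, rows=200 for the groups) look like planner defaults for tables without statistics, which is an assumption on my part, so the tables are analyzed first. The literal 1 stands in for :buyerId, as in the plans above:

```sql
-- Refresh planner statistics so the row estimates reflect the real data.
analyze cart;
analyze contact_person;
analyze cart_items;

-- Execute the query and report actual timings and buffer usage
-- next to the estimates (shown here for option 2; repeat for option 1).
explain (analyze, buffers)
select c.id as id,
       c.buyer_id as buyer_id,
       cp.id as contact_id,
       cp.phone_number,
       cp.first_name,
       cp.middle_name,
       cp.last_name,
       array_agg(ci.id) as ids,
       array_agg(ci.item_id) as item_ids,
       array_agg(ci.count) as counts
from cart c
         inner join contact_person cp on c.id = cp.cart_id
         left join cart_items ci on ci.cart_id = c.id
where c.buyer_id = 1
group by c.id, cp.id;
```

Note that explain (analyze) really runs the statement, which is harmless for these read-only selects.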