I have a recursive query that takes too long - 30+ ms where doing the individual queries to extract the same data manually takes < 0.12 ms. So we're talking 250x as long.
I have the following database structure allowing a DAG of group membership (db-fiddle here):
create table subjects
(
subject_id bigint not null
constraint pk_subjects
primary key
);
create table subject_group_members
(
subject_group_id bigint not null
constraint fk_subject_group_members_subject_group_id_subjects_subject_id
references subjects(subject_id)
on delete cascade,
subject_id bigint not null
constraint fk_subject_group_members_subject_id_subjects_subject_id
references subjects(subject_id)
on delete cascade,
constraint pk_subject_group_members
primary key (subject_group_id, subject_id)
);
create index idx_subject_group_members_subject_id
on subject_group_members (subject_id);
create index idx_subject_group_members_subject_group_id
on subject_group_members (subject_group_id);
Data might look like this:
| subject_group_id | subject_id |
|---|---|
| 1 | 2 |
| 1 | 3 |
| 1 | 4 |
| 2 | 5 |
| 3 | 5 |
I want to know all the groups that 5 is a member of (1 by inheritance, 2 & 3 directly, not 4 or any other subject ids).
This query works as expected:
with recursive flat_members(subject_group_id, subject_id) as (
select subject_group_id, subject_id
from subject_group_members gm
union
select
flat_members.subject_group_id as subject_group_id,
subject_group_members.subject_id as subject_id
from subject_group_members
join flat_members on flat_members.subject_id = subject_group_members.subject_group_id
)
select * from flat_members where subject_id = 5
But run with real data I get this query plan:
CTE Scan on flat_members (cost=36759729.47..59962757.76 rows=5156229 width=16) (actual time=26.526..55.166 rows=3 loops=1)
Filter: (subject_id = 30459)
Rows Removed by Filter: 48984
CTE flat_members
-> Recursive Union (cost=0.00..36759729.47 rows=1031245702 width=16) (actual time=0.022..47.638 rows=48987 loops=1)
-> Seq Scan on subject_group_members gm (cost=0.00..745.82 rows=48382 width=16) (actual time=0.019..4.286 rows=48382 loops=1)
-> Merge Join (cost=63629.74..1613406.96 rows=103119732 width=16) (actual time=10.897..11.038 rows=320 loops=2)
Merge Cond: (subject_group_members.subject_group_id = flat_members_1.subject_id)
-> Index Scan using idx_subject_group_members_subject_group_id on subject_group_members (cost=0.29..1651.02 rows=48382 width=16) (actual time=0.009..1.987 rows=24192 loops=2)
-> Materialize (cost=63629.45..66048.55 rows=483820 width=16) (actual time=4.124..6.592 rows=24668 loops=2)
-> Sort (cost=63629.45..64839.00 rows=483820 width=16) (actual time=4.120..5.034 rows=24494 loops=2)
Sort Key: flat_members_1.subject_id
Sort Method: quicksort Memory: 53kB
-> WorkTable Scan on flat_members flat_members_1 (cost=0.00..9676.40 rows=483820 width=16) (actual time=0.001..0.916 rows=24494 loops=2)
Planning Time: 0.296 ms
Execution Time: 56.735 ms
Now if I do it manually, querying select subject_group_id from subject_group_members where subject_id = 30459 and following the tree up, it's 4 queries each taking about 0.02ms.
Is there a way where I can make the recursive query approach the speed of doing the recursion manually?
subject_idI actually care about. Instead of finding the two rows that actually contain my desiredsubject_id, then using theirsubject_group_idas thesubject_idin subsequent recursions.