How can sorting (before a merge join) increase the number of rows?

Question

I'm working on a query that's performing terribly:

SELECT COUNT(*)
FROM ps 
INNER JOIN p ON p.id = ps.patient_id 
INNER JOIN hh ON hh.id = ps.hh_id 
INNER JOIN cma ON cma.id = ps.cma_id 
INNER JOIN ter ters ON ( p.mm_id = ters.member_id ) 
    AND ( hh.mmis_id = ters.hh_mmis_id ) 
    AND ( cma.mmis_id = ters.cma_mmis_id ) 
    AND ( ps.start_date = ters.begin_date ) 
    AND ( CASE WHEN ps.oe_id = 1 THEN 'O' WHEN ps.oe_id = 2 THEN 'E' ELSE 'UNKNOWN_oe_id' END = ters.outreach_enrollment_code ) 
WHERE ters.status != 'Canceled' AND hh.id = 1;

In the query plan, I notice that a Sort node (feeding a merge join) emits far more rows than it receives as input. This breaks my mental model of how a sort works. What am I missing?

Here's the snippet of the query plan in question:

->  Sort  (cost=20956.81..21259.78 rows=121187 width=20) (actual time=140.260..3363.612 rows=29930138 loops=1)
    Output: ps.p_id, ps.hh_id, ps.cma_id, ps.start_date, ps.oe_code_id, (CASE WHEN (ps.oe_code_id = 1) THEN 'O'::text WHEN (ps.oe_code_id = 2) THEN 'E'::text ELSE 'UNKNOWN_oe_code_id'::text END)
    Sort Key: ps.start_date, ps.cma_id, (CASE WHEN (ps.oe_code_id = 1) THEN 'O'::text WHEN (ps.oe_code_id = 2) THEN 'E'::text ELSE 'UNKNOWN_oe_code_id'::text END)
    Sort Method: quicksort  Memory: 12708kB
    Buffers: shared hit=4983
    ->  Bitmap Heap Scan on public.ps  (cost=2275.62..10724.46 rows=121187 width=20) (actual time=8.833..58.231 rows=123338 loops=1)
          Output: ps.p_id, ps.hh_id, ps.cma_id, ps.start_date, ps.oe_code_id, CASE WHEN (ps.oe_code_id = 1) THEN 'O'::text WHEN (ps.oe_code_id = 2) THEN 'E'::text ELSE 'UNKNOWN_oe_code_id'::text END
          Recheck Cond: (ps.hh_id = 1)
          Heap Blocks: exact=4644
          Buffers: shared hit=4983
          ->  Bitmap Index Scan on index_ps_on_hh_id  (cost=0.00..2245.33 rows=121187 width=0) (actual time=8.138..8.138 rows=123338 loops=1)
                Index Cond: (ps.hh_id = 1)
                Buffers: shared hit=339

Notice that the bitmap heap scan emits 123,338 rows, then the sort emits 29,930,138!

Folks have asked for the full query plan:

Aggregate  (cost=67207.10..67207.11 rows=1 width=0) (actual time=199297.658..199297.658 rows=1 loops=1)
  Output: count(*)
  Buffers: shared hit=119969133 dirtied=1
  ->  Nested Loop  (cost=59884.61..67207.10 rows=1 width=0) (actual time=486.145..199261.336 rows=120386 loops=1)
        Join Filter: (ps.p_id = p.id)
        Rows Removed by Join Filter: 29809605
        Buffers: shared hit=119969133 dirtied=1
        ->  Merge Join  (cost=59884.19..62745.05 rows=8862 width=13) (actual time=486.052..19265.755 rows=29930082 loops=1)
              Output: ps.p_id, ters.member_id
              Merge Cond: ((ters.begin_date = ps.start_date) AND (cma.id = ps.cma_id) AND ((ters.oe_code)::text = (CASE WHEN (ps.oe_code_id = 1) THEN 'O'::text WHEN (ps.oe_code_id = 2) THEN 'E'::text ELSE 'UNKNOWN_oe_CODE_ID'::text END)))
              Buffers: shared hit=11752
              ->  Sort  (cost=38920.83..39082.15 rows=64528 width=23) (actual time=323.201..384.837 rows=130638 loops=1)
                    Output: hh.id, ters.member_id, ters.begin_date, ters.oe_code, cma.id
                    Sort Key: ters.begin_date, cma.id, ters.oe_code
                    Sort Method: quicksort  Memory: 13279kB
                    Buffers: shared hit=6769
                    ->  Hash Join  (cost=3194.35..33765.80 rows=64528 width=23) (actual time=18.149..194.187 rows=130638 loops=1)
                          Output: hh.id, ters.member_id, ters.begin_date, ters.oe_code, cma.id
                          Hash Cond: ((ters.cma_mmis_id)::text = (cma.mmis_id)::text)
                          Buffers: shared hit=6759
                          ->  Nested Loop  (cost=3190.12..32556.05 rows=64028 width=28) (actual time=18.075..150.186 rows=130108 loops=1)
                                Output: hh.id, ters.member_id, ters.cma_mmis_id, ters.begin_date, ters.oe_code
                                Buffers: shared hit=6754
                                ->  Seq Scan on public.hh  (cost=0.00..1.12 rows=1 width=10) (actual time=0.008..0.011 rows=1 loops=1)
                                      Output: hh.id, hh.name ... [redacted]
                                      Filter: (hh.id = 1)
                                      Rows Removed by Filter: 9
                                      Buffers: shared hit=1
                                ->  Bitmap Heap Scan on public.ters ters  (cost=3190.12..31678.69 rows=87623 width=33) (actual time=18.063..124.542 rows=130108 loops=1)
                                      Output: ters.member_id, ters.hh_mmis_id, ters.cma_mmis_id, ters.begin_date, ters.oe_code
                                      Recheck Cond: ((ters.hh_mmis_id)::text = (hh.mmis_id)::text)
                                      Filter: ((ters.status)::text <> 'Canceled'::text)
                                      Rows Removed by Filter: 49848
                                      Heap Blocks: exact=6060
                                      Buffers: shared hit=6753
                                      ->  Bitmap Index Scan on ters_hh_mmis_id_idx  (cost=0.00..3168.21 rows=138105 width=0) (actual time=16.965..16.965 rows=179956 loops=1)
                                            Index Cond: ((ters.hh_mmis_id)::text = (hh.mmis_id)::text)
                                            Buffers: shared hit=693
                          ->  Hash  (cost=2.99..2.99 rows=99 width=12) (actual time=0.052..0.052 rows=99 loops=1)
                                Output: cma.id, cma.mmis_id
                                Buckets: 1024  Batches: 1  Memory Usage: 5kB
                                Buffers: shared hit=2
                                ->  Seq Scan on public.cma  (cost=0.00..2.99 rows=99 width=12) (actual time=0.006..0.030 rows=99 loops=1)
                                      Output: cma.id, cma.mmis_id
                                      Buffers: shared hit=2
              ->  Sort  (cost=20956.81..21259.78 rows=121187 width=20) (actual time=162.834..3317.995 rows=29930138 loops=1)
                    Output: ps.p_id, ps.hh_id, ps.cma_id, ps.start_date, ps.oe_code_id, (CASE WHEN (ps.oe_code_id = 1) THEN 'O'::text WHEN (ps.oe_code_id = 2) THEN 'E'::text ELSE 'UNKNOWN_oe_CODE_ID'::text END)
                    Sort Key: ps.start_date, ps.cma_id, (CASE WHEN (ps.oe_code_id = 1) THEN 'O'::text WHEN (ps.oe_code_id = 2) THEN 'E'::text ELSE 'UNKNOWN_oe_CODE_ID'::text END)
                    Sort Method: quicksort  Memory: 12708kB
                    Buffers: shared hit=4983
                    ->  Bitmap Heap Scan on public.ps  (cost=2275.62..10724.46 rows=121187 width=20) (actual time=9.940..72.463 rows=123338 loops=1)
                          Output: ps.p_id, ps.hh_id, ps.cma_id, ps.start_date, ps.oe_code_id, CASE WHEN (ps.oe_code_id = 1) THEN 'O'::text WHEN (ps.oe_code_id = 2) THEN 'E'::text ELSE 'UNKNOWN_oe_CODE_ID'::text END
                          Recheck Cond: (ps.hh_id = 1)
                          Heap Blocks: exact=4644
                          Buffers: shared hit=4983
                          ->  Bitmap Index Scan on index_ps_on_hh_id  (cost=0.00..2245.33 rows=121187 width=0) (actual time=9.226..9.226 rows=123338 loops=1)
                                Index Cond: (ps.hh_id = 1)
                                Buffers: shared hit=339
        ->  Index Scan using index_p_on_mm_id on public.p  (cost=0.42..0.49 rows=1 width=12) (actual time=0.005..0.006 rows=1 loops=29930082)
              Output: p.id, p.mm_id
              Index Cond: ((p.mm_id)::text = (ters.member_id)::text)
              Buffers: shared hit=119957381 dirtied=1
Planning time: 5.952 ms
Execution time: 199299.305 ms

SELECT COUNT(*) FROM ps INNER JOIN p ON p.id = ps.patient_id INNER JOIN hh ON hh.id = ps.hh_id INNER JOIN cma ON cma.id = ps.cma_id INNER JOIN ter ters ON ( p.mm_id = ters.member_id ) AND ( hh.mmis_id = ters.hh_mmis_id ) AND ( cma.mmis_id = ters.cma_mmis_id ) AND ( ps.start_date = ters.begin_date ) AND ( CASE WHEN ps.oe_id = 1 THEN 'O' WHEN ps.oe_id = 2 THEN 'E' ELSE 'UNKNOWN_oe_id' END = ters.outreach_enrollment_code ) WHERE ters.status != 'Canceled' AND hh.id = 1; That's the entire query. The plan is for 1 node + children.
– mistidoi, Commented Feb 1, 2018 at 19:31
This condition is the culprit: "( CASE WHEN ps.oe_id = 1 THEN 'O' WHEN ps.oe_id = 2 THEN 'E' ELSE 'UNKNOWN_oe_id' END = ters.outreach_enrollment_code)".
– Greg Viers, Commented Feb 1, 2018 at 19:52
Sounds like a bug. Which version is this? Can we have the complete query plan?
– Laurenz Albe, Commented Feb 1, 2018 at 21:18
@mistidoi I found no bug like that in the release notes. It would be great if you could come up with a reproducible test case so that the bug can be fixed. Do you observe the same for the simplified query SELECT ps.p_id, ps.hh_id, ps.cma_id, ps.start_date, ps.oe_code_id, (CASE WHEN (ps.oe_code_id = 1) THEN 'O' WHEN (ps.oe_code_id = 2) THEN 'E' ELSE 'UNKNOWN_oe_code_id' END) FROM ps WHERE ps.hh_id = 1 ORDER BY 4, 3, 6;?
– Laurenz Albe, Commented Feb 2, 2018 at 8:23

Greg Viers · Accepted Answer · 2018-02-02 13:25:14Z

Try refactoring it without the CASE expression in the ON clause:

SELECT COUNT(*)
FROM ps 
INNER JOIN p ON p.id = ps.patient_id 
INNER JOIN hh ON hh.id = ps.hh_id 
INNER JOIN cma ON cma.id = ps.cma_id 
INNER JOIN ter ters ON ( p.mm_id = ters.member_id ) 
    AND ( hh.mmis_id = ters.hh_mmis_id ) 
    AND ( cma.mmis_id = ters.cma_mmis_id ) 
    AND ( ps.start_date = ters.begin_date ) 
    AND ( (ps.oe_id = 1 AND ters.outreach_enrollment_code = 'O')
        OR (ps.oe_id = 2 AND ters.outreach_enrollment_code = 'E')
        OR (ps.oe_id NOT IN (1,2) AND ters.outreach_enrollment_code = 'UNKNOWN_oe_id'))
WHERE ters.status != 'Canceled' AND hh.id = 1;

It will also help performance if you make sure the statistics on these tables are up to date (for example, by running ANALYZE) and that there is an index on ps.oe_id.
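
One caveat when swapping a CASE ... ELSE comparison for OR branches is NULL handling: the ELSE branch also catches a NULL ps.oe_id, while ps.oe_id NOT IN (1,2) evaluates to NULL for such a row and drops it. Whether ps.oe_id can be NULL is not stated in the question, so this may not matter for the real data. The sketch below checks the behaviour with Python's built-in sqlite3 module as a stand-in for PostgreSQL (an assumption; both follow standard three-valued logic for these operators). The toy tables and rows are made up for illustration.

```python
import sqlite3

# Toy stand-in data (hypothetical): four ps rows, one with a NULL oe_id.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ps (id INTEGER, oe_id INTEGER);
    CREATE TABLE ters (outreach_enrollment_code TEXT);
    INSERT INTO ps VALUES (1, 1), (2, 2), (3, 3), (4, NULL);
    INSERT INTO ters VALUES ('O'), ('E'), ('UNKNOWN_oe_id');
""")

# Join condition as written in the question.
CASE_FORM = """
    SELECT ps.id FROM ps JOIN ters
      ON CASE WHEN ps.oe_id = 1 THEN 'O'
              WHEN ps.oe_id = 2 THEN 'E'
              ELSE 'UNKNOWN_oe_id' END = ters.outreach_enrollment_code
    ORDER BY ps.id
"""

# OR rewrite using NOT IN for the fallback branch.
OR_FORM = """
    SELECT ps.id FROM ps JOIN ters
      ON (ps.oe_id = 1 AND ters.outreach_enrollment_code = 'O')
      OR (ps.oe_id = 2 AND ters.outreach_enrollment_code = 'E')
      OR (ps.oe_id NOT IN (1, 2)
          AND ters.outreach_enrollment_code = 'UNKNOWN_oe_id')
    ORDER BY ps.id
"""

# Same rewrite with an explicit NULL branch, mirroring the CASE's ELSE.
OR_FORM_NULL_SAFE = """
    SELECT ps.id FROM ps JOIN ters
      ON (ps.oe_id = 1 AND ters.outreach_enrollment_code = 'O')
      OR (ps.oe_id = 2 AND ters.outreach_enrollment_code = 'E')
      OR ((ps.oe_id IS NULL OR ps.oe_id NOT IN (1, 2))
          AND ters.outreach_enrollment_code = 'UNKNOWN_oe_id')
    ORDER BY ps.id
"""


def matched_ids(sql):
    """Return the ps.id values that survive the join, in order."""
    return [row[0] for row in conn.execute(sql)]


print(matched_ids(CASE_FORM))          # [1, 2, 3, 4]
print(matched_ids(OR_FORM))            # [1, 2, 3]
print(matched_ids(OR_FORM_NULL_SAFE))  # [1, 2, 3, 4]
```

If ps.oe_id is declared NOT NULL, the plain NOT IN form and the CASE form return the same rows and the extra IS NULL branch is unnecessary.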

2 Comments

mistidoi Over a year ago

This does help the query planner to put together a quicker plan (one that doesn't even include a merge join). I've found a couple other ways as well that end up making the query performant, but what I'm really trying to figure out is how on earth a sort node in a query plan could increase the number of rows emitted. Do you have any idea?

Greg Viers Over a year ago

I have found in practice that changing your query until the query planner comes up with something better is often a lot easier than actually finding out why it was bad in the first place. I find the query planner to be like an old 1980s TV. You have to walk up and smack it sometimes.
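
On the original question of how a Sort can report more rows than it receives: the PostgreSQL documentation on EXPLAIN describes this as a measurement artifact of merge joins. When the outer input contains rows with duplicate key values, the inner input is backed up and rescanned for the matching portion, and EXPLAIN ANALYZE counts those repeated emissions as if they were additional rows. The following is a minimal Python sketch of that counting behaviour, not PostgreSQL's executor code; the function name and the toy data are hypothetical.

```python
def merge_join(outer, inner):
    """Join two key-sorted lists of keys.

    Returns (joined_pairs, inner_emitted), where inner_emitted counts every
    time an inner row is handed to the join, including repeats after a restore.
    """
    joined = []
    inner_emitted = 0
    pos = 0  # current read position in the inner input
    for idx, key in enumerate(outer):
        # Skip inner rows whose key sorts before the current outer key.
        while pos < len(inner) and inner[pos] < key:
            pos += 1
            inner_emitted += 1
        mark = pos  # remember where this key's run of inner rows starts
        while pos < len(inner) and inner[pos] == key:
            joined.append((key, inner[pos]))
            pos += 1
            inner_emitted += 1
        # If the next outer row has the same key, back up ("restore") so the
        # same inner rows are read again for it.
        if idx + 1 < len(outer) and outer[idx + 1] == key:
            pos = mark
    return joined, inner_emitted


# Hypothetical toy data: the outer side has three rows with key 1.
outer_keys = [1, 1, 1, 2]
inner_keys = [1, 1, 2]
pairs, emitted = merge_join(outer_keys, inner_keys)
print(len(inner_keys), emitted, len(pairs))  # prints: 3 7 7
```

The inner list holds 3 rows, yet 7 inner rows are counted as emitted because the two rows with key 1 are re-read for each duplicate outer row. In the plan above, the Merge Cond covers only the begin date, the cma id and the enrollment code, so many duplicate keys on the outer (ters) side are plausible, which would fit the Sort on ps reporting about 29.9 million rows while holding only 123,338.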
