
Postgres uses a Nested Loop Join when I use a non-equi join condition in my update query. I understand that a Nested Loop Join can be very costly, as the right relation is scanned once for every row found in the left relation, as per [https://www.postgresql.org/docs/8.3/planner-optimizer.html].

The update query and the execution plan are below.

Query

explain analyze
UPDATE target_tbl tgt
set descr     = stage.descr,
    prod_name = stage.prod_name,
    item_name = stage.item_name,
    url       = stage.url,
    col1_name = stage.col1_name,
    col2_name = stage.col2_name,
    col3_name = stage.col3_name,
    col4_name = stage.col4_name,
    col5_name = stage.col5_name,
    col6_name = stage.col6_name,
    col7_name = stage.col7_name,
    col8_name = stage.col8_name,
    flag      = stage.flag
from tbl1 stage
where tgt.col1 = stage.col1
  and tgt.col2 = stage.col2
  and coalesce(tgt.col3, 'col3'::text) = coalesce(stage.col3, 'col3'::text)
  and coalesce(tgt.col4, 'col4'::text) = coalesce(stage.col4, 'col4'::text)
  and stage.row_number::int >= 1::int
  and stage.row_number::int < 50001::int;

Execution Plan

Update on target_tbl tgt  (cost=0.56..3557.91 rows=1 width=813) (actual time=346153.460..346153.460 rows=0 loops=1)
  ->  Nested Loop  (cost=0.56..3557.91 rows=1 width=813) (actual time=4.326..163876.029 rows=50000 loops=1)
        ->  Seq Scan on tbl1 stage  (cost=0.00..2680.96 rows=102 width=759) (actual time=3.060..2588.745 rows=50000 loops=1)
              Filter: (((row_number)::integer >= 1) AND ((row_number)::integer < 50001))
        ->  Index Scan using tbl_idx on target_tbl tgt  (cost=0.56..8.59 rows=1 width=134) (actual time=3.152..3.212 rows=1 loops=50000)
              Index Cond: ((col1 = stage.col1) AND (col2 = stage.col2) AND (COALESCE(col3, 'col3'::text) = COALESCE(stage.col3, 'col3'::text)) AND (COALESCE(col4, 'col4'::text) = COALESCE(stage.col4, 'col4'::text)))
Planning time: 17.700 ms
Execution time: 346157.168 ms
  1. Is there any way to avoid the nested loop join during the execution of the above query?

  2. Or is there a way to reduce the cost of the nested loop scan? It currently takes 6-7 minutes to update just 50000 records.

2 Comments

  • Do you have an index on tbl(col1, col2)? Commented Nov 2, 2020 at 15:27
  • No @a_horse_with_no_name. Currently I only have an index on target_tbl on (col1, col2, COALESCE(col3, 'col3'::text), COALESCE(col4, 'col4'::text)). Commented Nov 2, 2020 at 15:35

2 Answers


PostgreSQL can choose a different join strategy in that case. The reason it doesn't here is the gross misestimate in the sequential scan: 102 estimated rows instead of the actual 50000.

Fix that problem, and things will get better:

ANALYZE tbl1;

If that is not enough, collect more detailed statistics:

ALTER TABLE tbl1 ALTER row_number SET STATISTICS 1000;
ANALYZE tbl1;
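
To verify that this worked, re-run the estimate for the range filter on its own; after ANALYZE, the planner should estimate something close to 50000 rows rather than 102. A minimal sketch, using the table and column names from the question:

EXPLAIN
SELECT *
FROM tbl1 stage
WHERE stage.row_number::int >= 1
  AND stage.row_number::int < 50001;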

All this assumes that row_number is an integer and the type cast is redundant. If you made the mistake of using a different data type, an index on the expression is your only hope (the subsequent ANALYZE then also gathers statistics on the indexed expression, which is what fixes the estimate):

CREATE INDEX ON tbl1 ((row_number::integer));
ANALYZE tbl1;

1 Comment

Thanks a lot @Laurenz Albe. Creating the index on the row_number column has reduced the time to 2 mins.

I understand that a Nested Loop Join can be very costly, as the right relation is scanned once for every row found in the left relation

But the "right relation" here is an index scan, not a scan of the full table.

You can get it to stop using the index by changing the leading column of the join condition to something like where tgt.col1+0 = stage.col1 .... Upon doing that, it will probably change to a hash join or a merge join, but you will have to try it and see whether it does, and the new plan may not actually be faster. (Fixing the estimation problem, as in the other answer, would be preferable if that works.) A sketch of the rewrite follows.
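
For illustration, this is roughly what the rewrite would look like (a sketch: it assumes col1 is numeric, so that +0 is a valid no-op; for a text column the analogous no-op would be tgt.col1 || ''). Only the WHERE clause changes; the SET list is elided here and stays as in the original query:

explain analyze
UPDATE target_tbl tgt
set descr = stage.descr  -- ... remaining SET assignments unchanged
from tbl1 stage
where tgt.col1+0 = stage.col1  -- +0 keeps the planner from using the index on col1
  and tgt.col2 = stage.col2
  and coalesce(tgt.col3, 'col3'::text) = coalesce(stage.col3, 'col3'::text)
  and coalesce(tgt.col4, 'col4'::text) = coalesce(stage.col4, 'col4'::text)
  and stage.row_number::int >= 1
  and stage.row_number::int < 50001;

Another way to experiment is the standard planner setting enable_nestloop: running SET enable_nestloop = off; in the session before the EXPLAIN makes the planner avoid nested loops wherever another join strategy is possible (RESET enable_nestloop; restores the default).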

Or is there a way to reduce the cost of the nested loop scan? It currently takes 6-7 minutes to update just 50000 records.

Your plan shows that over half the time is spent on the update itself (346 s total versus 164 s in the nested loop), so reducing the cost of just the nested loop scan can have only a muted impact on the overall time. Do you have a lot of indexes on the table? The maintenance of those indexes might be a major bottleneck; you can list them as shown below.
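
A quick way to check is the standard catalog view pg_indexes (a sketch; target_tbl is the table name from the question):

SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'target_tbl';

Each index listed there has to be maintained for every one of the 50000 updated rows, unless the update qualifies as HOT, which requires that no indexed column changes.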

1 Comment

Thanks a lot for your help @jjanes, you are correct: there are a lot of indexes on the target table, but unfortunately I don't have permission to drop or rebuild them.
