
I am facing a problem with a specific query on PostgreSQL.

Here is the relevant part of the EXPLAIN ANALYZE output:

                          ->  Nested Loop Left Join  (cost=21547.86..87609.16 rows=123 width=69) (actual time=28.997..562.299 rows=32710 loops=1)
                                ->  Hash Join  (cost=21547.30..87210.72 rows=123 width=53) (actual time=28.913..74.682 rows=32710 loops=1)
                                      Hash Cond: (registry.id = profile.registry_id)
                                      ->  Bitmap Heap Scan on registry  (cost=726.99..66218.46 rows=65503 width=53) (actual time=5.123..32.794 rows=66496 loops=1)
                                            Recheck Cond: ((tenant_id = 1009469) AND active AND (excluded_at IS NULL))
                                            Heap Blocks: exact=12563
                                            ->  Bitmap Index Scan on registry_tenant_id_excluded_at  (cost=0.00..710.61 rows=65503 width=0) (actual time=3.589..3.589 rows=66496 loops=1)
                                                  Index Cond: (tenant_id = 1009469)
                                      ->  Hash  (cost=20202.82..20202.82 rows=49399 width=16) (actual time=23.738..23.738 rows=32710 loops=1)
                                            Buckets: 65536  Batches: 1  Memory Usage: 2046kB
                                            ->  Index Only Scan using profile_tenant_id_registry_id on profile  (cost=0.56..20202.82 rows=49399 width=16) (actual time=0.019..19.173 rows=32710 loops=1)
                                                  Index Cond: (tenant_id = 1009469)
                                                  Heap Fetches: 29493

The planner misestimates the hash join (123 rows estimated, 32710 actual), even though both scans are estimated accurately. I already tried raising the statistics target on the related columns, but the estimate only moved from 117 to 123, so I guess this is not the issue.

Why is the estimate so far off? Because of it, the planner picks a nested loop, which is a lot of work for the database.

1 Answer


It looks like the tenant_id filters and the join condition are strongly correlated: rows with the same tenant_id also mostly have matching values for registry_id/registry.id. But the planner doesn't understand that. It assumes that registry_id = registry.id will be true as often for the rows actually selected as it would be for randomly selected pairs of rows.
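To see where the estimate of 123 might come from, here is a rough sketch. It assumes the planner uses its documented no-MCV equi-join formula (selectivity is about 1 / max(n_distinct) of the two join columns, ignoring NULLs) and treats the join as independent of the tenant_id filters. The implied n_distinct is back-calculated from the plan above, not a value reported in the question.

```sql
-- What the planner knows about the join and filter columns.
-- pg_stats is a standard system view; a negative n_distinct is a
-- fraction of the table's row count rather than an absolute number.
SELECT tablename, attname, null_frac, n_distinct
FROM pg_stats
WHERE (tablename = 'registry' AND attname IN ('id', 'tenant_id'))
   OR (tablename = 'profile'  AND attname IN ('registry_id', 'tenant_id'));

-- Assumed formula: join_rows ~ outer_rows * inner_rows / max(n_distinct).
-- Solving for max(n_distinct) with the numbers from the plan
-- (65503 and 49399 estimated input rows, 123 estimated output rows)
-- gives roughly 26 million, which would be consistent with registry.id
-- being a unique key on a table of about that size (an assumption).
-- The numeric cast avoids integer overflow in the multiplication.
SELECT 65503::numeric * 49399 / 123 AS implied_n_distinct;
```

If the first query shows an n_distinct in that range for registry.id, the misestimate is the independence assumption at work rather than stale or coarse statistics, which would explain why raising the statistics target barely changed it.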

I don't think there is anything you can do about this.


2 Comments

That is sad, is there any way to force a hash join?
You can set enable_nestloop=off. That might make it pick a hash join instead, but there is no guarantee of that. Maybe you can use the extension github.com/ossc-db/pg_hint_plan (I've never used it).
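A minimal sketch of trying that setting for a single statement only. The original query is not shown in the question, so the SELECT below is a hypothetical reconstruction of just the registry/profile join from the plan's conditions; substitute the real query.

```sql
BEGIN;

-- SET LOCAL lasts only until the end of this transaction, so other
-- queries in the session keep the default planner behaviour.
SET LOCAL enable_nestloop = off;

-- Hypothetical stand-in for the real query (not from the question).
EXPLAIN (ANALYZE, BUFFERS)
SELECT r.id
FROM registry r
JOIN profile p ON p.registry_id = r.id
WHERE r.tenant_id = 1009469
  AND p.tenant_id = 1009469
  AND r.active
  AND r.excluded_at IS NULL;

ROLLBACK;
```

Note that enable_nestloop = off does not forbid nested loops; it only makes the planner treat them as very expensive, so it can still choose one when it sees no alternative.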
