Query statistics on Postgres

Question

I am facing a problem with a specific query on PostgreSQL.

Here is the relevant part of the EXPLAIN ANALYZE output:

                          ->  Nested Loop Left Join  (cost=21547.86..87609.16 rows=123 width=69) (actual time=28.997..562.299 rows=32710 loops=1)
                                ->  Hash Join  (cost=21547.30..87210.72 rows=123 width=53) (actual time=28.913..74.682 rows=32710 loops=1)
                                      Hash Cond: (registry.id = profile.registry_id)
                                      ->  Bitmap Heap Scan on registry  (cost=726.99..66218.46 rows=65503 width=53) (actual time=5.123..32.794 rows=66496 loops=1)
                                            Recheck Cond: ((tenant_id = 1009469) AND active AND (excluded_at IS NULL))
                                            Heap Blocks: exact=12563
                                            ->  Bitmap Index Scan on registry_tenant_id_excluded_at  (cost=0.00..710.61 rows=65503 width=0) (actual time=3.589..3.589 rows=66496 loops=1)
                                                  Index Cond: (tenant_id = 1009469)
                                      ->  Hash  (cost=20202.82..20202.82 rows=49399 width=16) (actual time=23.738..23.738 rows=32710 loops=1)
                                            Buckets: 65536  Batches: 1  Memory Usage: 2046kB
                                            ->  Index Only Scan using profile_tenant_id_registry_id on profile  (cost=0.56..20202.82 rows=49399 width=16) (actual time=0.019..19.173 rows=32710 loops=1)
                                                  Index Cond: (tenant_id = 1009469)
                                                  Heap Fetches: 29493

It badly misestimates the hash join (123 rows estimated vs. 32710 actual), even though the scans feeding it are estimated reasonably well. I already tried raising the statistics target on the related columns, but that only moved the estimate from 117 to 123, so I guess statistics are not the issue.
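For reference, the estimate can be roughly reverse-engineered from the plan. The sketch below assumes a simplified model in which join rows ≈ (outer rows) × (inner rows) × (per-pair join selectivity), with the selectivity taken from whole-table column statistics (for an equality join, roughly 1 / the larger n_distinct of the two join columns). That model is an approximation of what the planner does, not its actual code, and the plan's row counts are themselves rounded.

```python
# Row counts copied from the EXPLAIN ANALYZE output above.
est_registry_rows = 65503   # Bitmap Heap Scan on registry (estimated)
est_profile_rows = 49399    # Index Only Scan on profile (estimated)
est_join_rows = 123         # Hash Join (estimated)
actual_join_rows = 32710    # Hash Join (actual)

pairs = est_registry_rows * est_profile_rows

# Assumed model: join rows ~= outer rows * inner rows * selectivity.
implied_selectivity = est_join_rows / pairs

# For an equality join the selectivity is roughly 1 / n_distinct of the
# join key, so this is the distinct count the estimate implies.
implied_ndistinct = 1 / implied_selectivity

# Selectivity that would have produced the actual row count instead.
needed_selectivity = actual_join_rows / pairs

print(f"implied per-pair selectivity: {implied_selectivity:.3e}")
print(f"implied distinct join-key values: {implied_ndistinct:,.0f}")
print(f"selectivity needed for the actual count: {needed_selectivity:.3e}")
print(f"misestimate factor: {actual_join_rows / est_join_rows:.0f}x")
```

The implied distinct count comes out around 26 million, which would be consistent with the planner using statistics for registry.id across the whole table (plausible if id is the primary key, but that is an inference, not something the plan shows). The per-pair probability it needs to hit the actual count is about 266 times larger.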

Why is it misestimating so badly? The nested loop on top of it takes a lot of work for the database.

jjanes · Accepted Answer · 2021-01-31 12:55:09Z


It looks like rows with the same tenant_id also mostly have matching values for profile.registry_id/registry.id. But the planner doesn't know that. It assumes registry_id = registry.id will be true just as often for the rows actually selected as it would be for randomly chosen pairs of rows.
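A small simulation can make this concrete. The sketch below uses hypothetical synthetic data (tenant counts and table sizes are invented; only the column names mirror the question) in which every profile row points at a registry row of the same tenant. It then compares an independence-style estimate, built from whole-table distinct counts, with the real join size for one tenant. The estimation formula is a simplified stand-in for the planner's logic, not PostgreSQL's actual implementation.

```python
import random

random.seed(0)

# Hypothetical synthetic data sizes.
N_TENANTS = 100
REGISTRY_PER_TENANT = 200
PROFILES_PER_TENANT = 150

registry = []  # (id, tenant_id); id is unique across the whole table
profile = []   # (registry_id, tenant_id)
next_id = 0
for tenant in range(N_TENANTS):
    ids = list(range(next_id, next_id + REGISTRY_PER_TENANT))
    next_id += REGISTRY_PER_TENANT
    registry.extend((i, tenant) for i in ids)
    # Correlation: profiles only reference registry rows of their own tenant.
    profile.extend((random.choice(ids), tenant) for _ in range(PROFILES_PER_TENANT))

TENANT = 42
reg_sel = [r for r in registry if r[1] == TENANT]
prof_sel = [p for p in profile if p[1] == TENANT]

# Independence-style estimate: per-pair selectivity from whole-table stats,
# ignoring that the tenant filter already makes matches much more likely.
nd_registry_id = len({r[0] for r in registry})
nd_profile_registry_id = len({p[0] for p in profile})
selectivity = 1 / max(nd_registry_id, nd_profile_registry_id)
estimated = len(reg_sel) * len(prof_sel) * selectivity

# Actual join size on the filtered rows (registry ids are unique).
reg_id_set = {r[0] for r in reg_sel}
actual = sum(1 for p in prof_sel if p[0] in reg_id_set)

print(f"estimated join rows: {estimated:.1f}")
print(f"actual join rows:    {actual}")
```

In this setup every selected profile row finds its registry row, yet the estimate spreads those matches across all tenants, so it is too low by a factor equal to the number of tenants. Raising the statistics target does not change that, because the missing information is the cross-table correlation, not the per-column detail.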

I don't think there is anything you can do about this.



2 Comments

Bruno Manzo Over a year ago

That is sad. Is there any way to force a hash join?

jjanes Over a year ago

You can set enable_nestloop=off. That might make it pick a hash join instead, but there is no guarantee of that. Maybe you can use the extension github.com/ossc-db/pg_hint_plan (I've never used it).
