
On Postgres server A, I am running a query that uses foreign tables from server B (e.g. FROM mav4_gmd_data):

EXPLAIN ANALYZE VERBOSE 
            SELECT 
                d.mgd_mav4_gmd_object_mgo_id, 
                d.mgd_creation_date_iso,  
                d.mgd_data
            FROM mav4_gmd_data AS d
            WHERE 
                d.mgd_creation_date_iso > '2021-08-5 10:00' AND 
                d.mgd_mav4_gmd_object_mgo_id IN (
                    SELECT pg.mgo_id
                    FROM mav4_gmd_object as pg
                    WHERE pg.mgo_class = 'Ibc' 
            )

This query takes a significant amount of time. The plan shows that the SELECT sent to server B (Foreign Scan on public.mav4_gmd_data) takes about 8550 ms:

QUERY PLAN                                                                                                                                                                                           |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
Hash Semi Join  (cost=235.92..266.63 rows=17 width=56) (actual time=8572.409..8572.412 rows=0 loops=1)                                                                                               |
  Output: d.mgd_mav4_gmd_object_mgo_id, d.mgd_creation_date_iso, d.mgd_data                                                                                                                          |
  Hash Cond: (d.mgd_mav4_gmd_object_mgo_id = pg.mgo_id)                                                                                                                                              |
  ->  Foreign Scan on public.mav4_gmd_data d  (cost=100.00..129.62 rows=341 width=56) (actual time=24.787..8550.000 rows=135856 loops=1)                                                             |
        Output: d.mgd_id, d.mgd_creation_date_iso, d.mgd_creation_date_unix, d.mgd_mav4_gmd_system_mgs_id, d.mgd_mav4_gmd_object_mgo_id, d.mgd_data                                                  |
        Remote SQL: SELECT mgd_creation_date_iso, mgd_mav4_gmd_object_mgo_id, mgd_data FROM public.mav4_gmd_data WHERE ((mgd_creation_date_iso > '2021-08-05 10:00:00+02'::timestamp with time zone))|
  ->  Hash  (cost=135.80..135.80 rows=10 width=16) (actual time=0.761..0.762 rows=51 loops=1)                                                                                                        |
        Output: pg.mgo_id                                                                                                                                                                            |
        Buckets: 1024  Batches: 1  Memory Usage: 11kB                                                                                                                                                |
        ->  Foreign Scan on public.mav4_gmd_object pg  (cost=100.00..135.80 rows=10 width=16) (actual time=0.744..0.751 rows=51 loops=1)                                                             |
              Output: pg.mgo_id                                                                                                                                                                      |
              Remote SQL: SELECT mgo_id FROM public.mav4_gmd_object WHERE ((mgo_class = 'Ibc'::text))                                                                                                |
Planning Time: 0.164 ms                                                                                                                                                                              |
Execution Time: 8573.195 ms                                                                                                                                                                          |

However, if I run the same remote SQL directly on server B,

EXPLAIN ANALYZE VERBOSE
SELECT mgd_creation_date_iso, mgd_mav4_gmd_object_mgo_id, mgd_data FROM public.mav4_gmd_data WHERE ((mgd_creation_date_iso > '2021-08-05 10:00:00+02'::timestamp with time zone))

it runs significantly faster (about 100 ms):

QUERY PLAN                                                                                                                                                        |
------------------------------------------------------------------------------------------------------------------------------------------------------------------+
Index Scan using idx_mgd_mgd_creation_date_iso on public.mav4_gmd_data  (cost=0.43..16638.90 rows=42119 width=695) (actual time=0.021..96.663 rows=136032 loops=1)|
  Output: mgd_creation_date_iso, mgd_mav4_gmd_object_mgo_id, mgd_data                                                                                             |
  Index Cond: (mav4_gmd_data.mgd_creation_date_iso > '2021-08-05 10:00:00+02'::timestamp with time zone)                                                          |
Planning Time: 0.147 ms                                                                                                                                           |
Execution Time: 103.860 ms                                                                                                                                        |

For larger data sets the difference in total time is even more significant. I also tried changing the fetch_size and use_remote_estimate options, but without any success. Could it be that the foreign data wrapper is not using the index on server B? What else could cause this problem? Or is it a limitation of Postgres?

(PostgreSQL 13.3)

  • Did you find the reason? I have exactly the same problem now. Commented Sep 22, 2023 at 17:30

1 Answer

With EXPLAIN ANALYZE run directly on the server, the query does get executed, but all the server needs to do with the results is count how many rows there are. With an FDW, the query has to be executed on the foreign side, the data formatted for transit, actually pushed over the network (or at least over IPC), parsed on the local side (at least enough to identify the row boundaries), and only then counted.

You can expect the FDW to be slower than running the query directly, but your test is not necessarily realistic for how much slower it will be. Presumably you wouldn't run the query if you didn't want to do something with the result, and doing something non-trivial with the result adds proportionally more time to the faster query than to the slower one.

For a more realistic test, you could do something like:

COPY (<query>) to '/dev/null';

and time that.

Better yet, actually do with the result whatever it is that you want to do with the result which motivated you to write the query in the first place.
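
As a rough sketch of that COPY test, using the filter from the question's Remote SQL line (run the same statement on server A against the foreign table and on server B against the local table, then compare the timings). Note the assumptions: this is meant for psql, where \timing is a client-side meta-command, and writing to a server-side file with COPY requires a superuser or a member of the pg_write_server_files role.

```sql
-- psql meta-command: print the elapsed time of each statement
\timing on

-- Forces every row to be produced and formatted for output, then discards it.
-- Needs superuser or membership in pg_write_server_files.
COPY (
    SELECT d.mgd_mav4_gmd_object_mgo_id,
           d.mgd_creation_date_iso,
           d.mgd_data
    FROM mav4_gmd_data AS d
    WHERE d.mgd_creation_date_iso > '2021-08-05 10:00'
) TO '/dev/null';
```

Because both runs now have to format all ~136,000 rows for output, the gap between them should be closer to the real FDW overhead than the EXPLAIN ANALYZE numbers suggest.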

Could it be, that the foreign wrapper is not using the index on the Server B?

I don't see any reason to think that would be the case (after all, the "Remote SQL" line does show the indexable condition being passed down). But there is no point in speculating when you can look directly. Unfortunately, EXPLAIN ANALYZE's output does not recurse down to the foreign side. Fortunately, if you control the foreign server, you can set up auto_explain there to capture the plans and get direct evidence from its log file about what it was doing.
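
A minimal sketch of that auto_explain setup on server B, under the assumption that auto_explain is already listed in shared_preload_libraries and the server has been restarted (otherwise ALTER SYSTEM will not recognize these parameters). One thing worth checking if nothing appears in the log: auto_explain.log_min_duration defaults to -1, which disables plan logging entirely even when the module is loaded.

```sql
-- Run on server B as a superuser.
-- Assumption: auto_explain is already loaded via shared_preload_libraries.

-- Log the plan of every statement; the default of -1 logs nothing.
ALTER SYSTEM SET auto_explain.log_min_duration = 0;

-- Include actual row counts and timings, like EXPLAIN ANALYZE.
-- This adds measurable overhead to every statement while enabled.
ALTER SYSTEM SET auto_explain.log_analyze = on;

-- Apply the new settings without a restart.
SELECT pg_reload_conf();
```

After rerunning the query from server A, the plan chosen for the remote statement should show up in server B's log. The settings can be reverted afterwards with ALTER SYSTEM RESET on each parameter followed by another pg_reload_conf().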


1 Comment

Indeed, interesting. If I change the query on the source database, it does indeed take more time. Still, on the destination database, the query takes almost twice as long. Both DBs run on the same machine, but you are right that at least the IPC must happen, plus the further overhead. auto_explain did not catch the remote queries, even though I enabled it globally via shared_preload_libraries.
