On Postgres server A, I am running a query that uses foreign tables from server B (e.g. FROM mav4_gmd_data):
EXPLAIN ANALYZE VERBOSE
SELECT
d.mgd_mav4_gmd_object_mgo_id,
d.mgd_creation_date_iso,
d.mgd_data
FROM mav4_gmd_data AS d
WHERE
d.mgd_creation_date_iso > '2021-08-5 10:00' AND
d.mgd_mav4_gmd_object_mgo_id IN (
SELECT pg.mgo_id
FROM mav4_gmd_object as pg
WHERE pg.mgo_class = 'Ibc'
)
This query takes a significant amount of time. The query plan shows that the SELECT on server B (Foreign Scan on public.mav4_gmd_data) needs about 8550 ms:
QUERY PLAN |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
Hash Semi Join (cost=235.92..266.63 rows=17 width=56) (actual time=8572.409..8572.412 rows=0 loops=1) |
Output: d.mgd_mav4_gmd_object_mgo_id, d.mgd_creation_date_iso, d.mgd_data |
Hash Cond: (d.mgd_mav4_gmd_object_mgo_id = pg.mgo_id) |
-> Foreign Scan on public.mav4_gmd_data d (cost=100.00..129.62 rows=341 width=56) (actual time=24.787..8550.000 rows=135856 loops=1) |
Output: d.mgd_id, d.mgd_creation_date_iso, d.mgd_creation_date_unix, d.mgd_mav4_gmd_system_mgs_id, d.mgd_mav4_gmd_object_mgo_id, d.mgd_data |
Remote SQL: SELECT mgd_creation_date_iso, mgd_mav4_gmd_object_mgo_id, mgd_data FROM public.mav4_gmd_data WHERE ((mgd_creation_date_iso > '2021-08-05 10:00:00+02'::timestamp with time zone))|
-> Hash (cost=135.80..135.80 rows=10 width=16) (actual time=0.761..0.762 rows=51 loops=1) |
Output: pg.mgo_id |
Buckets: 1024 Batches: 1 Memory Usage: 11kB |
-> Foreign Scan on public.mav4_gmd_object pg (cost=100.00..135.80 rows=10 width=16) (actual time=0.744..0.751 rows=51 loops=1) |
Output: pg.mgo_id |
Remote SQL: SELECT mgo_id FROM public.mav4_gmd_object WHERE ((mgo_class = 'Ibc'::text)) |
Planning Time: 0.164 ms |
Execution Time: 8573.195 ms |
However, if I run the same remote query (the Remote SQL from the plan above) directly on server B,
EXPLAIN ANALYZE VERBOSE
SELECT mgd_creation_date_iso, mgd_mav4_gmd_object_mgo_id, mgd_data FROM public.mav4_gmd_data WHERE ((mgd_creation_date_iso > '2021-08-05 10:00:00+02'::timestamp with time zone))
it runs significantly faster (about 100 ms):
QUERY PLAN |
------------------------------------------------------------------------------------------------------------------------------------------------------------------+
Index Scan using idx_mgd_mgd_creation_date_iso on public.mav4_gmd_data (cost=0.43..16638.90 rows=42119 width=695) (actual time=0.021..96.663 rows=136032 loops=1)|
Output: mgd_creation_date_iso, mgd_mav4_gmd_object_mgo_id, mgd_data |
Index Cond: (mav4_gmd_data.mgd_creation_date_iso > '2021-08-05 10:00:00+02'::timestamp with time zone) |
Planning Time: 0.147 ms |
Execution Time: 103.860 ms |
For larger data sets, the difference in total time is even greater. I also tried changing the fetch_size and use_remote_estimate options, but without success. Could it be that the foreign data wrapper is not using the index on server B? What else could cause this problem? Or is it a limitation of Postgres?
(PostgreSQL 13.3)
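For reference, the two postgres_fdw options mentioned above can be set at the server or the foreign-table level, roughly as sketched below. The server name server_b and the values shown are placeholders for illustration only, not the exact settings that were tested.

```sql
-- Sketch only: server_b is a hypothetical foreign server name.
-- Use ADD when the option has not been set before, SET to change an existing value.

-- Fetch more rows per round trip (the postgres_fdw default is 100).
ALTER SERVER server_b OPTIONS (ADD fetch_size '10000');

-- Ask the remote server for row estimates when planning, for this table only.
ALTER FOREIGN TABLE mav4_gmd_data OPTIONS (ADD use_remote_estimate 'true');

-- Changing a value that was already set:
ALTER SERVER server_b OPTIONS (SET fetch_size '50000');
```

A table-level setting overrides the server-level one for that table, so both levels are worth checking when comparing runs.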