I have a DELETE query that I need to run on PostgreSQL 9.0.4. I am finding that it is performant until it hits 524,289 rows in a subselect query.
For instance, at 524,288 there is no materialized view used and the cost looks pretty good:
explain DELETE FROM table1 WHERE pointLevel = 0 AND userID NOT IN
(SELECT userID FROM table2 fetch first 524288 rows only);
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Delete (cost=13549.49..17840.67 rows=21 width=6)
-> Index Scan using jslps_userid_nopt on table1 (cost=13549.49..17840.67 rows=21 width=6)
Filter: ((NOT (hashed SubPlan 1)) AND (pointlevel = 0))
SubPlan 1
-> Limit (cost=0.00..12238.77 rows=524288 width=8)
-> Seq Scan on table2 (cost=0.00..17677.92 rows=757292 width=8)
(6 rows)
However, as soon as I hit 524,289, the materialized view comes into play and the DELETE query becomes much more costly:
explain DELETE FROM table1 WHERE pointLevel = 0 AND userID NOT IN
(SELECT userID FROM table2 fetch first 524289 rows only);
QUERY PLAN
-----------------------------------------------------------------------------------------------------------
Delete (cost=0.00..386910.33 rows=21 width=6)
-> Index Scan using jslps_userid_nopt on table1 (cost=0.00..386910.33 rows=21 width=6)
Filter: ((pointlevel = 0) AND (NOT (SubPlan 1)))
SubPlan 1
-> Materialize (cost=0.00..16909.24 rows=524289 width=8)
-> Limit (cost=0.00..12238.79 rows=524289 width=8)
-> Seq Scan on table2 (cost=0.00..17677.92 rows=757292 width=8) (7 rows)
I worked around the issue by using a JOIN in the sub-select query instead:
SELECT s.userid
FROM table1 s
LEFT JOIN table2 p ON s.userid=p.userid
WHERE p.userid IS NULL AND s.pointlevel=0
However, I am still interested in understanding why the materialize decreases performance so drastically.