Previously, I used this query, which was fast:
cb=# explain analyze SELECT "web_route"."id", "web_crag"."id" FROM "web_route"
INNER JOIN "web_crag" ON ( "web_route"."crag_id" = "web_crag"."id" )
WHERE "web_crag"."type" IN (1, 2)
ORDER BY "web_crag"."name" ASC
LIMIT 20;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..2.16 rows=20 width=18) (actual time=0.027..0.105 rows=20 loops=1)
-> Nested Loop (cost=0.00..47088.94 rows=436055 width=18) (actual time=0.026..0.100 rows=20 loops=1)
-> Index Scan using web_crag_name on web_crag (cost=0.00..503.16 rows=1776 width=14) (actual time=0.011..0.020 rows=14 loops=1)
Filter: (type = ANY ('{1,2}'::integer[]))
-> Index Scan using web_route_crag_id on web_route (cost=0.00..23.27 rows=296 width=8) (actual time=0.004..0.005 rows=1 loops=14)
Index Cond: (crag_id = web_crag.id)
Total runtime: 0.154 ms
(7 rows)
The problem with the query is that the order in which the rows are returned is not deterministic, which caused repeating rows across subsequent pages produced OFFSETing (i.e. pagination did not work properly in my web app). I decided to make the ordering strict by additionally sorting by "web_route".id".
cb=# explain analyze SELECT "web_route"."id", "web_crag"."id" FROM "web_route"
INNER JOIN "web_crag" ON ( "web_route"."crag_id" = "web_crag"."id" )
WHERE "web_crag"."type" IN (1, 2)
ORDER BY "web_crag"."name", "web_route"."id" ASC
LIMIT 20;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=29189.04..29189.09 rows=20 width=18) (actual time=324.065..324.068 rows=20 loops=1)
-> Sort (cost=29189.04..30279.18 rows=436055 width=18) (actual time=324.063..324.064 rows=20 loops=1)
Sort Key: web_crag.name, web_route.id
Sort Method: top-N heapsort Memory: 26kB
-> Hash Join (cost=135.40..17585.78 rows=436055 width=18) (actual time=0.882..195.941 rows=435952 loops=1)
Hash Cond: (web_route.crag_id = web_crag.id)
-> Seq Scan on web_route (cost=0.00..10909.55 rows=436055 width=8) (actual time=0.026..55.916 rows=435952 loops=1)
-> Hash (cost=113.20..113.20 rows=1776 width=14) (actual time=0.848..0.848 rows=1775 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 82kB
-> Seq Scan on web_crag (cost=0.00..113.20 rows=1776 width=14) (actual time=0.004..0.510 rows=1775 loops=1)
Filter: (type = ANY ('{1,2}'::integer[]))
Total runtime: 324.101 ms
(12 rows)
However, as you can see, the query got more than 2000x slower, which is quite a lot :). I wonder what can be done about that if anything. I plan to do really not a nice hack and duplicate "web_crag"."name" into "web_route" so that I can put an index on the two columns (crag_name, id) but if there is a better way I would be glad.
Here are schemes of "web_route" and "web_crag" in case it matters.
cb=# \d web_crag;
Table "public.web_crag"
Column | Type | Modifiers
-----------------+--------------------------+-------------------------------------------------------
id | integer | not null default nextval('web_crag_id_seq'::regclass)
name | character varying(64) | not null
latitude | double precision |
longitude | double precision |
type | integer |
description | text | not null
normalized_name | character varying(64) | not null
country_id | integer |
location_index | character(24) | not null
added_by_id | integer |
date_created | timestamp with time zone |
last_modified | timestamp with time zone |
Indexes:
"web_crag_pkey" PRIMARY KEY, btree (id)
"web_crag_added_by_id" btree (added_by_id)
"web_crag_country_id" btree (country_id)
"web_crag_location_index" btree (location_index)
"web_crag_name" btree (name)
Foreign-key constraints:
"added_by_id_refs_id_1745ebe43b31bec6" FOREIGN KEY (added_by_id) REFERENCES web_member(id) DEFERRABLE INITIALLY DEFERRED
"country_id_refs_id_1384050a9bd763af" FOREIGN KEY (country_id) REFERENCES web_country(id) DEFERRABLE INITIALLY DEFERRED
Referenced by:
TABLE "web_route" CONSTRAINT "crag_id_refs_id_3ce1145606d12740" FOREIGN KEY (crag_id) REFERENCES web_crag(id) DEFERRABLE INITIALLY DEFERRED
TABLE "web_video" CONSTRAINT "crag_id_refs_id_4fc9cbf2832725ca" FOREIGN KEY (crag_id) REFERENCES web_crag(id) DEFERRABLE INITIALLY DEFERRED
TABLE "web_image" CONSTRAINT "crag_id_refs_id_58210dd331468848" FOREIGN KEY (crag_id) REFERENCES web_crag(id) DEFERRABLE INITIALLY DEFERRED
TABLE "web_eventdestination" CONSTRAINT "crag_id_refs_id_612ad57c4d76c32c" FOREIGN KEY (crag_id) REFERENCES web_crag(id) DEFERRABLE INITIALLY DEFERRED
Triggers:
set_crag_location_index BEFORE INSERT OR UPDATE ON web_crag FOR EACH ROW EXECUTE PROCEDURE set_crag_location_index()
cb=# \d web_route
Table "public.web_route"
Column | Type | Modifiers
--------------------+--------------------------+--------------------------------------------------------
id | integer | not null default nextval('web_route_id_seq'::regclass)
name | character varying(64) | not null
crag_id | integer | not null
sector | character varying(64) | not null
difficulty | character varying(16) | not null
author | character varying(64) | not null
build_date | character varying(32) | not null
description | text | not null
difficulty_numeric | integer |
length_meters | double precision |
added_by_id | integer |
date_created | timestamp with time zone |
last_modified | timestamp with time zone |
normalized_name | character varying(64) | not null
rating_votes | integer | not null
rating_score | integer | not null
Indexes:
"web_route_pkey" PRIMARY KEY, btree (id)
"web_route_added_by_id" btree (added_by_id)
"web_route_crag_id" btree (crag_id)
Check constraints:
"ck_rating_votes_pstv_c39bae29f3b2012" CHECK (rating_votes >= 0)
"web_route_rating_votes_check" CHECK (rating_votes >= 0)
Foreign-key constraints:
"added_by_id_refs_id_157791930f5e12d5" FOREIGN KEY (added_by_id) REFERENCES web_member(id) DEFERRABLE INITIALLY DEFERRED
"crag_id_refs_id_3ce1145606d12740" FOREIGN KEY (crag_id) REFERENCES web_crag(id) DEFERRABLE INITIALLY DEFERRED
web_route.idmight require to sort many web_routes (or at least the planner prepares for this). I would try tocreate index on web_route (crag_id asc, id asc)to gather information about the distribution across columns, then useorder by web_crag.name, web_route.crag_id, web_route.idand hope it suffices.