I have a many-to-many relationship that looks like this:

    class Feed < ActiveRecord::Base
      has_many :cross_lists
      has_many :cross_listed_papers, through: :cross_lists, source: :paper
    end
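For reference, the other side of the association is the standard setup (a sketch; the `CrossList` and `Paper` model definitions below are assumed from Rails naming conventions, not shown in my actual code):

    # Assumed join model and target model (inferred from Rails conventions)
    class CrossList < ActiveRecord::Base
      belongs_to :feed
      belongs_to :paper
    end

    class Paper < ActiveRecord::Base
      has_many :cross_lists
      has_many :feeds, through: :cross_lists
    end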
Frequently (i.e. on every page load) I need to look up, sort, and paginate the cross-listed papers for a set of feeds over a given time period. However, even a bare count can be very slow:
[107] pry(main)> Paper.includes(:cross_lists).where(cross_lists: { feed_id: [311,312,313,314] }).where("pubdate >= ? and pubdate <= ?", Time.now - 300.days, Time.now).count
D, [2013-12-17T21:56:43.640283 #19404] DEBUG -- : (228.9ms) SELECT COUNT(DISTINCT "papers"."id") FROM "papers" LEFT OUTER JOIN "cross_lists" ON "cross_lists"."paper_id" = "papers"."id" WHERE "cross_lists"."feed_id" IN (311, 312, 313, 314) AND (pubdate >= '2013-02-20 10:56:43.404072' and pubdate <= '2013-12-17 10:56:43.404234') => 2811
228.9 ms is not ideal for something that may run on every page load, particularly since the time inflates rapidly if I join in more data (even for narrower time ranges). Here's the EXPLAIN ANALYZE:
    Aggregate  (cost=110771.10..110771.11 rows=1 width=4) (actual time=243.826..243.826 rows=1 loops=1)
      ->  Hash Join  (cost=95343.72..110749.09 rows=8807 width=4) (actual time=93.725..242.725 rows=2830 loops=1)
            Hash Cond: (cross_lists.paper_id = papers.id)
            ->  Bitmap Heap Scan on cross_lists  (cost=2876.53..15182.11 rows=158372 width=4) (actual time=15.496..90.232 rows=162981 loops=1)
                  Recheck Cond: (feed_id = ANY ('{311,312,313,314}'::integer[]))
                  ->  Bitmap Index Scan on index_cross_lists_on_feed_id_and_cross_list_date  (cost=0.00..2836.94 rows=158372 width=0) (actual time=14.383..14.383 rows=162981 loops=1)
                        Index Cond: (feed_id = ANY ('{311,312,313,314}'::integer[]))
            ->  Hash  (cost=91670.95..91670.95 rows=48499 width=4) (actual time=76.079..76.079 rows=48853 loops=1)
                  Buckets: 4096  Batches: 2  Memory Usage: 861kB
                  ->  Bitmap Heap Scan on papers  (cost=1033.46..91670.95 rows=48499 width=4) (actual time=6.495..61.230 rows=48853 loops=1)
                        Recheck Cond: ((pubdate >= '2013-02-20'::date) AND (pubdate <= '2013-12-17'::date))
                        ->  Bitmap Index Scan on index_papers_on_pubdate  (cost=0.00..1021.34 rows=48499 width=0) (actual time=5.437..5.437 rows=48855 loops=1)
                              Index Cond: ((pubdate >= '2013-02-20'::date) AND (pubdate <= '2013-12-17'::date))
    Total runtime: 244.295 ms
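For reference, the two indexes the planner is using appear to be the following (reconstructed from the index names in the plan above; the exact column lists are my assumption):

    # Reconstructed from the index names in the EXPLAIN output (column lists assumed)
    add_index :cross_lists, [:feed_id, :cross_list_date],
              name: "index_cross_lists_on_feed_id_and_cross_list_date"
    add_index :papers, :pubdate, name: "index_papers_on_pubdate"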
Are there indexes I can use to speed up this kind of query, or do I need to resort to denormalization?
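To be concrete about the denormalization option: what I have in mind is copying `pubdate` onto the join table so a single composite index could serve both filters without the join (a sketch only; column name and migration are hypothetical):

    # Sketch: denormalize pubdate onto the join table so one composite index
    # (feed_id, pubdate) can satisfy both WHERE conditions on cross_lists alone
    class AddPubdateToCrossLists < ActiveRecord::Migration
      def change
        add_column :cross_lists, :pubdate, :date
        add_index  :cross_lists, [:feed_id, :pubdate]
      end
    end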