
I am doing a query on my database of a few million items that gets really slow when I add in an order. Here is the code I am calling:

Post.where(source_id: source_ids_array).page(1).per(100).order("position asc, external_created_at desc")

(I am using Kaminari to do pagination)

Which gives me the following SQL:

Post Load (36537.8ms)  SELECT  "posts".* FROM "posts"  WHERE "posts"."source_id" IN (17805, 18768, 20717, 17803, 17804, 18329, 20705, 19075, 19110, 19082, 18328)  ORDER BY position asc, external_created_at desc LIMIT 100 OFFSET 0

However, when I modify the query to just be:

Post.where(source_id: source_ids_array).page(1).per(100).order("position asc")

I get the following SQL:

Post Load (279.6ms)  SELECT  "posts".* FROM "posts"  WHERE "posts"."source_id" IN (17805, 18768, 20717, 17803, 17804, 18329, 20705, 19075, 19110, 19082, 18328)  ORDER BY position asc LIMIT 100 OFFSET 0

Which is insanely faster.

The indexes in my schema.rb look like this:

add_index "posts", ["external_created_at"], name: "index_posts_on_external_created_at", using: :btree
add_index "posts", ["position", "external_created_at"], name: "index_posts_on_position_and_external_created_at", using: :btree
add_index "posts", ["position"], name: "index_posts_on_position", using: :btree

How can I go about speeding up this query?

Edit: here is my EXPLAIN ANALYZE:

Limit  (cost=633132.80..633133.05 rows=100 width=891) (actual time=31927.725..31927.751 rows=100 loops=1)
  ->  Sort  (cost=633132.80..635226.42 rows=837446 width=891) (actual time=31927.720..31927.729 rows=100 loops=1)
        Sort Key: "position", external_created_at
        Sort Method: top-N heapsort  Memory: 78kB
        ->  Bitmap Heap Scan on posts  (cost=19878.94..601126.22 rows=837446 width=891) (actual time=487.399..30855.211 rows=858629 loops=1)
              Recheck Cond: (source_id = ANY ('{17805,18768,20717,17803,17804,18329,20705,19075,19110,19082,18328}'::integer[]))
              Rows Removed by Index Recheck: 1050547
              ->  Bitmap Index Scan on index_posts_on_source_id  (cost=0.00..19669.58 rows=837446 width=0) (actual time=485.025..485.025 rows=927175 loops=1)
                    Index Cond: (source_id = ANY ('{17805,18768,20717,17803,17804,18329,20705,19075,19110,19082,18328}'::integer[]))
Total runtime: 31927.998 ms
  • It would be helpful to see the output of EXPLAIN ANALYZE for each of the queries (particularly the super-slow one). To do that, open up psql or rails db, and run EXPLAIN ANALYZE SELECT "posts".* FROM "posts" WHERE <rest of SQL goes here>. Alternatively, I would suggest trying .order("position asc, external_created_at asc") (with the orderings in the same direction) and see if that produces a speedier result or not (I suspect your compound index is not used due to the mismatched sort directions). Commented Dec 14, 2015 at 5:32
  • @RobertNubel You can use Post.where(source_id: source_ids_array).page(1).per(100).order("position asc, external_created_at desc").explain in the rails console instead. Much simpler than copypasting into psql. Commented Dec 14, 2015 at 8:11
  • @max it seems that calling .explain only results in an EXPLAIN SELECT instead of an EXPLAIN ANALYZE SELECT. Commented Dec 14, 2015 at 10:35
  • @goddamnyouryan, you're correct there. ActiveRecord seems to have only EXPLAIN since it's polyglot. Commented Dec 14, 2015 at 10:44
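
As the comments above note, .explain only issued a plain EXPLAIN here. One possible workaround, sketched below, is to prefix the relation's generated SQL with EXPLAIN ANALYZE and run that through the connection. The helper name explain_analyze_sql and the FakeRelation stand-in are hypothetical and exist only so the sketch runs without Rails; the console usage in the trailing comment assumes the PostgreSQL adapter. Keep in mind that EXPLAIN ANALYZE really executes the query, so the slow one will take its full time.

```ruby
# Hypothetical helper: builds an EXPLAIN ANALYZE statement from anything
# that responds to #to_sql (an ActiveRecord::Relation does).
def explain_analyze_sql(relation)
  "EXPLAIN ANALYZE #{relation.to_sql}"
end

# Stand-in for a relation so the sketch runs without Rails.
FakeRelation = Struct.new(:to_sql)

relation = FakeRelation.new(
  'SELECT "posts".* FROM "posts" ORDER BY position asc LIMIT 100 OFFSET 0'
)
puts explain_analyze_sql(relation)

# In a Rails console (assumption: PostgreSQL adapter), roughly:
#   rel = Post.where(source_id: source_ids_array)
#             .order("position asc, external_created_at desc").limit(100)
#   ActiveRecord::Base.connection.execute(explain_analyze_sql(rel))
#     .each { |row| puts row["QUERY PLAN"] }
```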

1 Answer


Although it's not very well documented, you can specify the sort order when creating an index:

add_index :posts, [:external_created_at, :position], 
    order: { position: :asc, external_created_at: :desc }

If we then run rake db:structure:dump we can see that it creates the following SQL:

CREATE INDEX "index_posts_on_external_created_at_and_position" 
 ON "posts" ("external_created_at" DESC, "position" ASC);

Note that we don't need to specify using: :btree, since Postgres defaults to B-tree indexes, nor name:, since Rails generates an index name from the table and column names.


4 Comments

I hope this was helpful. I'm typing on an ancient iMac where inserting a meaningful amount of records would take forever so I was not able to verify that the index is used. I would recommend you mirror the database with pgbackups and test out how to tweak the index and query with .explain.
I tried adding this and it seems to have sped it up a bit, but it's still pretty slow. I edited my question to add the EXPLAIN ANALYZE.
Ok, I'm at a loss on how to further optimise it. You'll have to wait for the Postgres wizards to show up with their pointy hats.
It looks like just reversing the order of your index call: add_index :posts, [:position, :external_created_at], order: { position: :asc, external_created_at: :desc } has about tripled the query speed! So simple! Of course, this isn't 2 orders of magnitude better, but it's a good start!
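
A rough way to see why the column order in the index matters: a B-tree index keeps its entries sorted by its columns, left to right. The plain-Ruby sketch below involves no database; it sorts a few made-up rows the way each index definition would and compares the result with the query's ORDER BY position asc, external_created_at desc. The Row struct, the sample values (timestamps shown as plain integers), and the sorted_by helper are assumptions for illustration only.

```ruby
Row = Struct.new(:id, :position, :external_created_at)

# Made-up sample rows; external_created_at is an integer stand-in for a timestamp.
ROWS = [
  Row.new(1, 2, 100),
  Row.new(2, 1, 300),
  Row.new(3, 1, 200),
  Row.new(4, 2, 400)
].freeze

# Sort rows by a list of [column, direction] pairs, left to right,
# which mimics the entry order of a multicolumn index.
def sorted_by(rows, columns)
  rows.sort_by do |row|
    columns.map { |name, dir| dir == :desc ? -row[name] : row[name] }
  end
end

# The order the query asks for.
QUERY_ORDER = [[:position, :asc], [:external_created_at, :desc]].freeze

wanted         = sorted_by(ROWS, QUERY_ORDER).map(&:id)
position_first = sorted_by(ROWS, [[:position, :asc], [:external_created_at, :desc]]).map(&:id)
created_first  = sorted_by(ROWS, [[:external_created_at, :desc], [:position, :asc]]).map(&:id)

puts "query order:          #{wanted.inspect}"
puts "index (position, ..): #{position_first.inspect}"
puts "index (created, ..):  #{created_first.inspect}"
puts "position-first matches query order: #{position_first == wanted}"
puts "created-first matches query order:  #{created_first == wanted}"
```

Only the index that leads with position lists the rows in the order the query wants, which fits the improvement reported in the comment above. Whether Postgres actually uses such an index still depends on the planner and on the source_id filter.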
