
So User has_many :orders, which works as you would expect. I also have a valid scope on Order that filters to orders in a set of whitelisted states (excluding canceled orders, for instance).

I've declared some indices on the orders table, and my schema.rb looks like:

add_index "orders", ["state"], :name => "index_orders_on_state"
add_index "orders", ["user_id", "state"], :name => "index_orders_on_user_id_and_state"
add_index "orders", ["user_id"], :name => "index_orders_on_user_id"

When I run puts user.orders.valid.explain I get this:

EXPLAIN for: SELECT "orders".* FROM "orders"
             WHERE "orders"."user_id" = 1 AND 
                   "orders"."state" IN ('pending', 'packed', 'shipped', 'in_transit', 'delivered', 'return_pending', 'returned')
      QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on orders  (cost=4.60..154.88 rows=40 width=3323)
   Recheck Cond: (user_id = 1)
   Filter: ((state)::text = ANY ('{pending,packed,shipped,in_transit,delivered,return_pending,returned}'::text[]))
   ->  Bitmap Index Scan on index_orders_on_user_id  (cost=0.00..4.59 rows=44 width=0)
         Index Cond: (user_id = 1)

So given that I am searching on user_id and state, and I have a compound index on both of those fields, why is it not using the index_orders_on_user_id_and_state index? Or am I just reading this EXPLAIN output wrong?

Is it doing two passes? One to find orders by user_id, and then another pass to check for state?

I need to run queries like this a lot, on a lot of records at once. So any way to keep it speedy is a very good thing.

  • About what portion of your rows have those state values? Sometimes the table stats suggest that a scan will be cheaper than using an index. Indexes give the query optimizer extra options; they don't force the optimizer to make any particular choice. Commented Jan 10, 2014 at 3:34
  • state can be one of 8 or 9 values, so perhaps it's just deciding that it doesn't need the more specific index. I guess the choice to use an index or not is more nuanced than I thought it to be. Commented Jan 10, 2014 at 17:42
  • I mean if, say, 90% of your rows have a state of 'pending', ... 'returned' (i.e. the ones you're looking for), then consulting the index might be pointless. Query optimization is a bit of a black art; it depends on the queries, the indexes, and the contents of the table. Also, indexing low-cardinality columns generally isn't that useful, and your state column might fall into the low-cardinality category. Commented Jan 10, 2014 at 18:13

2 Answers


The database system may decide not to use an index. For example, with MySQL, if the table is small, it may decide to do a full table scan. You can try inserting several million records and running the query again to see how the plan changes.


2 Comments

So in short, the answer is "because Postgres doesn't want to" in this case? But the indexes do look to be set up so that it can use the more specific index if necessary, right?
I don't know the internals of Postgres very well, but it is very likely that Postgres will use the more specific index if necessary.

A pretty good explanation of how Postgres uses indexes internally is here:

https://devcenter.heroku.com/articles/postgresql-indexes

The relevant part is:

There are many reasons why the Postgres planner may choose to not use an index. Most of the time, the planner chooses correctly, even if it isn’t obvious why. It’s okay if the same query uses an index scan on some occasions but not others. The number of rows retrieved from the table may vary based on the particular constant values the query retrieves. So, for example, it might be correct for the query planner to use an index for the query select * from foo where bar = 1, and yet not use one for the query select * from foo where bar = 2, if there happened to be far more rows with “bar” values of 2. When this happens, a sequential scan is actually most likely much faster than an index scan, so the query planner has in fact correctly judged that the cost of performing the query that way is lower.
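To get a feel for whether the state condition narrows anything, it can help to estimate how selective it is. Note that the plan in the question already estimates 44 rows for user_id = 1 and 40 rows after the state filter, so the filter is expected to discard very little. Below is a minimal Ruby sketch of that calculation. The row counts are invented for illustration, and the selectivity helper is hypothetical; in a Rails app the hash would presumably come from something like Order.group(:state).count.

```ruby
# Hypothetical per-state row counts, shaped like the Hash that
# Order.group(:state).count would return. The numbers are made up.
STATE_COUNTS = {
  "pending"        => 1_200,
  "packed"         => 300,
  "shipped"        => 900,
  "in_transit"     => 600,
  "delivered"      => 6_000,
  "return_pending" => 100,
  "returned"       => 400,
  "canceled"       => 500
}.freeze

# The whitelisted states from the query in the question.
VALID_STATES = %w[
  pending packed shipped in_transit delivered return_pending returned
].freeze

# Hypothetical helper: fraction of all rows whose state is in `states`.
# A value close to 1.0 means the condition filters out almost nothing.
def selectivity(counts, states)
  total = counts.values.sum
  return 0.0 if total.zero?

  # values_at returns nil for states missing from the hash; drop those.
  matching = counts.values_at(*states).compact.sum
  matching.to_f / total
end

fraction = selectivity(STATE_COUNTS, VALID_STATES)
puts format("%.1f%% of orders match the valid scope", fraction * 100)
```

If the fraction comes out close to 1, as it does with these made-up numbers, reading the rows for one user through index_orders_on_user_id and checking state afterwards costs about the same as using the compound index, so either plan is reasonable. This only illustrates the idea of selectivity; it is not the cost formula Postgres actually uses.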
