
I have a Rails app on a Postgres database which relies heavily on queries like this:

SELECT DISTINCT client_id FROM orders WHERE orders.total > 100

I need, essentially, the ids of all the clients who have orders which meet a certain condition. I only need the id, so I figured this is way faster than using joins.
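
For comparison, the join-based version I was trying to avoid would look something like this (a sketch against the simplified example above, assuming a standard clients table with an id primary key, not my real schema):

SELECT clients.id
FROM clients
JOIN orders ON orders.client_id = clients.id
WHERE orders.total > 100
GROUP BY clients.id;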

Would I benefit from adding an index to the column "total"? I don't mind insert speed, I just need the query to run extremely fast.

  • As always: PostgreSQL version? Table definition? Commented Dec 9, 2012 at 19:41
  • Postgres 9.1.3 on Heroku. What do you mean by table definition? Commented Dec 9, 2012 at 19:54
  • That's a table definition: CREATE TABLE foo (id integer ...). I am asking because data types, the primary key, and other details matter. Commented Dec 9, 2012 at 19:56
  • Oh I see. "orders" is a float. I don't really have the table definition because the question is a made-up example of something similar but much more complicated. I realised I needed multicolumn indices more than I thought, because the queries have complicated conditions over several fields together, not just "orders". In any case, I created the indices and got a 10x performance increase! Thank you very much. Commented Dec 9, 2012 at 20:05
  • Remember to weigh the cost of specialized indices against their benefit. You wouldn't create a separate multicolumn index for every rarely used query. PostgreSQL can combine indexes with pretty good performance (see the sketch below). Commented Dec 9, 2012 at 20:45
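
As a sketch of what that last comment means (order_date is a made-up second column, and the index names are purely illustrative), two single-column indexes like these can be combined by the planner via a bitmap scan:

CREATE INDEX orders_total_idx ON orders (total);
CREATE INDEX orders_order_date_idx ON orders (order_date);

-- a query filtering on both columns can use both indexes at once:
SELECT DISTINCT client_id FROM orders WHERE total > 100 AND order_date >= '2012-01-01';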

3 Answers


I would expect the following multicolumn index to be fastest:

CREATE INDEX orders_foo_idx ON orders (total DESC, client_id);

PostgreSQL 9.2 could benefit even more. With its index-only scans feature, it can serve the query without hitting the table at all under favorable circumstances: no writes since the last VACUUM.

DESC or ASC hardly matters in this case. A B-tree index can be searched in both directions almost equally efficiently.
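
To see what the planner actually chooses (and, on 9.2+, whether you get an index-only scan), run the query under EXPLAIN, for example:

EXPLAIN (ANALYZE, BUFFERS)
SELECT DISTINCT client_id FROM orders WHERE total > 100;

On 9.2 or later, an "Index Only Scan" node in the output shows the index alone is being used; the Heap Fetches line tells you how often the table still had to be visited.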




Absolutely. With no index on the total column, this query requires a sequential scan of the entire table. With an index on total, PostgreSQL can use an index scan and fetch only the matching rows. That will give you a huge performance gain as the table grows.
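
For reference, a single-column version could look like this (the index name is just illustrative); CONCURRENTLY avoids blocking writes while the index is built, though it cannot run inside a transaction block:

CREATE INDEX CONCURRENTLY orders_total_idx ON orders (total);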

2 Comments

Whether or not the index will be used depends on how many rows qualify for total > 100. If more than ~20% of the rows are returned by the condition, then the index won't be used.
Thanks, actually VERY few rows meet each condition.
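
A quick way to check that fraction for the example query (plain SQL, purely illustrative):

SELECT
  (SELECT count(*) FROM orders WHERE total > 100) AS matching_rows,
  count(*) AS total_rows
FROM orders;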
>  I only need the id, so I figured this is way faster than using joins.

True, though I'm not sure why you would consider using joins in the first place in this case.

As cmotley said, you're going to require an index on the total column for this query. However, optimal performance is going to depend on exactly which queries you're running. For example, for this query, with this table structure, the fastest you're going to get is to create an index like so:

CREATE INDEX IX_OrderTotals ON orders (total, client_id)

By including client_id in the index, you create what's called a covering index for this query, so (on PostgreSQL 9.2 and later) the database engine may not have to look up the row in the table at all in order to fetch your data.
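
If the threshold (total > 100) is fixed in your application rather than user-supplied, a partial index is another option worth considering (again, the index name is just illustrative). It indexes only the qualifying rows, so it stays small, and the planner can use it for exactly this WHERE clause:

CREATE INDEX orders_total_over_100_idx ON orders (client_id) WHERE total > 100;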

