9

I have many read-only tables in a Postgres database. All of these tables can be queried using any combination of columns.

What can I do to optimize queries? Is it a good idea to add indexes to all columns to all tables?

2
  • 2
    I'd try to establish where (on which queries) most time is spent, and in the first instance optimize for those. Commented Sep 15, 2011 at 16:15
  • Queries are dynamically generated and the only thing I know is that they are selects with filtering by one or more columns with an orderby clause (and limit/offset). There are no joins. Commented Sep 16, 2011 at 19:09

2 Answers 2

17

Columns that are used for filtering or joining (or, to a lesser degree, sorting) are of interest for indexing. Columns that are just selected are barely relevant! For the following query only indexes on a and e may be useful:

SELECT a,b,c,d
FROM   tbl_a
WHERE  a = $some_value
AND    e < $other_value;

Here, f and possibly c are candidates, too:

SELECT a,b,c,d
FROM   tbl_a
JOIN   tbl_b USING (f)
WHERE  a = $some_value
AND    e < $other_value
ORDER  BY c;

After creating indexes (and possibly running ANALYZE or even VACUUM ANALYZE on the table), test to see if they are actually useful with EXPLAIN or EXPLAIN ANALYZE. Also compare execution times with and without the indexes. Deleting and recreating indexes is fast and easy. There are also parameters to experiment with EXPLAIN ANALYZE. The difference may be staggering or nonexistent.
As your tables are read-only, index maintenance is cheap after initial creation. It's merely a question of disk space.

If you really want to know what you are doing, start by reading the docs.

If you don't know what queries to expect ...

  1. Try logging enough queries to find typical use cases. Log queries with the parameter log_statement = all for that. Or just log slow queries using log_min_duration_statement.

  2. Create indexes that might be useful and check the statistics after some time to see what actually gets used. PostgreSQL has a whole infrastructure in place for monitoring statistics. One convenient way to study statistics (and many other tasks) is pgAdmin where you can chose your table / function / index and get all the data on the "statistics" tab in the object browser (main window).

  3. Proceed as described above to see if indexes in use actually speed things up.

  4. If the query planner should chose to use one or more of your indexes but to no or adverse effect then something is probably wrong with your setup and you need to study the basics of performance optimization: vacuum, analyze, cost parameters, memory usage, ...

Sign up to request clarification or add additional context in comments.

9 Comments

Queries are dynamically generated and the only thing I know is that they are selects with filtering by one or more columns with an orderby clause (and limit/offset). There are no joins. Do you consider adding indexes on all columns would be good practice in this case?
Just adding indexes on all columns may or may not help. Indexes do have costs, especially for heavily written tables. Less so for read-only tables. I added a paragraph to my answer above.
Unfortunately, I won't be able to find typical use cases since I won't have access to the server once the project is finished. I wanted to do everything I could to optimize it in the development stage. Anyway, a lot of useful tips. Thanks.
Why is it that only indexes on a and e may be useful?
@KyelJmD: That's true for the first example and the explanation is given in bold letters: a and e ar used to filter the result. (WHERE a = $some_value AND e < $other_value.
|
2

If you have filtering by more columns indexes may help but not too much. Also indexes may not help for small tables.

First search for "postgresql tuning" - you will find usefull information.

If database can fit in memory - buy enough RAM. If database can not fit in memory - SSD will help. If this is not enough and database is read only - run 2, 3 or more servers. Or partition database (in the best case to fit in memory of each server).

Even if queries are generated I think they will not be random. Monitor database for slow queries and improve only them.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.