New to Postgres, and not very familiar with how RDBMSs work in general. I read that, in certain cases, adding an index to a table speeds up queries. I tried it out with a test table (Postgres v11.2):

CREATE TABLE testtable(
    myid text,
    comment text
);


INSERT INTO 
    testtable(myid, comment)
VALUES
    ('1:2', 'some text'),
    ('12:2', 'blah'),
    ('2:2', 'other text'),
    ('1:3', 'blah'),
    ('33:2', 'blah');


CREATE INDEX myindex ON testtable(myid asc);

The guide I was reading said that, without an index, the database usually does a "sequential scan" of all rows until the matching ones are found, but with an index, it does an "index scan". The guide says to view the query plan using "EXPLAIN", so I ran:

EXPLAIN SELECT * FROM testtable WHERE myid = '1:3';

The output, however, still shows a sequential scan:

                        QUERY PLAN
----------------------------------------------------------
 Seq Scan on testtable  (cost=0.00..1.07 rows=1 width=64)
   Filter: (myid = '1:3'::text)
(2 rows)

I've checked using pgAdmin and see that myindex does exist, but I'm unsure why the database isn't using it. Is there something else that I'm missing or haven't done?

  • You don't have enough rows to make the index worthwhile. Commented Mar 22, 2019 at 21:07
  • Right, this was just a test table. I can't tell if your comment is a general remark or the reason the index isn't being used. If the latter, how many rows does a table need before the index is utilized? Commented Mar 22, 2019 at 21:09

1 Answer

Databases take many factors into consideration when deciding whether to use an index.

Your query is:

SELECT *
FROM testtable
WHERE myid = '1:3';

There are basically two reasonable approaches:

The first is to scan the data and apply the WHERE clause to each row.

The second is to look up the value in the index and then fetch the rest of the data.

Which is cheaper? In your case, the first is cheaper. Why? Only one page needs to be read from disk into memory. Scanning the page -- after doing all the work of loading it -- is pretty cheap.

Using the index requires loading two pages, one for the index and one for the data.
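The page counts the planner works with can be checked in the pg_class system catalog. This is a minimal sketch, assuming the testtable and myindex objects from the question exist in the current database; the values shown will depend on your installation, and a B-tree index may report more than one page because of its internal bookkeeping.

```sql
-- Refresh the planner statistics for the table first.
ANALYZE testtable;

-- relpages: number of 8 kB pages the planner believes the relation occupies.
-- reltuples: estimated number of rows.
-- relkind: 'r' for an ordinary table, 'i' for an index.
SELECT relname, relkind, relpages, reltuples
FROM pg_class
WHERE relname IN ('testtable', 'myindex');
```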

Although database optimization is complicated, this is a simple example to give you a flavor of the different methods used in optimization and the trade-offs.
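To see the trade-off tip the other way, you can repeat the experiment with many more rows. The following is a sketch only: the table name bigtable, the index name bigtable_myid_idx, the generated values, and the row count of 100000 are all assumptions made for illustration, not part of the original question.

```sql
-- Hypothetical larger table with the same shape as testtable.
CREATE TABLE bigtable (
    myid text,
    comment text
);

-- Generate 100000 rows with distinct myid values such as '1:1', '2:2', '3:3', ...
INSERT INTO bigtable (myid, comment)
SELECT n::text || ':' || (n % 7)::text,
       'row ' || n::text
FROM generate_series(1, 100000) AS n;

CREATE INDEX bigtable_myid_idx ON bigtable (myid);

-- Refresh planner statistics so the estimates reflect the new rows.
ANALYZE bigtable;

-- With this many pages to scan, the planner is likely to prefer the index.
EXPLAIN SELECT * FROM bigtable WHERE myid = '1:1';
```

There is no fixed row count at which the switch happens. The planner compares estimated costs, which depend on the table size, how selective the condition is, and the cost settings.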

5 Comments

Still a bit confused. You're saying the first is cheaper, as in the database somehow decides this, right? If so, how does it decide that the first is cheaper?
@albert This is a huge topic in its own right, but basically, the database has a cost model associated with different physical operations such as scanning the table or seeking the index. It then considers various physical operations that accomplish the same logical operation and picks one with the lowest cost.
Ah ok I get it now, so the database determines which one to use. Then, as long as I create the index for the table, I can rest safe knowing that the database will use the index when optimal?
@albert In theory, yes, and in this simple case almost certainly. In more complex cases, this so-called "cost-based query optimization" mechanism may not work ideally, which is why this is a big topic! :)
@BrankoDimitrijevic That makes sense, thanks so much!
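The cost model mentioned in the comments can be inspected from a psql session. As a rough sketch, the statements below show the planner cost parameters and then temporarily discourage sequential scans so that EXPLAIN reports the plan it would otherwise have rejected. It assumes the testtable from the question with a myid column; enable_seqscan is intended as a diagnostic switch, not something to leave off in production.

```sql
-- Planner cost parameters (arbitrary units, relative to one sequential page read).
SHOW seq_page_cost;
SHOW random_page_cost;
SHOW cpu_tuple_cost;
SHOW cpu_index_tuple_cost;
SHOW cpu_operator_cost;

-- Penalize sequential scans for this transaction only, then compare plans.
BEGIN;
SET LOCAL enable_seqscan = off;
EXPLAIN SELECT * FROM testtable WHERE myid = '1:3';
ROLLBACK;
```

On a table this small, the index-based plan will likely show a higher estimated total cost than the sequential scan in the question, which is why the planner did not choose it.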
