I have a PostgreSQL table, ENTRIES, with a 'made_at' column of type timestamp without time zone.

That table has a composite btree index on that column and on another column (USER_ID, a foreign key):

btree (user_id, date_trunc('day'::text, made_at))

As you can see, the timestamp is truncated to the day. The index built this way is 130 MB in total; the ENTRIES table has 4,000,000 rows.

QUESTION: How do I estimate the size of the index if I wanted time precision down to the second? That is, if the timestamp were truncated to the second rather than the day (which should be easy to do, I hope).

1 Answer

Interesting question! According to my investigation they will be the same size.

My intuition was that the two indexes should be the same size: timestamp types in PostgreSQL have a fixed size (8 bytes), and I assumed date_trunc simply sets the less significant fields of the value to zero rather than shortening it. Still, I figured I had better support my guess with some facts.
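
The fixed-size claim can be checked directly. This is a minimal sketch, assuming any reasonably recent PostgreSQL; the aliases day_bytes and second_bytes are made up for illustration. Both columns should come back as 8, whatever the truncation granularity:

```sql
-- pg_column_size reports the number of bytes used to store a value.
-- A timestamp occupies 8 bytes regardless of how much of it is "zeroed".
SELECT pg_column_size(date_trunc('day',    ts)) AS day_bytes,
       pg_column_size(date_trunc('second', ts)) AS second_bytes
FROM (SELECT now()::timestamp AS ts) AS t;
```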

I spun up a free dev database on Heroku Postgres and generated a table with 4M random timestamps, each truncated to both day and second precision, as follows:

test_db=> SELECT * INTO ts_test FROM 
                        (SELECT id, 
                                ts, 
                                date_trunc('day', ts) AS trunc_day, 
                                date_trunc('second', ts) AS trunc_s 
                         FROM (SELECT generate_series(1, 4000000) AS id, 
                               now() - '1 year'::interval * round(random() * 1000) AS ts) AS sub) 
                         AS subq;
SELECT 4000000

test_db=> create index ix_day_trunc on ts_test (id, trunc_day);
CREATE INDEX
test_db=> create index ix_second_trunc on ts_test (id, trunc_s);
CREATE INDEX
test_db=> \d ts_test
           Table "public.ts_test"
  Column   |           Type           | Modifiers 
-----------+--------------------------+-----------
 id        | integer                  | 
 ts        | timestamp with time zone | 
 trunc_day | timestamp with time zone | 
 trunc_s   | timestamp with time zone | 
Indexes:
    "ix_day_trunc" btree (id, trunc_day)
    "ix_second_trunc" btree (id, trunc_s)

test_db=> SELECT pg_size_pretty(pg_relation_size('ix_day_trunc'));
 pg_size_pretty 
----------------
 120 MB
(1 row)

test_db=> SELECT pg_size_pretty(pg_relation_size('ix_second_trunc'));
 pg_size_pretty 
----------------
 120 MB
(1 row)
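
Note that this test indexes plain columns with a unique id, whereas the question uses an expression index with a non-unique user_id on a timestamp without time zone column. A sketch closer to that setup might look like the following. The table name entries_demo, the index names, and the data distribution (about 10,000 users, one year of timestamps) are all assumptions made for illustration, not taken from the question:

```sql
-- Hypothetical stand-in for the ENTRIES table; names and data are assumptions.
CREATE TABLE entries_demo (
    user_id integer,
    made_at timestamp without time zone
);

-- 4M rows: roughly 10,000 distinct users, timestamps spread over the past year.
INSERT INTO entries_demo (user_id, made_at)
SELECT (random() * 10000)::integer,
       now()::timestamp - random() * interval '365 days'
FROM generate_series(1, 4000000);

-- Expression indexes, as in the question, at two granularities.
CREATE INDEX ix_entries_day    ON entries_demo (user_id, date_trunc('day', made_at));
CREATE INDEX ix_entries_second ON entries_demo (user_id, date_trunc('second', made_at));

-- Compare the on-disk sizes of the two indexes.
SELECT relname, pg_size_pretty(pg_relation_size(oid)) AS index_size
FROM pg_class
WHERE relname IN ('ix_entries_day', 'ix_entries_second');
```

On PostgreSQL 13 and later, B-tree deduplication can make the day-truncated index somewhat smaller when many rows share the same (user_id, day) key, so the two sizes may not match exactly there.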

6 Comments

Thanks, I appreciate your answer and examples. This is interesting. Apparently I know too little about how database indices are built: I had assumed that since there would be more, err, 'buckets' or 'nodes' for the leaf nodes in the tree, the total size of the tree would also be much bigger. Can you, perhaps, point out what's wrong with my thinking? Thanks!
It's hard to figure out what you're thinking :). Why are you assuming there will be more leaf nodes in the tree? There is an identical number of rows to index, regardless of the content of the column.
Fair enough =) I'll try to explain what I mean. My intuition is this: if there are 1000 messages, all on the same day, then the index would be useless, because all the records have the same timestamp once truncated to the day, so the index can't help us narrow down to an individual record. They're all in the same 'bucket'; they're all leaves on the same tree node, no? If we rounded down to the hour, for example, then we would have 24 nodes (assuming a reasonably even distribution), and the actual rows would dangle from those in smaller bunches =)
Alex, you raise a very good point. I'm afraid I can't answer authoritatively. The correct answer probably depends on the details of the particular btree implementation.
You should worry if your column is varchar, for example; in that case the index size depends on the size of the column values.
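
One rough way to see why the row count, rather than the truncation granularity, drives the size is a back-of-envelope estimate with one index entry per row. The sketch below assumes 8 kB pages, a default leaf fill of 90%, no deduplication, and ignores internal (non-leaf) pages; the column names are made up for illustration. Under those assumptions it should land close to the 120 MB measured above:

```sql
-- Rough size estimate for a B-tree on (integer, timestamp).
-- Assumptions: 8 kB pages, 90% leaf fill, no deduplication, leaf pages only.
WITH params AS (
    SELECT 4000000       AS n_rows,
           8192          AS page_bytes,
           24            AS page_header_bytes,   -- standard page header
           16            AS special_bytes,       -- B-tree per-page metadata
           0.90          AS leaf_fill,
           8 + 4 + 4 + 8 AS tuple_bytes,         -- tuple header + int4 + alignment padding + timestamp
           4             AS line_pointer_bytes   -- per-entry item pointer in the page
),
per_page AS (
    SELECT *,
           floor((page_bytes - page_header_bytes - special_bytes) * leaf_fill
                 / (tuple_bytes + line_pointer_bytes)) AS entries_per_page
    FROM params
)
SELECT entries_per_page,
       ceil(n_rows / entries_per_page) AS leaf_pages,
       pg_size_pretty((ceil(n_rows / entries_per_page) * page_bytes)::bigint) AS estimated_size
FROM per_page;
```

Nothing in this arithmetic depends on the values stored in the timestamp column, only on the number of rows and the fixed width of each entry.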