postgres - estimate index size for timestamp column

Question

I have a PostgreSQL table, ENTRIES, with a 'made_at' column of type timestamp without time zone.

That table has a btree index on both that column and on another column (USER_ID, a foreign key):

btree (user_id, date_trunc('day'::text, made_at))

As you can see, the date is truncated to the day. The index built this way is 130 MB in total, and there are 4,000,000 rows in the ENTRIES table.

QUESTION: How do I estimate the size of the index if I wanted the time to be precise to the second? Basically, truncate the timestamp to the second rather than the day (should be easy to do, I hope).

jrs · Accepted Answer · 2013-08-26 22:05:05Z

6

Interesting question! According to my investigation, they will be the same size.

My intuition was that the two indexes should be the same size: timestamp types in PostgreSQL have a fixed size (8 bytes), and I assumed date_trunc simply zeroes out the less significant part of the value rather than making it shorter. Still, I figured I had better support my guess with some facts.
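One quick way to check the fixed-width assumption, offered as a sketch rather than as part of the investigation above, is to ask PostgreSQL for the stored size of each truncated value. pg_column_size and date_trunc are built-in functions; the column aliases are made up for illustration.

```sql
-- Compare the storage size of a timestamp truncated to the day
-- with one truncated to the second. The aliases are illustrative only.
SELECT pg_column_size(date_trunc('day',    now()::timestamp)) AS day_bytes,
       pg_column_size(date_trunc('second', now()::timestamp)) AS second_bytes;
```

Both columns should report 8, since a timestamp occupies 8 bytes whatever precision its value happens to carry.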

I spun up a free dev database on Heroku PostgreSQL and generated a table with 4M random timestamps, truncated to both day and second values, as follows:

test_db=> SELECT * INTO ts_test FROM
            (SELECT id,
                    ts,
                    date_trunc('day', ts) AS trunc_day,
                    date_trunc('second', ts) AS trunc_s
             FROM (SELECT generate_series(1, 4000000) AS id,
                          now() - '1 year'::interval * round(random() * 1000) AS ts) AS sub)
            AS subq;
SELECT 4000000

test_db=> create index ix_day_trunc on ts_test (id, trunc_day);
CREATE INDEX
test_db=> create index ix_second_trunc on ts_test (id, trunc_s);
CREATE INDEX
test_db=> \d ts_test
           Table "public.ts_test"
  Column   |           Type           | Modifiers 
-----------+--------------------------+-----------
 id        | integer                  | 
 ts        | timestamp with time zone | 
 trunc_day | timestamp with time zone | 
 trunc_s   | timestamp with time zone | 
Indexes:
    "ix_day_trunc" btree (id, trunc_day)
    "ix_second_trunc" btree (id, trunc_s)

test_db=> SELECT pg_size_pretty(pg_relation_size('ix_day_trunc'));
 pg_size_pretty 
----------------
 120 MB
(1 row)

test_db=> SELECT pg_size_pretty(pg_relation_size('ix_second_trunc'));
 pg_size_pretty 
----------------
 120 MB
(1 row)
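
The same measurement could be repeated directly on the table from the question. This is only a sketch: it assumes the table and columns are named entries, user_id and made_at as described there, and the index name is hypothetical. Building an index on a live table takes locks and disk space, so it is best tried on a copy first.

```sql
-- Hypothetical index name; table and column names are taken from the question.
-- date_trunc on "timestamp without time zone" is immutable, so it is allowed
-- in an expression index.
CREATE INDEX entries_user_made_at_second_idx
    ON entries (user_id, date_trunc('second', made_at));

-- Report the on-disk size of the new index.
SELECT pg_size_pretty(pg_relation_size('entries_user_made_at_second_idx'));

-- Remove the experimental index again once the size has been noted.
DROP INDEX entries_user_made_at_second_idx;
```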


6 Comments

alexakarpov Over a year ago

Thanks, I appreciate your answer and examples. This is interesting. Apparently I know too little about how database indices are built; I had assumed that since there will be more, err, 'buckets' or 'nodes' at the leaf level of the tree, the total size of the tree would also be much bigger. Can you, perhaps, point out what's wrong with my thinking? Thanks!

jrs Over a year ago

It's hard to figure out what you're thinking :). Why are you assuming there will be more leaf nodes in the tree? There is an identical number of rows to index, regardless of the content of the column.
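
A back-of-the-envelope estimate illustrates why the row count, and not the timestamp precision, drives the size. This is a rough sketch under stated assumptions: a default 8 kB page size, the default 90% leaf fillfactor for B-tree indexes, a 4-byte integer first column, and the usual per-page and per-tuple overheads. It counts leaf pages only and ignores internal pages, the metapage and per-page high keys.

```sql
-- Rough leaf-level size estimate for a two-column (integer, timestamp) btree.
-- All constants are assumptions about a default PostgreSQL build.
WITH params AS (
    SELECT 4000000::numeric AS n_rows,
           8192::numeric    AS page_bytes,
           24::numeric      AS page_header_bytes,   -- generic page header
           16::numeric      AS btree_special_bytes, -- btree per-page bookkeeping
           4::numeric       AS line_pointer_bytes,  -- item pointer per entry
           (8 + 4 + 4 + 8)::numeric AS tuple_bytes, -- tuple header + int4 + alignment padding + timestamp
           0.90::numeric    AS leaf_fillfactor
), per_page AS (
    SELECT *,
           floor((page_bytes - page_header_bytes - btree_special_bytes)
                 * leaf_fillfactor
                 / (tuple_bytes + line_pointer_bytes)) AS entries_per_page
    FROM params
)
SELECT entries_per_page,
       ceil(n_rows / entries_per_page) AS leaf_pages,
       pg_size_pretty((ceil(n_rows / entries_per_page) * page_bytes)::bigint) AS estimated_leaf_size
FROM per_page;
```

With these assumptions the estimate lands a little under 120 MB for 4,000,000 rows, in the same range as the sizes measured in the answer. Nothing in the arithmetic depends on whether the timestamp was truncated to the day or to the second.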

alexakarpov Over a year ago

Fair enough =) I'll try to explain what I mean. My intuition is this: if there are 1000 messages, all on the same day, then the index would be useless, because all the records have the same timestamp once truncated to the day, so the index can't help us narrow down to an individual record. They're all in the same 'bucket'; they're all leaves on the same tree node, no? If we rounded down to the hour, for example, then we would have 24 nodes (assuming a reasonably normal distribution), and the actual rows would dangle from those in smaller bunches =)

jrs Over a year ago

Alex, you raise a very good point. I'm afraid I can't answer authoritatively. The correct answer probably depends on the particular btree implementation details.
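
To see how many 'buckets' each granularity would actually produce, one could compare the number of index entries with the number of distinct truncated values. The query below is a sketch that assumes the entries table and made_at column from the question; the output column names are made up. A classic B-tree stores one entry per row, so the distinct counts mainly affect how selective the index is rather than how large it is. On PostgreSQL 13 and later, B-tree deduplication can store repeated keys more compactly, so heavily duplicated keys may yield a somewhat smaller index there.

```sql
-- One index entry exists per row; the distinct counts show how many
-- different key values each truncation level would produce.
SELECT count(*)                                      AS index_entries,
       count(DISTINCT date_trunc('day', made_at))    AS distinct_days,
       count(DISTINCT date_trunc('second', made_at)) AS distinct_seconds
FROM entries;
```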

deFreitas Over a year ago

You should worry if your column is varchar, for example; in that case the index size depends on the size of the column values.
