2

Is there any way to have an integer column that uses 5 bytes? INT (4 bytes) covers only about 4 billion values, which isn't quite enough for my application, yet BIGINT (8 bytes) seems like overkill. I'm worried it would hurt performance, because MySQL would have to read through much more disk space for everything, and with billions of rows the extra storage for the primary keys alone would be many GBs.

Is there something like a 5-byte INT column in MySQL? If not, is there an easy way to tweak MySQL to get one?

3
  • Assuming you have an index on a given column, the performance difference between INT and BIGINT should be small. Is 50% more storage really that big of a problem for you? Commented Jul 28, 2015 at 1:44
  • @Tim, is it so? Doesn't MySQL have to go through larger disk storage for anything? Commented Jul 28, 2015 at 1:54
  • Yes there would be more storage for BIGINT but if you have an index on the column then the lookup time should still be fast. See the answer given by @StevenMoseley Commented Jul 28, 2015 at 2:00

2 Answers

4

The simple answer is no - there isn't any 5 byte integer field.

However, when reading deeper into the source of your question, there seems to be some misperception about the overhead associated with a BIGINT.

Yes, compared with INT you're looking at double the data storage for that column, but every InnoDB row also carries per-record overhead:

  1. a 13 byte header for your clustered index (PK)
  2. a 6 byte header per indexed record
  3. a 1 byte pointer per non-indexed record

(See here: https://dev.mysql.com/doc/refman/5.1/en/innodb-table-and-index.html#innodb-physical-record)

Thus, you'd only really reduce your row size by about 14% by cutting your BIGINT to 5 bytes ((8-5) / (8+13)), assuming it's the only column in your table.
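The percentage above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic in this answer: the 13-byte overhead is the figure quoted in the list, the 5-byte key is hypothetical (MySQL has no such integer type), and the function name is made up for illustration.

```python
# Sketch of the "(8-5) / (8+13)" estimate, assuming the key is the only column.
BIGINT_BYTES = 8
HYPOTHETICAL_KEY_BYTES = 5     # assumption: no such integer type exists in MySQL
CLUSTERED_HEADER_BYTES = 13    # per-row overhead figure quoted in this answer


def row_size_reduction(old_key_bytes, new_key_bytes, overhead_bytes):
    """Fraction of the row saved by shrinking the key, with fixed overhead."""
    return (old_key_bytes - new_key_bytes) / (old_key_bytes + overhead_bytes)


saving = row_size_reduction(BIGINT_BYTES, HYPOTHETICAL_KEY_BYTES, CLUSTERED_HEADER_BYTES)
print(f"{saving:.1%}")  # 14.3%
```

Any additional columns make the denominator larger, so the relative saving on a real table would be smaller still.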

If you use the COMPACT row format and eliminate your PK, you could save 8 bytes per row (reducing your index to 5 bytes).

Your index's performance impact will be negligible with BIGINT vs INT (though eliminating your PK will probably result in some noticeable performance loss, due to loss of clustering).

The storage impact will also be rather negligible: you're looking at roughly 20 GB of impact at around 8 billion rows. Modern storage solutions should eat that up.
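As a rough check of that figure, the sketch below multiplies the row count by the 3 bytes saved per key. It assumes one stored copy of the key and counts raw bytes only, ignoring page fill and index overhead; the names are illustrative.

```python
# Sketch: raw bytes saved by a hypothetical 5-byte key versus BIGINT.
ROWS = 8_000_000_000
BIGINT_BYTES = 8
HYPOTHETICAL_KEY_BYTES = 5     # assumption: no such integer type exists in MySQL


def extra_storage_bytes(rows, wide_bytes, narrow_bytes):
    """Extra bytes used by the wider type, for a single copy of the column."""
    return rows * (wide_bytes - narrow_bytes)


extra = extra_storage_bytes(ROWS, BIGINT_BYTES, HYPOTHETICAL_KEY_BYTES)
print(f"{extra / 10**9:.1f} GB")   # 24.0 GB
print(f"{extra / 2**30:.1f} GiB")  # 22.4 GiB
```

Both results are in the same ballpark as the estimate quoted above.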


4 Comments

Really? "Modern storage solutions should eat that up." Is there any source backing this up?
Yeah, but SSDs are still pretty small and expensive, and our application runs on SSD. I misread your words as "modern storage engines" and thought storage engines were smart enough to make use of wasted storage somehow, lol.
The simple Rule of Thumb is: InnoDB disk space is 2-3 times as much as you would expect from counting bytes in the datatypes.
@kavoir.com - You're in luck! petapixel.com/2015/08/15/…
3

You did not say whether it is AUTO_INCREMENT. If not:

mysql> CREATE TABLE DecPk ( d DECIMAL(11,0) NOT NULL PRIMARY KEY ) ENGINE=InnoDB;
Query OK, 0 rows affected (0.14 sec)

would give you up to 99,999,999,999 in 5 bytes.

Alas, an AUTO_INCREMENT cannot be DECIMAL.
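To put the 99,999,999,999 ceiling in context, the sketch below compares it with INT UNSIGNED and with what a true 5-byte unsigned integer would hold. The 5-byte integer is hypothetical, since MySQL has no such type; it is included only for comparison.

```python
# Sketch: largest values for the key types discussed here.
INT_UNSIGNED_MAX = 2**32 - 1         # 4-byte unsigned integer
DECIMAL_11_0_MAX = 10**11 - 1        # eleven decimal digits, all nines
FIVE_BYTE_UNSIGNED_MAX = 2**40 - 1   # hypothetical 5-byte unsigned integer

print(f"{INT_UNSIGNED_MAX:,}")        # 4,294,967,295
print(f"{DECIMAL_11_0_MAX:,}")        # 99,999,999,999
print(f"{FIVE_BYTE_UNSIGNED_MAX:,}")  # 1,099,511,627,775
```

DECIMAL(11,0) covers roughly 23 times the range of INT UNSIGNED, though less than a binary 5-byte integer would.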

The extra 4 bytes in BIGINT (over INT) add up to approximately N*4 MB per million rows, where N is the number of references to the column. Assuming it is a PRIMARY KEY, here's how to count the references:

  • Data + PRIMARY KEY, together, count as 1
  • Each secondary key counts as 1
  • Each other table storing that id counts as 1
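The counting rule can be sketched in Python as follows. The function names are illustrative, and the estimate makes a simplifying assumption that every referencing table has the same number of rows as the main table.

```python
# Sketch of the "N * 4 MB per million rows" rule for BIGINT versus INT.
EXTRA_BYTES_PER_COPY = 8 - 4   # BIGINT minus INT


def count_references(secondary_keys, referencing_tables):
    """N = 1 (data + PRIMARY KEY) + secondary keys + other tables storing the id."""
    return 1 + secondary_keys + referencing_tables


def extra_bytes(rows, secondary_keys, referencing_tables):
    """Extra raw bytes; assumes referencing tables have as many rows as this one."""
    n = count_references(secondary_keys, referencing_tables)
    return rows * n * EXTRA_BYTES_PER_COPY


# Example: 1 million rows, 2 secondary keys, 1 other table storing the id.
extra = extra_bytes(1_000_000, secondary_keys=2, referencing_tables=1)
print(f"{extra / 10**6:.0f} MB")  # 16 MB
```

With no secondary keys and no referencing tables, N is 1 and the estimate falls back to 4 MB per million rows.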

5-byte INT??

  • INT (also INT SIGNED) occupies 4 bytes, with a range of -2 billion to +2 billion.
  • INT UNSIGNED is also 4 bytes, with a range of 0 to +4 billion.
  • BIGINT (signed or unsigned) is 8 bytes.
  • There is no INT between INT and BIGINT.
  • DECIMAL(m,n) takes about m/2 bytes. m is limited to 65. As a rough model, think of each 2 digits as stuffed into a byte (more precisely, every 9 digits are packed into 4 bytes).
  • DECIMAL(m,0) for m = 10 or 11, will take 5 bytes.
  • DECIMAL(m,2) for m = 9, 10, or 11, will take 5 bytes.
  • Yeah, the size of DECIMAL is complex.
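For a more precise size than the m/2 estimate, here is a sketch of the packing rule as documented in the MySQL manual: the integer and fractional parts are stored separately, each full group of 9 digits takes 4 bytes, and leftover digits take 1 to 4 bytes. The function names are made up for illustration.

```python
# Sketch of MySQL's documented DECIMAL storage rule.
# Bytes needed for the digits left over after full groups of 9.
LEFTOVER_BYTES = {0: 0, 1: 1, 2: 1, 3: 2, 4: 2, 5: 3, 6: 3, 7: 4, 8: 4}


def part_bytes(digits):
    """Bytes for one side of the decimal point."""
    full_groups, leftover = divmod(digits, 9)
    return full_groups * 4 + LEFTOVER_BYTES[leftover]


def decimal_storage_bytes(m, n=0):
    """Storage for DECIMAL(m, n): integer part plus fractional part."""
    if not (1 <= m <= 65 and 0 <= n <= 30 and n <= m):
        raise ValueError("invalid precision or scale")
    return part_bytes(m - n) + part_bytes(n)


print(decimal_storage_bytes(10, 0))  # 5
print(decimal_storage_bytes(11, 0))  # 5
print(decimal_storage_bytes(12, 0))  # 6
```

This agrees with the DECIMAL(11,0) example earlier in this answer: eleven digits are one full group (4 bytes) plus two leftover digits (1 byte).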

EXPLAIN sometimes shows an INT being used as the key but reports "key_len = 5". The extra byte is the NULL flag, which appears when the column is nullable.

3 Comments

(I added more.)
Me too. Added a couple relevant links to where your information comes from. (Thanks for reminding me DECIMAL exists.)
This should be the accepted answer, rather than the "modern storage solutions should eat that up" dismissal above. Some applications have storage needs that rise faster than "modern solutions" over time.
