What is the best Postgres data type for a primary key that holds fixed-size string values?

(For instance, values are exactly 6 characters from the alphabet [0-9, a-z, A-Z].)

Should I use char(6)? Is it even appropriate as a primary key? Or should I use bigserial and do the conversion from number to base62 in the application?

  • My first thought is to use a string with a check constraint. Commented Dec 11, 2017 at 13:02
  • I assume text is the "best" for any char, varying or fixed length Commented Dec 11, 2017 at 13:03
  • I am going to add a PK constraint. I'm asking more about performance: if I add a check constraint, a check runs on every insert, which is bad. Commented Dec 11, 2017 at 13:05
  • You should probably read through postgresql.org/docs/current/static/datatype-character.html. For one thing, I'm pretty sure varchar(6) in Postgres is pretty much the same as text with a check constraint on the length, while char(6) actually has to do extra work in certain cases because of space padding. Commented Dec 11, 2017 at 13:09

1 Answer

You could do this with something like:

create table t (
    tId char(6) primary key,
    . . .
    constraint chk_t_tId check (tId ~ '^[0-9a-zA-Z]{6}$')
);

There is no problem having the id as a six character string.
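
For comparison, the alternative raised in the question (a bigserial key converted to base62 in the application) could be sketched roughly as below. This is only an illustration, not part of the schema above: the function names `encode_base62` and `decode_base62` and the alphabet order (digits, then lowercase, then uppercase) are assumptions, and any fixed order would work as long as it never changes. Six base62 characters cover 62^6 = 56,800,235,584 values, which exceeds a 4-byte integer but fits comfortably in a bigint.

```python
import string

# Assumed alphabet order: digits, lowercase, uppercase (62 symbols in total).
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)  # 62
WIDTH = 6             # fixed id length from the question


def encode_base62(n: int, width: int = WIDTH) -> str:
    """Encode a non-negative integer as a fixed-width, zero-padded base62 string."""
    if not 0 <= n < BASE ** width:
        raise ValueError(f"{n} does not fit in {width} base62 characters")
    chars = []
    for _ in range(width):
        n, rem = divmod(n, BASE)
        chars.append(ALPHABET[rem])
    # Digits were produced least significant first, so reverse them.
    return "".join(reversed(chars))


def decode_base62(s: str) -> int:
    """Decode a base62 string back to an integer (ValueError on a bad character)."""
    n = 0
    for ch in s:
        n = n * BASE + ALPHABET.index(ch)
    return n
```

With this approach the database stores a plain bigint key and the six-character form exists only at the application boundary, so no check constraint is needed on the column itself.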


6 Comments

I'd be interested in the advantages of the fixed length; according to the manual, there are none: "While character(n) has performance advantages in some other database systems, there is no such advantage in PostgreSQL". But maybe there's some edge-case reason regarding indexes or constraints?
@IMSoP . . . I understand that note. I don't think that Postgres stores the length of a fixed-length character string -- which is the slight efficiency gain I was thinking of.
Then surely the note is wrong, and there is a performance advantage? The FAQ on the Postgres wiki agrees with the manual that it makes no difference; it is one of the four types referred to by the sentence: "The first four types above are "varlena" types (i.e., the field length is explicitly stored on disk, followed by the data)."
And some random searching found a possible reason in this old mailing list thread: when passing values around, you need to know their length to process them safely, and so it makes everything easier to just always attach the length to any string or similar datum, even if it's being used in a context where that could be deduced from the schema or other type information.
Note that Unicode collation can be a real performance killer for string-based index lookups. It might be a good idea to declare the primary key with COLLATE "C".
