I'm creating a database to store events from mobile apps, collected from multiple sources. The problem is that rows in the event table mean little to the user, since they are mostly a succession of integers. This forces users to write multiple joins or run multiple queries.

CREATE TABLE source (
    id              serial PRIMARY KEY,
    value           string NOT NULL
); 

CREATE TABLE application (
    id              serial PRIMARY KEY,
    value           string NOT NULL
);

CREATE TABLE platform (
    id              serial PRIMARY KEY,
    value           string NOT NULL
);

CREATE TABLE country (
    id              serial PRIMARY KEY,
    value           string NOT NULL
);

CREATE TABLE event (
    id              serial PRIMARY KEY,
    source_id       integer REFERENCES source(id),
    application_id  integer REFERENCES application(id),
    platform_id     integer REFERENCES platform(id),
    country_id      integer REFERENCES country(id),

    ...

    updated_at      date NOT NULL,
    value           decimal(100, 2) NOT NULL
);

I thought of using the value column of the "secondary" tables directly as the primary key (as it's unique and not null) and referencing it from the event table. It would look like this:

CREATE TABLE source (
    value              string PRIMARY KEY
); 

CREATE TABLE application (
    value              string PRIMARY KEY
);

CREATE TABLE platform (
    value              string PRIMARY KEY
);

CREATE TABLE country (
    value              string PRIMARY KEY
);

CREATE TABLE event (
    id              serial PRIMARY KEY,
    source          string REFERENCES source(value),
    application     string REFERENCES application(value),
    platform        string REFERENCES platform(value),
    country         string REFERENCES country(value),

    ...

    updated_at      date NOT NULL,
    value           decimal(100, 2) NOT NULL
);

I think this might be a good approach, as I don't really see the added value of a surrogate key in this situation. It also saves me from having to use views, which might perform worse since the underlying query is executed every time I use the view in a query.

What do you think of this option?

  • If the value columns in your source, platform, application, and country tables satisfy the criteria for a primary key (not null, unique, and unchanging) then I like the second design better. Commented Mar 29, 2017 at 11:09
  • The downside of the second approach is that you will duplicate the text values in event table. If the values are long, it will take up much space, which can be avoided if you were using integer keys. Commented Mar 29, 2017 at 11:10
  • If the strings are codes then use them as keys. Simple beats complex, if it is correct. Commented Mar 29, 2017 at 12:10

1 Answer

"Real" systems usually use surrogate keys. There are several reasons why:

  • Integers are more efficient for indexes, because they are fixed length.
  • Integers are more efficient for foreign key references, because they are only four bytes (strings are often larger).
  • String values may change and then referring tables need to be updated.
  • Auto-generated primary keys carry additional information, such as the order of insertion.
  • End users do not directly access tables. If such functionality is needed, then views are fine.
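
To make the point about changing string values concrete, here is a minimal sketch of the natural-key design, assuming PostgreSQL. The `text` type stands in for the question's `string`, the columns elided in the question are left out, and the values 'store_a' and 'store_b' are made up for illustration. With a string primary key, a rename has to be propagated to every referencing row; `ON UPDATE CASCADE` lets the database do that, at the cost of rewriting those rows.

```sql
-- Assumption: PostgreSQL, with text in place of the question's string type.
CREATE TABLE source (
    value           text PRIMARY KEY
);

CREATE TABLE event (
    id              serial PRIMARY KEY,
    -- Propagate renames of source.value to this column automatically.
    source          text REFERENCES source(value) ON UPDATE CASCADE,
    updated_at      date NOT NULL,
    value           decimal(100, 2) NOT NULL
);

-- Hypothetical sample data.
INSERT INTO source (value) VALUES ('store_a');
INSERT INTO event (source, updated_at, value)
VALUES ('store_a', '2017-03-29', 1.50);

-- Renaming the key rewrites every event row that references it.
UPDATE source SET value = 'store_b' WHERE value = 'store_a';

-- The referencing row now holds the new key.
SELECT source FROM event;
```

Without the `ON UPDATE CASCADE` clause the default action applies, and the `UPDATE` is rejected for as long as any event row still references the old value. With a surrogate integer key, the same rename touches a single row in source.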

There is nothing per se wrong with using strings. But in practice, they are not used for this purpose.
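
If readable rows are the main goal, a view over the surrogate-key design is one way to get them. The following is only a sketch, assuming PostgreSQL: `text` replaces the question's `string`, the `UNIQUE` constraints are my addition, the columns elided in the question are omitted, and the view name `event_readable` is hypothetical.

```sql
-- Assumption: PostgreSQL, with text in place of the question's string type.
CREATE TABLE source      (id serial PRIMARY KEY, value text NOT NULL UNIQUE);
CREATE TABLE application (id serial PRIMARY KEY, value text NOT NULL UNIQUE);
CREATE TABLE platform    (id serial PRIMARY KEY, value text NOT NULL UNIQUE);
CREATE TABLE country     (id serial PRIMARY KEY, value text NOT NULL UNIQUE);

CREATE TABLE event (
    id              serial PRIMARY KEY,
    source_id       integer REFERENCES source(id),
    application_id  integer REFERENCES application(id),
    platform_id     integer REFERENCES platform(id),
    country_id      integer REFERENCES country(id),
    updated_at      date NOT NULL,
    value           decimal(100, 2) NOT NULL
);

-- Hypothetical view that resolves the integer keys to their labels.
-- LEFT JOIN keeps events whose foreign key columns are NULL,
-- since the question's schema allows that.
CREATE VIEW event_readable AS
SELECT e.id,
       s.value AS source,
       a.value AS application,
       p.value AS platform,
       c.value AS country,
       e.updated_at,
       e.value
FROM event e
LEFT JOIN source      s ON s.id = e.source_id
LEFT JOIN application a ON a.id = e.application_id
LEFT JOIN platform    p ON p.id = e.platform_id
LEFT JOIN country     c ON c.id = e.country_id;

-- Users query the view like a table; 'FR' is a made-up value.
SELECT * FROM event_readable WHERE country = 'FR';
```

On the performance worry: in PostgreSQL a plain view is expanded into the query that uses it, so it should cost about the same as writing the joins by hand. Filters such as the one on country above are planned together with the joins.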
