
In order to store the country information for a person I did:

CREATE TABLE test
  (
     id      INT IDENTITY(1, 1),
     name    VARCHAR(100) NOT NULL,
     country VARCHAR(100) NOT NULL,
     PRIMARY KEY(id)
  );

INSERT INTO test
VALUES      ('Amy', 'Mexico'),
            ('Tom', 'US'),
            ('Mark', 'Morocco'),
            ('Izzy', 'Mexico');
-- millions of other rows

A lot of the countries will repeat themselves in the country column.

Another option would be to move the country into its own table and reference the country_id as a FK in the test table:

CREATE TABLE countries
  (
     id   INT IDENTITY(1, 1),
     name VARCHAR(100) NOT NULL,
     PRIMARY KEY(id)
  );

CREATE TABLE test
  (
     id         INT IDENTITY(1, 1),
     name       VARCHAR(100) NOT NULL,
     country_id INT NOT NULL,
     PRIMARY KEY(id),
     FOREIGN KEY(country_id) REFERENCES countries(id)
  ); 

My question is: does the second scenario offer any benefit from a performance or indexing point of view, or is it just cumbersome? (I know I am not breaking any normal form with the first scenario.)


2 Answers


The second version has an obvious performance benefit, namely that only an integer country ID, rather than the full country name, needs to be stored for each person-country relationship. This, in turn, means that your storage requirements for the tables and indices would be reduced.

Because the index of the second version would use the integer country ID rather than a string name, I would expect index performance to improve. Your database doesn't "know" that there is only a fixed number of countries, so the index for the first version would be a B-tree keyed on text rather than on integers, and text keys are wider than integer keys.
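
As a minimal sketch of the two indexes being compared (the index names are hypothetical, and each statement assumes the corresponding version of the `test` table from the question). Note that SQL Server does not create an index on a foreign key column automatically, so the second index has to be created explicitly:

```sql
-- Version 1: index keyed on the country name stored in test
-- (a variable-length text key of up to 100 bytes).
-- ix_test_country is a hypothetical name.
CREATE INDEX ix_test_country ON test (country);

-- Version 2: index keyed on the 4-byte integer foreign key.
-- ix_test_country_id is a hypothetical name.
CREATE INDEX ix_test_country_id ON test (country_id);
```

Whether the narrower key makes a measurable difference depends on the data and the queries, so it is worth checking the actual execution plans.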


1 Comment

Depends how wide it is. Countries have a standard two-letter code (ISO 3166-1 alpha-2), so the key could be 2 bytes rather than 4. You also save one join.

The second one will, in general, have poorer performance as joins are always costly. So any query of test that includes the country would need to join to the country table. The storage difference is not going to affect query performance.
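
To illustrate the extra join, here is a rough sketch of the same lookup against each version of the schema from the question (the table aliases are only for illustration):

```sql
-- Version 1: the country name is read straight from test, no join needed.
SELECT name, country
FROM   test
WHERE  country = 'Mexico';

-- Version 2: the name lives in countries, so the query has to join to it.
SELECT t.name, c.name AS country
FROM   test AS t
       JOIN countries AS c
         ON c.id = t.country_id
WHERE  c.name = 'Mexico';
```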

In the real world, a country entity would likely have multiple attributes (ISO codes, regions, populations, etc.) and therefore would need to be normalised into its own entity. You’d then need to join to it, and have the performance hit of joining, but that is, in general, outweighed by the benefits of normalisation - which is why we use normalisation rather than “one big table”.

1 Comment

That doesn't mean you need to use surrogate keys: you could use the 2-letter country code as the PK/FK, saving a join in many queries.
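
A possible sketch of that natural-key variant, assuming ISO 3166-1 alpha-2 codes; the column names `code` and `country_code` are hypothetical and not from the question:

```sql
CREATE TABLE countries
  (
     code CHAR(2) NOT NULL,      -- ISO 3166-1 alpha-2 code, e.g. 'MX'
     name VARCHAR(100) NOT NULL,
     PRIMARY KEY(code)
  );

CREATE TABLE test
  (
     id           INT IDENTITY(1, 1),
     name         VARCHAR(100) NOT NULL,
     country_code CHAR(2) NOT NULL,
     PRIMARY KEY(id),
     FOREIGN KEY(country_code) REFERENCES countries(code)
  );

-- Filtering by country needs no join when the code itself is enough.
SELECT name, country_code
FROM   test
WHERE  country_code = 'MX';
```

A join to `countries` is still needed whenever the full country name or other country attributes have to be shown.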
