2

I want to have a case-insensitive unique key on a large table.

I'm looking for the fastest approach.

I found two options: my_table_a with a generated column and my_table_b with a unique index

CREATE TABLE my_table_a
(
  my_table_id_a int GENERATED ALWAYS AS IDENTITY
    CONSTRAINT pk_my_table_a PRIMARY KEY,
  name text NOT NULL,
  key text GENERATED ALWAYS AS (LOWER(name)::text) STORED NOT NULL
);

CREATE TABLE my_table_b
(
  my_table_id_b int GENERATED ALWAYS AS IDENTITY
    CONSTRAINT pk_my_table_b PRIMARY KEY,
  name text NOT NULL
);

CREATE UNIQUE INDEX unique_name ON my_table_b (LOWER(name));

But I'm not sure what is happening on an insert. I fear that for a single insert, it is required to calculate for all rows the unique key.

What do you think is the fastest solution?

2
  • 1
    What do your test results results show? Commented Sep 13, 2024 at 17:04
  • 1
    "Fastest" is not well defined. If it's just about write performance, the expression index wins. But you'll want to do something with your data? Depending on complete usage patterns and associated priorities, the answer may change. Commented Sep 14, 2024 at 0:56

2 Answers 2

1

As demonstrated in a similar thread here, it'll be even faster if you use the lower() expression-based index, but also make it use C collation:

create unique index on my_table_b(lower(name) collate "C");

Here you can find a few benchmarks comparing citext, lower() expression based index with default collation, a plain index using a custom case-insensitive collation as well as citext with a C collation.
You can tweak it or run your own tests but in that one, the example above won.

If you wanted to speed anything up but generating the column, that's not necessary - it will only duplicate the work and take more space.

You can set up unique as a constraint or as an index, but unique constraints are technically always enforced by an implicitly created index. When you add new values, Postgres tries to save them to the table as well as ingest them into the index by searching for a spot in it - if it finds that its occupied, depending on currently configured deferrability it'll raise an exception. You don't have to fear it'll ever try to seq scan the table to check that, all of this logic takes place in the index.

Indexes, indexes later added to the table as constraints and constraints specified on the table directly have slightly different syntax, behaviour and limitations, e.g.: the constraint doesn't let you pick the collation. Hence, in this case, your best bet is the expression index with a collation, directly on the name column.

Sign up to request clarification or add additional context in comments.

Comments

0

In the second case, you have an index on the lowercased name. This means that the lowercase names are already calculated and stored in a data structure that can be searched quickly (the index). When you insert a new name, its lowercase is computed, and then searched in this index against the recalculated lowercase names of the preexisting records.

In the first case, you only have a generated column with no promise of uniqueness. You could add a unique constraint to it, which would make it pretty similar to the second solution.

2 Comments

Yes, the containt is missing CREATE UNIQUE INDEX unique_name_a ON my_table_a (key); Do you think both have a very similar performance?
You'll have to benchmark it to get a definitive answer, but both approaches should have similar performances once you add that constraint.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.