Postgres case-insensitive unique key constraint: What is the fastest option?

Question

I want to have a case-insensitive unique key on a large table.

I'm looking for the fastest approach.

I found two options: my_table_a with a generated column, and my_table_b with a unique expression index:

CREATE TABLE my_table_a
(
  my_table_id_a int GENERATED ALWAYS AS IDENTITY
    CONSTRAINT pk_my_table_a PRIMARY KEY,
  name text NOT NULL,
  key text GENERATED ALWAYS AS (LOWER(name)::text) STORED NOT NULL
);

CREATE TABLE my_table_b
(
  my_table_id_b int GENERATED ALWAYS AS IDENTITY
    CONSTRAINT pk_my_table_b PRIMARY KEY,
  name text NOT NULL
);

CREATE UNIQUE INDEX unique_name ON my_table_b (LOWER(name));

But I'm not sure what happens on an insert. I fear that a single insert requires computing the unique key for all existing rows.

What do you think is the fastest solution?

"Fastest" is not well defined. If it's just about write performance, the expression index wins. But you'll want to do something with your data? Depending on complete usage patterns and associated priorities, the answer may change.
– Erwin Brandstetter, Commented Sep 14, 2024 at 0:56

Zegarek · Accepted Answer · 2024-09-13 12:55:53Z

As demonstrated in a similar thread, the lower() expression index gets even faster if you also make it use the C collation:

create unique index on my_table_b(lower(name) collate "C");

Here you can find a few benchmarks comparing citext, a lower() expression index with the default collation, a plain index using a custom case-insensitive collation, and citext with the C collation.
You can tweak them or run your own tests, but in that comparison the example above won.
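
If you want to run your own comparison, a rough sketch along these lines may help. The table names (bench_default, bench_c) and the row count are made up for illustration, and md5() is only a cheap way to generate distinct strings. EXPLAIN (ANALYZE) actually executes the INSERT and reports its execution time, so the two timings can be compared. If your database's default collation is already C, expect little or no difference.

```sql
-- Hypothetical benchmark tables; names and sizes are assumptions.
CREATE TABLE bench_default (name text NOT NULL);
CREATE TABLE bench_c       (name text NOT NULL);

-- Same expression index, once with the database default collation,
-- once with the C collation.
CREATE UNIQUE INDEX ON bench_default (lower(name));
CREATE UNIQUE INDEX ON bench_c       (lower(name) COLLATE "C");

-- EXPLAIN (ANALYZE) really performs the insert and prints its timing.
EXPLAIN (ANALYZE)
INSERT INTO bench_default (name)
SELECT md5(i::text) FROM generate_series(1, 1000000) AS i;

EXPLAIN (ANALYZE)
INSERT INTO bench_c (name)
SELECT md5(i::text) FROM generate_series(1, 1000000) AS i;

-- Clean up the throwaway tables.
DROP TABLE bench_default, bench_c;
```

This measures one bulk insert per table. Many single-row inserts or a different data distribution may behave differently, so repeat the runs and adapt the data to your own workload.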

If you wanted to speed anything up by generating the column, that's not necessary: it will only duplicate the work and take more space.

You can set up uniqueness as a constraint or as an index, but unique constraints are technically always enforced by an implicitly created index. When you add new values, Postgres saves them to the table and also inserts them into the index by searching for their spot in it. If it finds that the spot is already occupied, it raises an exception; when exactly that happens depends on the currently configured deferrability. You don't have to fear that it will ever seq scan the table to check that, because all of this logic takes place in the index.
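
A small sketch of what that looks like in practice. The demo_names table and its index name are hypothetical and only mirror my_table_b from the question:

```sql
-- Hypothetical demo table, mirroring my_table_b from the question.
CREATE TABLE demo_names
(
  demo_id int GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  name text NOT NULL
);
CREATE UNIQUE INDEX demo_names_lower_idx ON demo_names (lower(name));

INSERT INTO demo_names (name) VALUES ('Alice');

-- Differs only in case: fails with a unique_violation (SQLSTATE 23505).
INSERT INTO demo_names (name) VALUES ('ALICE');

-- Same conflict, but silently skipped; the conflict target names the
-- indexed expression so Postgres can infer the unique index.
INSERT INTO demo_names (name) VALUES ('alice')
ON CONFLICT (lower(name)) DO NOTHING;
```

Run the statements one at a time in autocommit mode. Inside an explicit transaction, the failing second insert would abort the transaction and the third statement would be rejected as well.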

Plain indexes, indexes later added to the table as constraints, and constraints specified on the table directly have slightly different syntax, behaviour and limitations. For example, the constraint doesn't let you pick the collation. Hence, in this case, your best bet is the expression index with a collation, directly on the name column.
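
To make the three forms concrete, here is a sketch on a hypothetical demo_syntax table (all names are made up for illustration). Per the ALTER TABLE documentation, an index with expression columns can't be promoted with USING INDEX, so form 3 is shown on a plain column:

```sql
-- Hypothetical table for illustration.
CREATE TABLE demo_syntax (name text NOT NULL, code text NOT NULL);

-- 1. Plain unique index: expressions and COLLATE are allowed.
CREATE UNIQUE INDEX demo_syntax_lower_idx
  ON demo_syntax (lower(name) COLLATE "C");

-- 2. Constraint declared on the table: a list of column names only,
--    no expressions and no COLLATE clause.
ALTER TABLE demo_syntax
  ADD CONSTRAINT demo_syntax_name_key UNIQUE (name);

-- 3. Existing plain-column index promoted to a constraint.
--    Postgres renames the index to match the constraint name.
CREATE UNIQUE INDEX demo_syntax_code_idx ON demo_syntax (code);
ALTER TABLE demo_syntax
  ADD CONSTRAINT demo_syntax_code_key UNIQUE USING INDEX demo_syntax_code_idx;
```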

Mureinik · Accepted Answer · 2024-09-13 11:11:32Z

In the second case, you have an index on the lowercased name. This means that the lowercase names are already calculated and stored in a data structure that can be searched quickly (the index). When you insert a new name, its lowercase form is computed and then looked up in this index against the precalculated lowercase names of the preexisting records.

In the first case, you only have a generated column with no promise of uniqueness. You could add a unique constraint to it, which would make it pretty similar to the second solution.
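
For example, assuming my_table_a as defined in the question, the constraint could be added like this (the constraint name is made up):

```sql
-- Adds a unique constraint on the generated column; Postgres creates
-- the backing unique index implicitly.
ALTER TABLE my_table_a
  ADD CONSTRAINT uq_my_table_a_key UNIQUE (key);
```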

2 Comments

Sigurd Over a year ago

Yes, the constraint is missing: CREATE UNIQUE INDEX unique_name_a ON my_table_a (key); Do you think both have very similar performance?

Mureinik Over a year ago

You'll have to benchmark it to get a definitive answer, but both approaches should have similar performance once you add that constraint.
