What is the recommended collation for a postgresql citext column?

Question

I'm using postgres 16, and I have a number of tables where I need to treat their display ID columns as case insensitive, and also handle LIKE queries with wildcards (ex: P%123%). In order to handle these queries efficiently with range scans, I needed to set the collation for those columns to C rather than the default.

With the new requirement for case insensitive searching, I'm considering changing the column's datatype to citext (https://www.postgresql.org/docs/current/citext.html). Will leaving the collation on these columns as C cause issues, since C is a case sensitive collation? What is the recommended collation for a citext column?

Zegarek · Accepted Answer · 2025-09-16 07:14:43Z

That citext doc already tells you it's somewhat superseded by case-insensitive collations:

Consider using nondeterministic collations (see Section 23.2.2.4) instead of this module. They can be used for case-insensitive comparisons, accent-insensitive comparisons, and other combinations, and they handle more Unicode special cases correctly.
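As a rough illustration of what that note refers to, a case-insensitive nondeterministic collation can be declared along the lines below. This is a sketch that follows the pattern shown in the PostgreSQL collation docs and assumes an ICU-enabled build; the names case_insensitive and ids_ci_demo are made up for the example.

```sql
-- Assumes PostgreSQL was built with ICU support.
create collation if not exists case_insensitive (
  provider = icu,
  locale = 'und-u-ks-level2',
  deterministic = false
);

-- Hypothetical table: equality on display_id ignores case.
create table ids_ci_demo (display_id text collate case_insensitive);
insert into ids_ci_demo values ('P-123'), ('p-456');

-- Matches 'P-123' even though the literal is lowercase.
select * from ids_ci_demo where display_id = 'p-123';
```

Equality and ordering work with such a collation on version 16, but LIKE against a nondeterministic collation is rejected before version 18, so this alone does not cover the wildcard searches in the question.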

You're better off with a regular text type and a custom collation, or "C" with an expression index using lower(). You can find a few benchmarks here:

If you upgrade to version 18 (release candidate 1 is out), you get nondeterministic collation support for LIKE which handles your prefix search.
In PostgreSQL 16, use collate "C" with a text_pattern_ops expression index:
(demo at db<>fiddle)

create unique index on test_lower_collate_c 
  (lower(a) collate "C" text_pattern_ops);

explain analyse verbose
select count(*) from test_lower_collate_c where lower(a) like 'eb%5%';

QUERY PLAN
Aggregate (cost=146.29..146.30 rows=1 width=8) (actual time=0.283..0.284 rows=1 loops=1)
  Output: count(*)
  -> Bitmap Heap Scan on public.test_lower_collate_c (cost=4.94..146.28 rows=5 width=0) (actual time=0.072..0.277 rows=26 loops=1)
       Filter: (lower(test_lower_collate_c.a) ~~ 'eb%5%'::text)
       Rows Removed by Filter: 162
       Heap Blocks: exact=133
       -> Bitmap Index Scan on test_lower_collate_c_lower_idx (cost=0.00..4.94 rows=65 width=0) (actual time=0.043..0.043 rows=188 loops=1)
            Index Cond: ((lower(test_lower_collate_c.a) >= 'eb'::text) AND (lower(test_lower_collate_c.a) < 'ec'::text))
Planning Time: 0.418 ms
Execution Time: 0.359 ms
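
The demo table's setup is not shown here. To reproduce a plan of this shape, something like the following could be run before creating the index. It is only a sketch: the row count and the md5-based values are assumptions, not the data from the fiddle, so costs and row counts will differ.

```sql
-- Hypothetical demo data; the original fiddle's setup is not shown here.
create table test_lower_collate_c (a text);

-- 100k pseudo-random hex strings, every second one upper-cased,
-- so that lower(a) actually has something to fold.
insert into test_lower_collate_c
select case when i % 2 = 0 then upper(md5(i::text)) else md5(i::text) end
from generate_series(1, 100000) as i;

-- Refresh planner statistics so the estimates are meaningful.
analyze test_lower_collate_c;
```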

Will leaving the collation on these columns as C cause issues, since C is a case sensitive collation?

citext folds values to lowercase whenever it compares them (the stored value keeps its original case), so case differences never reach the comparison. That part is not a problem.

It can be a problem if you're dealing with accents and other non-ASCII text, because collate "C" sorts by byte value and places such characters in a different range. Under "C", where a ~>=~ 'ea' and a ~<~ 'eb' won't find values starting with 'eá', because accented variants sort well after the whole ASCII alphabet instead of next to their base letter.
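
The ordering issue can be checked directly with a couple of throwaway queries. This is a minimal sketch using inline literals only; no table from the demo is needed.

```sql
-- Under "C", strings compare byte by byte, so 'á' sorts after every ASCII letter.
select 'eá' collate "C" >= 'ea' as at_or_after_ea,
       'eá' collate "C" <  'eb' as before_eb;

-- Sorting a few sample values shows 'eá' landing after 'ez'.
select v
from (values ('ea'), ('eb'), ('ez'), ('eá')) as t(v)
order by v collate "C";
```

The first query shows that 'eá' passes the lower bound but fails the upper one, which is why a prefix range built from 'ea' and 'eb' skips it.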

Another thing of note is that, with citext, I don't see the optimiser adding the range condition to a pattern-based search on its own. Given a query like this:
(demo at db<>fiddle)

select from test_lower_collate_c where lower(a) like 'eb%5%';

text_pattern_ops gets you an additional index condition, derived from the pattern's prefix, that speeds up the search:

Index Cond: ((lower(test_lower_collate_c.a) ~>=~ 'eb'::text) AND (lower(test_lower_collate_c.a) ~<~ 'ec'::text))

Meanwhile, with citext_pattern_ops I needed to add them on my own:

select from test_citext_collate_c where a like 'eb%5%' and a ~>=~ 'eb' and a ~<~ 'ec';

And the timing was still worse than for the expression-based index.
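
For comparison, the citext variant can be set up roughly as follows. This is a sketch: the table and column names match the query above, but the actual definition used in the demo is an assumption.

```sql
-- Hypothetical citext counterpart of the demo table.
create extension if not exists citext;

create table test_citext_collate_c (a citext collate "C");
create index on test_citext_collate_c (a citext_pattern_ops);

-- Case-insensitive pattern match plus hand-written prefix bounds,
-- since the planner did not derive them from the LIKE pattern here.
select count(*)
from test_citext_collate_c
where a like 'eb%5%'
  and a ~>=~ 'eb'
  and a ~<~ 'ec';
```

Running explain analyse on both variants against the same data is the simplest way to check whether the timing gap holds for your own values.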

What is the recommended collation for a citext column?

If your values and patterns are plain ASCII, COLLATE "C" handles them fast. Otherwise the byte-order ranges no longer line up with the letters you expect, and prefix searches just won't work right.

I am on postgres 16, not 18 so I do need a solution that works on 16, and I am doing prefix searches. We can't just upgrade our production DB.
