I'm designing a Postgres database with a table that has a jsonb column. I would like this column to be unique: there is no need to have two objects with the exact same JSON configuration in the table. Down the line, each duplicate not saved in the db would save me about 5 minutes of computation time. I'm aware of the risk of json uniqueness when it comes to dicts (order of keys is not guaranteed), but I think a good json encoder can mitigate this.

My worry is about db performance. I want to make sure we're doing everything possible so that inserts are not slowed down horribly by this uniqueness constraint on jsonb. How bad would a uniqueness constraint on jsonb be compared to one on varchar or int? Are we talking milliseconds, seconds or minutes?

I've looked into the hash index, which sounds like all I would ever need for optimal performance. But only B-tree indexes can be unique, which is weird. Why?

1 Answer

"risk of json uniqueness when it comes to dicts (order of keys is not guaranteed)"

jsonb does guarantee the order of keys: whatever order they arrive in, it stores them in one deterministic order. It also discards insignificant whitespace and deduplicates keys, keeping the last value. What you plan to do will work just fine.

I'd be worried if you wanted to use plain json, which works pretty much like pre-validated text and holds on to everything you save into it, making {"a":1} and { "a" : 1 } unequal. Also, for that reason, there's no built-in json=json operator, so you wouldn't be able to define a unique column of type json.

From the PostgreSQL documentation: "Because the json type stores an exact copy of the input text, it will preserve semantically-insignificant white space between tokens, as well as the order of keys within JSON objects. Also, if a JSON object within the value contains the same key more than once, all the key/value pairs are kept. (The processing functions consider the last value as the operative one.) By contrast, jsonb does not preserve white space, does not preserve the order of object keys, and does not keep duplicate object keys. If duplicate keys are specified in the input, only the last value is kept."

("does not preserve the order" refers to the input order - it'll reorder that in a stable, deterministic manner, to guarantee the uniform order of keys)
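To make the normalization concrete, here is a rough client-side sketch in Python that mimics what jsonb does on input: drop insignificant whitespace, keep only the last value for a duplicate key, and emit keys in one fixed order. The helper name canonical_json is made up for illustration, and its alphabetical key order is an assumption of this sketch, not necessarily the order jsonb uses internally; the point is only that any fixed order makes equal documents compare equal.

```python
import json


def canonical_json(text: str) -> str:
    """Return a normalized encoding of a JSON document (hypothetical helper).

    Mimics the effects of jsonb input normalization described above:
    whitespace is dropped, duplicate keys collapse to the last value,
    and object keys come out in a fixed order. The order used here
    (sort_keys=True) is this sketch's choice, not PostgreSQL's internal one.
    """
    # json.loads keeps the last value when an object repeats a key.
    value = json.loads(text)
    # sort_keys gives a deterministic key order; separators drop whitespace.
    return json.dumps(value, sort_keys=True, separators=(",", ":"))


# Different spellings of the same document normalize to the same string.
a = canonical_json('{"b": 2, "a": 1}')
b = canonical_json(' {\n\t "a" : 1, "b" : 2 \n} ')
c = canonical_json('{"a": 0, "b": 2, "a": 1}')  # duplicate key "a"
print(a == b == c)
print(a)
```

With a jsonb column the database does this for you on every insert, so an encoder like this would only matter if the column were text or json.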

In terms of performance, you have to test it yourself and see if you find it acceptable. Enforcing the constraint always incurs some cost and is not free by any means: unique is backed by an index, so every write to the table has to maintain both the table and the index.

demo at db<>fiddle

select '{"a":1}'::jsonb = E' {\n\t "a" : 1 \n} '::jsonb;
?column?
t
select '{"a":1}'::json = E' {\n\t "a" : 1 \n} '::json;
ERROR:  operator does not exist: json = json
LINE 1: select '{"a":1}'::json = E' {\n\t "a" : 1 \n} '::json;
                               ^
HINT:  No operator matches the given name and argument types. You might need to add explicit type casts.
create table test_no_unique(a jsonb);
select setseed(.42);

explain analyze verbose
insert into test_no_unique 
select jsonb_build_object(n::text,n)
from generate_series(1,1e5)n;
QUERY PLAN
Insert on public.test_no_unique (cost=0.00..17.50 rows=0 width=0) (actual time=724.632..724.633 rows=0 loops=1)
-> Function Scan on pg_catalog.generate_series n (cost=0.00..17.50 rows=1000 width=32) (actual time=33.737..354.927 rows=100000 loops=1)
Output: jsonb_build_object((n.n)::text, n.n)
Function Call: generate_series('1'::numeric, '100000'::numeric)
Planning Time: 0.068 ms
Execution Time: 725.363 ms
create table test_unique_no_onconflict(a jsonb unique);
select setseed(.42);

explain analyze verbose
insert into test_unique_no_onconflict 
select jsonb_build_object(n::text,n)
from generate_series(1,1e5)n;
QUERY PLAN
Insert on public.test_unique_no_onconflict (cost=0.00..17.50 rows=0 width=0) (actual time=1953.638..1953.639 rows=0 loops=1)
-> Function Scan on pg_catalog.generate_series n (cost=0.00..17.50 rows=1000 width=32) (actual time=64.094..392.841 rows=100000 loops=1)
Output: jsonb_build_object((n.n)::text, n.n)
Function Call: generate_series('1'::numeric, '100000'::numeric)
Planning Time: 0.047 ms
Execution Time: 1955.242 ms
create table test_unique_on_conflict_do_nothing_without_dupes(a jsonb unique);
select setseed(.42);

explain analyze verbose
insert into test_unique_on_conflict_do_nothing_without_dupes 
select jsonb_build_object(n::text,n)
from generate_series(1,1e5)n
on conflict do nothing;
QUERY PLAN
Insert on public.test_unique_on_conflict_do_nothing_without_dupes (cost=0.00..17.50 rows=0 width=0) (actual time=5938.022..5938.023 rows=0 loops=1)
Conflict Resolution: NOTHING
Tuples Inserted: 100000
Conflicting Tuples: 0
-> Function Scan on pg_catalog.generate_series n (cost=0.00..17.50 rows=1000 width=32) (actual time=33.093..527.146 rows=100000 loops=1)
Output: jsonb_build_object((n.n)::text, n.n)
Function Call: generate_series('1'::numeric, '100000'::numeric)
Planning Time: 0.043 ms
Execution Time: 5938.573 ms

In the next test, note that calling random() and doing the casts costs a bit compared to the previous examples, and handling the conflicts costs some too, but it also saves some simply by writing about 37% fewer rows into the table (36,759 of the 100,000 generated rows conflict and are skipped).

create table test_unique_on_conflict_do_nothing_with_dupes(a jsonb unique);
select setseed(.42);

explain analyze verbose
insert into test_unique_on_conflict_do_nothing_with_dupes 
select jsonb_build_object((random()*1e3)::int::text,(random()*1e2)::int::text)
from generate_series(1,1e5)n
on conflict do nothing;
QUERY PLAN
Insert on public.test_unique_on_conflict_do_nothing_with_dupes (cost=0.00..47.50 rows=0 width=0) (actual time=5067.804..5067.805 rows=0 loops=1)
Conflict Resolution: NOTHING
Tuples Inserted: 63241
Conflicting Tuples: 36759
-> Function Scan on pg_catalog.generate_series n (cost=0.00..37.50 rows=1000 width=32) (actual time=66.536..491.087 rows=100000 loops=1)
Output: jsonb_build_object((((random() * '1000'::double precision))::integer)::text, (((random() * '100'::double precision))::integer)::text)
Function Call: generate_series('1'::numeric, '100000'::numeric)
Planning Time: 0.121 ms
Execution Time: 5082.982 ms
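
For a rough sense of scale, the execution times reported above can be turned into a per-row cost. This is only arithmetic on the four numbers from this one demo run (100,000 generated rows each, measured under explain analyze on db<>fiddle), so treat the results as ballpark figures for this data shape, not as a benchmark. The dictionary labels are shorthand for the four test tables, and per_row_us is a made-up helper name.

```python
ROWS = 100_000  # rows generated in every test above

# Execution Time values copied from the plans above, in milliseconds.
execution_ms = {
    "no unique": 725.363,
    "unique": 1955.242,
    "unique, on conflict do nothing, no dupes": 5938.573,
    "unique, on conflict do nothing, with dupes": 5082.982,
}


def per_row_us(total_ms: float, rows: int = ROWS) -> float:
    """Average cost per generated row, in microseconds."""
    return total_ms * 1000 / rows


baseline = execution_ms["no unique"]
for label, ms in execution_ms.items():
    # Report both the absolute per-row cost and the slowdown vs. no constraint.
    print(f"{label}: {per_row_us(ms):.1f} us/row, {ms / baseline:.1f}x baseline")
```

In this run the unique constraint took the insert from roughly 7 to roughly 20 microseconds per row, so for these tiny one-key documents the answer to "milliseconds, seconds or minutes" is microseconds per row. Larger documents and a busier server will likely shift those numbers, which is why testing on your own data matters.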