PostgreSQL index for comparison of JSONB values

Question

We're experimenting with JSONB on PostgreSQL 12/13 to see whether it's better alternative for customizable extension attributes than a bunch of extension tables (EAV, I guess) and so far I'm impressed by the results, although using GIN indexes is more tricky than it seems at first.

Experimental table is simple enough:

create TABLE jtest (
    id SERIAL PRIMARY KEY,
    text text,
    ext jsonb
);

CREATE INDEX jtest_ext_gin_idx ON jtest USING gin (ext);

I'm inserting some various data with (a bigger version of) this monstrous block (quoted only for db-fiddle):

DO 'BEGIN
    FOR r IN 1..100000 LOOP
            IF r % 10 <= 3 THEN
                -- some entries have no extension
                INSERT INTO jtest (text, ext) VALUES (''json-'' || LPAD(r::text, 10, ''0''), NULL);
            ELSEIF r % 10 = 7 THEN
                -- let''s add some numbers and wannabe "dates"
                INSERT INTO jtest (text, ext)
                VALUES (''json-'' || LPAD(r::text, 10, ''0''), (''{'' ||
                    ''"hired": "'' || current_date - width_bucket(random(), 0, 1, 1000) || ''",'' ||
                    ''"rating": '' || width_bucket(random(), 0, 1, 10) || ''}'')::jsonb);
            ELSE
                INSERT INTO jtest (text, ext)
                VALUES (''json-'' || LPAD(r::text, 10, ''0''), (''{"email": "user'' || r || ''@mycompany.com", "other-key-'' || r || ''": "other-value-'' || r || ''"}'')::jsonb);
            END IF;
        END LOOP;
END';

Various exact value match operations are easy an GIN works very well for these. But we also need < and LIKE, but let's just focus on comparison for now.

The example query is:

select * from jtest
where ext->>'hired' >= '2020-06-01' -- not using function index on its own

But if I add semantically useless AND the index kicks in:

select * from jtest
where ext->>'hired' >= '2020-06-01'
  and ext?'hired';

Here is a fiddle example.

Question #1: I have no problem to implement a query interpreter in our application to make it work, but is it expected behavior? Can't PG figure out that when >= is used the left side is indeed not null?

I also experimented with functional index on (ext->>'hired') - fiddle here:

CREATE INDEX jtest_ext_hired1_idx ON jtest ((ext->>'hired'));
CREATE INDEX jtest_ext_hired2_idx ON jtest ((ext->>'hired')) WHERE ext ? 'hired';

The second index is MUCH smaller than the first and I'm not sure what the first one is good.

Question #2: When I execute the query with ext->>'hired' >= '2020-06-01' it uses the first one in the fiddle - but not in my tests with 15M of rows (only 18k of them returned). So that's the first confusion - my internal tests I don't want to recreate on fiddle (it would execute for far too long) should be more specific - yet use sequential scan for whatever reason. Why does it use sequential scan on much bigger table?

Answer #2: After running ANALYZE it did and it became fast. As this is not the most important question I answer it directly here.

Finally, not a question, with additional AND ext ? 'hired' it uses jtest_ext_hired2_idx index just fine (both in the fiddle and in my much bigger table).

Question #3: Rather generic, is this even the right approach? If I expect using comparison and LIKE operations on values from JSONB, can I just cover it with additional functional indexes? It's still seems more flexible for our case than adding custom columns or joining extension tables, but can't it bite us in the future?

Yeah, I found that one later, but already converted mine and I liked UX of db-fiddle or sqlfiddle more. Also dbfiddle.uk was many times slower to insert all the rows. It would definitely be nice if the other fiddles understood the dollar quoting. :-( — virgo47
– virgo47, Commented Dec 1, 2020 at 8:43
Btw: you don't actually need a DO block to generate test data: db-fiddle.com/f/2mRXT4wGjM2ZSftjgKyZce/14 — user330315
– user330315, Commented Dec 1, 2020 at 8:45
Whow, that's a neat one. I didn't know how to use that r from generate_series, thank you very much. :-) — virgo47
– virgo47, Commented Dec 1, 2020 at 9:16

user330315 · Accepted Answer · 2020-12-01 08:59:16Z

2

As documented in the manual GIN index only supports the operators: ?, ?&, ?|, @>, @?, @@. So by adding the (seemingly useless) ext?'hired' condition you enable the optimizer to use the GIN index (not the functional index).

To index the hire date, I would create a function that extracts the value as a proper date. You can't do that with a cast in the index expression as the cast is not immutable. But as we know that a cast from a yyyy-mm-dd is indeed immutable, there is nothing wrong with creating a function that is marked immutable.

create function hire_date(p_input jsonb)
  returns date
as
$$
  select (p_input ->> 'hired')::date;
$$
language sql
strict
immutable;

Then you can use:

CREATE INDEX jtest_ext_hired1_idx ON jtest ( (hire_date(ext)) );

And that index is used directly when the function is used in the where clause:

select * 
from jtest
where hire_date(ext) >= '2020-06-01';

Of course that will fail if key 'hire_date' doesn't actually contain a proper DATE value (but it will fail during insert already as the index can't be updated).

Indexing LIKE expressions is in general tricky, but if you only have left anchored search strings (like 'foo%') a regular b-tree index can be used:

create index jtest_email on jtest ( (ext ->> 'email') varchar_pattern_ops);

To index a LIKE expression with a right anchored search string ( like '%foo%') you would need a trigram index.

answered Dec 1, 2020 at 8:59

user330315

Sign up to request clarification or add additional context in comments.

2 Comments

virgo47 Over a year ago

I accept the answer, but after thinking about it more, I'm considering to rely on additional ext?'attr' for most queries and adding specialized indexes only for those in need, e.g. attributes used on high % of rows. I assume having tens to hundreds specialized indexes would be detrimental (space, insert/update performance). We may have many different attributes and it's OK perhaps not to optimize for any possible query - that can be created from UI.

user330315 Over a year ago

The GIN index supports more than just the ? operator. But all of them only test for equality. I don't think you can get a "general" index to support range (or LIKE) queries as well. Although the JSON path operator @@ is supported, it doesn't seem to be used for anything else than equality conditions inside the expression

Collectives™ on Stack Overflow

PostgreSQL index for comparison of JSONB values

1 Answer 1

2 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

2 Comments

Your Answer

Sign up or log in

Post as a guest

Related