0

Consider a table defined as follows:

CREATE TABLE test (
    id int4 NOT NULL,
    tag_counts _jsonb NOT NULL DEFAULT ARRAY[]::jsonb[]
);

INSERT INTO test(id, tag_counts) values(1,array['{"type":1, "count":4}','{"type":2, "count":10}' ]::jsonb[])

How can I create an index on json key type and how can I query on it?

Edit: Previously, there were no indexes on json keys and select queries used an unnest operation as shown below:

select * from (SELECT unnest(tag_counts) as tc
FROM public.test) as t
where tc->'type' = '2';

The problem is, if the table has a large number of rows, the above query will not only include a full table scan, but also filtering through each jsonb array.

4
  • 2
    jsonb[] almost never makes sense. Why not store a proper JSON array in a jsonb column? e.g. '[{"type":1, "count":4}','{"type":2, "count":10}]' Commented Feb 10, 2021 at 17:17
  • Legacy issues. Can't really change it without major refactoring. Commented Feb 10, 2021 at 17:50
  • So how was it indexed in the legacy system? And what query are you trying to speed up? Commented Feb 10, 2021 at 18:22
  • Updated the the question to include how its done currently Commented Feb 10, 2021 at 20:35

1 Answer 1

1

There is a way to index this, not sure how fast it will be.

If that was a "regular" jsonb column, you could use a condition like where tag_counts @> '[{"type": 2}]' which can use a GIN index on the column.

You can use that operator if you convert the array to "plain" json value:

select *
from test
where to_jsonb(tag_counts) @> '[{"type": 2}]'

Unfortunately, to_jsonb() is not marked as immutable (I guess because of potential timestamp conversion in there) which is a requirement if you want to use an expression in an index.

But for your data, this is indeed immutable, so we can create a little wrapper function:

create function as_jsonb(p_input jsonb[])
returns  jsonb
as
$$
  select to_jsonb(p_input);
$$
language sql
immutable;

And with that function we can create an index:

create index on test using gin ( as_jsonb(tag_counts) jsonb_path_ops);

You will need to use that function in your query:

select *
from test
where as_jsonb(tag_counts) @> '[{"type": 2}]'

On a table with a million rows, I get the following execution plan:

Bitmap Heap Scan on stuff.test  (cost=1102.62..67028.01 rows=118531 width=252) (actual time=15.145..684.062 rows=147293 loops=1)
  Output: id, tag_counts
  Recheck Cond: (as_jsonb(test.tag_counts) @> '[{"type": 2}]'::jsonb)
  Heap Blocks: exact=25455
  Buffers: shared hit=25486
  ->  Bitmap Index Scan on ix_test  (cost=0.00..1072.99 rows=118531 width=0) (actual time=12.347..12.356 rows=147293 loops=1)
        Index Cond: (as_jsonb(test.tag_counts) @> '[{"type": 2}]'::jsonb)
        Buffers: shared hit=31
Planning:
  Buffers: shared hit=23
Planning Time: 0.444 ms
Execution Time: 690.160 ms
Sign up to request clarification or add additional context in comments.

1 Comment

Sheer genius. It improved the query time for me significantly. Thanks for the help.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.