I have a table with ~1.4 million rows. Each row has about 5 columns of general info plus a 6th JSON column holding ~1700 key/value pairs.
I am building summaries, grouped by a column called ownership, by selecting rows where a specific key exists in the JSON column. The query below runs in 14.5 s:
SELECT ownership,
       SUM(TO_NUMBER(jsonfield->>'firstvalue', '9G999G999')) AS total
FROM mytable
WHERE jsonfield->>'firstvalue' IS NOT NULL
GROUP BY ownership;
My real queries will be much larger, and I know I'll need to filter on many keys from the jsonfield. For example, if I add a second key, the query time increases to 22.9 s:
SELECT ownership,
       SUM(TO_NUMBER(jsonfield->>'firstvalue', '9G999G999')) AS total,
       SUM(TO_NUMBER(jsonfield->>'secondvalue', '9G999G999')) AS totaltwo
FROM mytable
WHERE jsonfield->>'firstvalue' IS NOT NULL
   OR jsonfield->>'secondvalue' IS NOT NULL
GROUP BY ownership;
There may be cases where I'll need to filter on several hundred potential keys in the jsonfield. Any suggestions for optimizing these queries?
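For context, the query plan can be inspected like this; without an index on the JSON column, a WHERE clause built from ->> expressions will typically show a sequential scan over the whole table:

EXPLAIN (ANALYZE, BUFFERS)
SELECT ownership,
       SUM(TO_NUMBER(jsonfield->>'firstvalue', '9G999G999')) AS total
FROM mytable
WHERE jsonfield->>'firstvalue' IS NOT NULL
GROUP BY ownership;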
Great answer below. As an FYI, I had to convert my JSON column to jsonb before I could create the index. I first made a copy of the json column, called jsonbsummary, and then converted it to jsonb:
ALTER TABLE mytable
ALTER COLUMN jsonbsummary
SET DATA TYPE jsonb
USING jsonbsummary::jsonb;
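The GIN index itself was created along these lines (the index name here is my own choice; the default jsonb_ops operator class is what supports the ?| key-existence operator used below):

-- Default jsonb_ops GIN opclass indexes every key and value,
-- supporting the ?, ?|, ?& and @> operators.
CREATE INDEX mytable_jsonbsummary_idx
    ON mytable USING GIN (jsonbsummary);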
As an additional FYI: those grouped queries that originally took 22+ seconds now run in 200 ms with the GIN index. See below:
SELECT ownership,
       SUM(TO_NUMBER(jsonbsummary->>'firstvalue', '9G999G999')) AS total,
       SUM(TO_NUMBER(jsonbsummary->>'secondvalue', '9G999G999')) AS totaltwo
FROM mytable
WHERE jsonbsummary ?| array['firstvalue', 'secondvalue']
GROUP BY ownership;