I have a table with ~1.4 million rows. Each row has about 5 columns of general info plus a 6th JSON column holding ~1700 key/value pairs.
I am building summaries, grouped by a column called ownership, by selecting rows where a specific key exists in the JSON column. The query below runs in 14.5 s:
SELECT ownership,
       SUM(TO_NUMBER(jsonfield->>'firstvalue', '9G999G999')) AS total
FROM mytable
WHERE jsonfield->>'firstvalue' IS NOT NULL
GROUP BY ownership;
My real queries will be much larger, and I know I'll need to filter on many keys from the jsonfield. For example, if I add a second key, the query time increases to 22.9 s:
SELECT ownership,
       SUM(TO_NUMBER(jsonfield->>'firstvalue', '9G999G999')) AS total,
       SUM(TO_NUMBER(jsonfield->>'secondvalue', '9G999G999')) AS totaltwo
FROM mytable
WHERE jsonfield->>'firstvalue' IS NOT NULL
   OR jsonfield->>'secondvalue' IS NOT NULL
GROUP BY ownership;
There may be cases where I'll need to filter on several hundred potential keys in the jsonfield. Any suggestions for optimizing these queries?
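For context, the query plan can be inspected like this; without an index on the JSON column, a WHERE clause built from ->> expressions will typically show a sequential scan over the whole table:

EXPLAIN (ANALYZE, BUFFERS)
SELECT ownership,
       SUM(TO_NUMBER(jsonfield->>'firstvalue', '9G999G999')) AS total
FROM mytable
WHERE jsonfield->>'firstvalue' IS NOT NULL
GROUP BY ownership;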
Great answer below. As an FYI, I had to convert my JSON column to jsonb before I could create the index. I first made a copy of the json column, called jsonbsummary, and then converted it to jsonb:
ALTER TABLE mytable
ALTER COLUMN jsonbsummary
SET DATA TYPE jsonb
USING jsonbsummary::jsonb;
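The GIN index itself was created along these lines (the index name here is my own choice; the default jsonb_ops operator class is what supports the ?| key-existence operator used below):

-- Default jsonb_ops GIN opclass indexes every key and value,
-- supporting the ?, ?|, ?& and @> operators.
CREATE INDEX mytable_jsonbsummary_idx
    ON mytable USING GIN (jsonbsummary);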
As an additional FYI: those grouped queries that originally took 22+ seconds now run in 200 ms with the GIN index. See below:
SELECT ownership,
       SUM(TO_NUMBER(jsonbsummary->>'firstvalue', '9G999G999')) AS total,
       SUM(TO_NUMBER(jsonbsummary->>'secondvalue', '9G999G999')) AS totaltwo
FROM mytable
WHERE jsonbsummary ?| array['firstvalue', 'secondvalue']
GROUP BY ownership;