
I have millions of records in this table using Amazon Aurora Postgres 10.7:

create table "somedb"."sometable"
(
    id varchar(4096) not null constraint "sometable_pkey" primary key,
    tag varchar(255) not null,
    jsondata jsonb not null
);

Example jsonData value:

{"id": "abc", "ts": 1580879910, "data": "my stuff"}

I have queries like this one that take dozens of seconds:

SELECT jsondata->'data'
FROM somedb.sometable
WHERE jsondata->>'ts' >= '1576000473'
ORDER BY jsondata->>'ts' ASC
LIMIT 100 OFFSET 50000;

I'm trying to improve performance here. These are all the indexes I have tried, but the best I ever get is an index scan in the query plan.

create index "sometable_ts"
on "somedb"."sometable" ((jsondata -> 'ts'::text));

create index "sometable_ts-int" 
on "somedb"."sometable" using btree (((jsondata ->> 'ts')::integer));

I adjusted my queries as well to use ORDER BY (jsondata->>'ts')::integer, but that didn't help either.

Best plan:

Limit  (cost=613080.18..613149.46 rows=100 width=356) (actual time=24934.492..24937.344 rows=100 loops=1)
    ->  Index Scan using "sometable_ts-int" on "sometable"  (cost=0.43..3891408.61 rows=5616736 width=356) (actual time=0.068..24889.459 rows=885000 loops=1)
        Index Cond: (((jsondata ->> 'ts'::text))::integer >= 1576000473)
Planning time: 0.145 ms
Execution time: 24937.381 ms
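For reference, a btree expression index is only considered when the query uses the identical expression, including the cast. A sketch of the form that matches the sometable_ts-int index (same values as above):

```sql
-- WHERE and ORDER BY use the exact indexed expression (jsondata->>'ts')::integer
SELECT jsondata->'data'
FROM somedb.sometable
WHERE (jsondata->>'ts')::integer >= 1576000473
ORDER BY (jsondata->>'ts')::integer
LIMIT 100 OFFSET 50000;
```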

Can anyone recommend a way to adjust the indexes or queries for these to become faster? Thanks!

  • use-the-index-luke.com Commented Feb 10, 2020 at 17:50
  • @jarlh that site is great and all, but the reason I'm asking is that this is JSON, and in my attempts the typical indexing techniques didn't work for me. Commented Feb 10, 2020 at 17:52
  • @jarlh You mean this, right? Commented Feb 10, 2020 at 17:56
  • @LaurenzAlbe, exactly! Commented Feb 10, 2020 at 18:04
  • By the way, varchar(4096) is a terrible choice for a primary key column. Very long values will make the index fail. Commented Feb 10, 2020 at 18:14

1 Answer


Using OFFSET like this will always cause bad performance: PostgreSQL has to fetch and discard all 50000 skipped rows before it can return the 100 you want.

You should use keyset pagination:

Create this index:

CREATE INDEX ON somedb.sometable ((jsonData->>'ts'), id);

Then, to paginate, your first query is:

SELECT jsonData->'data'
FROM somedb.sometable
WHERE jsonData->>'ts' >= '1576000473'
ORDER BY jsonData->>'ts', id
LIMIT 100;

Remember the jsonData->>'ts' and id values from the last row of that result set as last_ts and last_id.

Your next page is found with

SELECT jsonData->'data'
FROM somedb.sometable
WHERE (jsonData->>'ts', id) > (last_ts, last_id)
ORDER BY jsonData->>'ts', id
LIMIT 100;

Keep going like this, and retrieving the 500th page will be as fast as retrieving the first.
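If you keep the integer cast from the question, the same keyset pattern works with an expression index on the cast. A sketch, assuming :last_ts and :last_id are placeholder bind parameters filled in from the previous page:

```sql
CREATE INDEX ON somedb.sometable (((jsondata->>'ts')::integer), id);

-- first page
SELECT jsondata->'data'
FROM somedb.sometable
WHERE (jsondata->>'ts')::integer >= 1576000473
ORDER BY (jsondata->>'ts')::integer, id
LIMIT 100;

-- subsequent pages: row-value comparison against the last row seen
SELECT jsondata->'data'
FROM somedb.sometable
WHERE ((jsondata->>'ts')::integer, id) > (:last_ts, :last_id)
ORDER BY (jsondata->>'ts')::integer, id
LIMIT 100;
```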


9 Comments

My ids are not monotonically increasing. They are uuids.
I guess I could create a column.
That is irrelevant. It only matters that they are unique.
How does this part work then? > (last_ts, last_id)
Ok thank you. The idea is that it uses the row-value comparison syntax for this solution, which is thankfully fully available in Postgres, but in few other databases. Please note this, whoever reads this solution in the future.
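The row-value comparison mentioned above is lexicographic: the second element only breaks the tie when the first elements are equal. A quick sketch to try in psql (the values are made up for illustration):

```sql
SELECT (1576000473, 'abd') > (1576000473, 'abc');  -- true: equal ts, later id wins
SELECT (1576000472, 'zzz') > (1576000473, 'abc');  -- false: earlier ts decides
```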
