
I have millions of records in this table using Amazon Aurora Postgres 10.7:

create table "somedb"."sometable"
(
    id varchar(4096) not null constraint "sometable_pkey" primary key,
    tag varchar(255) not null,
    jsondata jsonb not null
);

Example jsonData value:

{"id": "abc", "ts": 1580879910, "data": "my stuff"}

I have queries like this one that take dozens of seconds:

SELECT jsondata->'data'
FROM somedb.sometable
WHERE jsondata->>'ts' >= '1576000473'
ORDER BY jsondata->>'ts' ASC
LIMIT 100 OFFSET 50000;

I'm trying to improve performance here. These are all the indexes I have tried, but the best I ever get is an index scan in the query plan.

create index "sometable_ts"
on "somedb"."sometable" ((jsondata -> 'ts'::text));

create index "sometable_ts-int" 
on "somedb"."sometable" using btree (((jsondata ->> 'ts')::integer));

I adjusted my queries as well to use ORDER BY (jsondata->>'ts')::integer, but that didn't help either.

Best plan:

Limit  (cost=613080.18..613149.46 rows=100 width=356) (actual time=24934.492..24937.344 rows=100 loops=1)
    ->  Index Scan using "sometable_ts-int" on "sometable"  (cost=0.43..3891408.61 rows=5616736 width=356) (actual time=0.068..24889.459 rows=885000 loops=1)
        Index Cond: (((jsondata ->> 'ts'::text))::integer >= 1576000473)
Planning time: 0.145 ms
Execution time: 24937.381 ms
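For reference, a btree expression index is only considered when the query uses the identical expression, including the cast. A sketch of the form that matches the sometable_ts-int index (same values as above):

```sql
-- WHERE and ORDER BY use the exact indexed expression (jsondata->>'ts')::integer
SELECT jsondata->'data'
FROM somedb.sometable
WHERE (jsondata->>'ts')::integer >= 1576000473
ORDER BY (jsondata->>'ts')::integer
LIMIT 100 OFFSET 50000;
```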

Can anyone recommend a way to adjust the indexes or queries for these to become faster? Thanks!

  • use-the-index-luke.com Commented Feb 10, 2020 at 17:50
  • @jarlh that site is great and all, but the reason I'm asking is that this is JSON, and in my attempts the typical indexing techniques didn't work for me. Commented Feb 10, 2020 at 17:52
  • @jarlh You mean this, right? Commented Feb 10, 2020 at 17:56
  • @LaurenzAlbe, exactly! Commented Feb 10, 2020 at 18:04
  • By the way, varchar(4096) is a terrible choice for a primary key column. Very long values will make the index fail. Commented Feb 10, 2020 at 18:14

1 Answer


Using OFFSET like this will always cause bad performance: PostgreSQL has to fetch and discard all 50000 skipped rows before it can return the 100 you want.

You should use keyset pagination:

Create this index:

CREATE INDEX ON somedb.sometable ((jsonData->>'ts'), id);

Then, to paginate, your first query is:

SELECT jsonData->'data'
FROM somedb.sometable
WHERE jsonData->>'ts' >= '1576000473'
ORDER BY jsonData->>'ts', id
LIMIT 100;

Remember the jsonData->>'ts' and id values from the last row of that result set as last_ts and last_id.

Your next page is found with

SELECT jsonData->'data'
FROM somedb.sometable
WHERE (jsonData->>'ts', id) > (last_ts, last_id)
ORDER BY jsonData->>'ts', id
LIMIT 100;

Keep going like this, and retrieving the 500th page will be as fast as retrieving the first.
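If you keep the integer cast from the question, the same keyset pattern works with an expression index on the cast. A sketch, assuming :last_ts and :last_id are placeholder bind parameters filled in from the previous page:

```sql
CREATE INDEX ON somedb.sometable (((jsondata->>'ts')::integer), id);

-- first page
SELECT jsondata->'data'
FROM somedb.sometable
WHERE (jsondata->>'ts')::integer >= 1576000473
ORDER BY (jsondata->>'ts')::integer, id
LIMIT 100;

-- subsequent pages: row-value comparison against the last row seen
SELECT jsondata->'data'
FROM somedb.sometable
WHERE ((jsondata->>'ts')::integer, id) > (:last_ts, :last_id)
ORDER BY (jsondata->>'ts')::integer, id
LIMIT 100;
```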


9 Comments

My ids are not monotonically increasing. They are uuids.
I guess I could create a column.
That is irrelevant. It only matters that they are unique.
How does this part work then? > (last_ts, last_id)
Ok thank you. The idea is that it uses the row-value comparison syntax for this solution, which is thankfully fully available in Postgres, but in few other databases. Please note this, whoever reads this solution in the future.
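The row-value comparison mentioned above is lexicographic: the second element only breaks the tie when the first elements are equal. A quick sketch to try in psql (the values are made up for illustration):

```sql
SELECT (1576000473, 'abd') > (1576000473, 'abc');  -- true: equal ts, later id wins
SELECT (1576000472, 'zzz') > (1576000473, 'abc');  -- false: earlier ts decides
```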
