
I have a JSON array and I want to expand each element into a new table. With the new JSON functions in PostgreSQL 9.3, I expected this to be the best method:

create table p as select json_array_elements(json) foo from g

To my amazement, a lateral expansion is a lot faster:

create table p as
select json->x foo
from g
join lateral (select generate_series(0, json_array_length(g.json)-1) x) xxx on true

What is the problem with the first approach?

EDIT: a test case can be built for 20000 rows as

create table g as
select (select json_agg(random()) json
        from generate_series(0, (r1*4)::int))
from (select random() r1 from generate_series(1,20000)) aux;

On SSD storage, it takes 3 seconds versus 0.2 seconds with the lateral version. For 40000 rows, the time increases to 12 seconds, while the lateral method grows roughly linearly.
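
For reference, the two statements can be timed directly with psql's \timing against the g table built above, along these lines (the drop statements just make the script re-runnable):

\timing on

drop table if exists p;
create table p as select json_array_elements(json) foo from g;

drop table if exists p;
create table p as
select json->x foo
from g
join lateral (select generate_series(0, json_array_length(g.json)-1) x) xxx on true;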

  • Yup, that's weird. If you can reproduce this over repeated runs, and you have a data set you can share (or can construct one), please post to the pgsql-performance list to mention this. Commented Feb 2, 2014 at 13:45
  • @CraigRinger I have added a test case to the question. I will wait some days before going to pgsql-performance, if only to have a more complete case. Commented Feb 2, 2014 at 18:14
  • BTW, your lateral query can be simplified to create table q as select json->x foo from g, generate_series(0,json_array_length(g.json)-1) x; Commented Feb 3, 2014 at 0:53

1 Answer


The test case is certainly conclusive, and perf top -p $the_backend_pid helps show why:

 96.92%  postgres      [.] MemoryContextReset
  0.15%  [kernel]      [k] cpuacct_account_field
  0.09%  [kernel]      [k] update_cfs_rq_blocked_load
  0.09%  postgres      [.] AllocSetAlloc
  0.09%  libc-2.17.so  [.] __memcpy_ssse3_back
  0.07%  postgres      [.] parse_array
  0.07%  [kernel]      [k] trigger_load_balance
  0.07%  [kernel]      [k] rcu_check_callbacks
  0.06%  [kernel]      [k] apic_timer_interrupt
  0.05%  [kernel]      [k] do_timer
  0.05%  [kernel]      [k] update_cfs_shares
  0.05%  libc-2.17.so  [.] malloc

It's spending a huge amount of time in MemoryContextReset, which is especially striking given that the profile above was captured at roughly the 47 billion event mark.
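
(For anyone reproducing this: the pid to pass to perf top is just the backend pid of the session running the statement, which you can grab beforehand, e.g.:)

-- run in the same psql session before starting the slow CREATE TABLE,
-- then attach from a shell with: perf top -p <that pid>
select pg_backend_pid();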

Backtraces are always like:

#0  0x000000000072dd7d in MemoryContextReset (context=0x2a02dc90) at mcxt.c:130
#1  0x000000000072dd90 in MemoryContextResetChildren (context=<optimized out>) at mcxt.c:155
#2  MemoryContextReset (context=0x1651220) at mcxt.c:131
#3  0x00000000005817f9 in ExecScan (node=node@entry=0x164e1a0, accessMtd=accessMtd@entry=0x592040 <SeqNext>, recheckMtd=recheckMtd@entry=0x592030 <SeqRecheck>)
    at execScan.c:155

with varying locations within MemoryContextReset, usually at a branch.

Runtime was 836904.371 ms, vs 903.202 ms for the lateral join at 200k input rows (10x your test).

So I'd say you've certainly found a performance problem that needs attention.
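
Until a patched build is in place, one reformulation that may be worth benchmarking (I haven't verified that it sidesteps this particular slowdown) is to move the set-returning function out of the SELECT list and into FROM, which 9.3 accepts with an implicit LATERAL; the p2 name below is just for illustration:

create table p2 as
select foo
from g, json_array_elements(g.json) foo;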

Update: here's a patch that applies against git master or against 9.3. If you're using deb/rpm packages of PostgreSQL, it's easy enough to grab the source package / srpm and rebuild it with the patch applied; there's no need to switch to an unpackaged build just to apply a patch.


6 Comments

Thanks for the confirmation. I was afraid I was going to need to submit a hundred configuration data files to the list. BTW, no need to wait for me if you want to report it :-) I will visit the list anyway to see if someone can find a workaround in the configuration or some quick patch.
@arivero I'm posting to -hackers with a detailed report now, I'll send you the message-id. If it's quick I might be able to patch it but I'm pretty busy with other work - going to have a quick look once I've finished the report.
@arivero postgresql.org/message-id/[email protected] . Will take a look at the source, see if it's an easy fix now.
@arivero ... and it was. Patch. postgresql.org/message-id/[email protected] . If you're running RPM / deb builds, it's not overly hard to rebuild them with a patch applied - no need to switch wholly to unpackaged PostgreSQL.
Thanks very much! That should be enough json speed to keep avoiding "document databases" for some time :-D
