
I have a JSON array and I want to expand each element into a new table. With the new JSON functions in PostgreSQL 9.3, I expected this to be the best method:

create table p as select json_array_elements(json) foo from g

To my amazement, a lateral expansion is a lot faster:

create table p as
select json->x foo
from g
join lateral (select generate_series(0, json_array_length(g.json)-1) x) xxx on true

What is the problem with the first approach?

EDIT: a test case can be built for 20000 rows as

create table g as
select (select json_agg(random()) json
        from generate_series(0, (r1*4)::int))
from (select random() r1 from generate_series(1,20000)) aux;

On SSD storage, it takes 3 seconds versus 0.2 seconds with the lateral version. For 40000 rows, the time increases to 12 seconds, while the lateral method grows roughly linearly.
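
For reference, the two statements can be timed directly with psql's \timing against the g table built above, along these lines (the drop statements just make the script re-runnable):

\timing on

drop table if exists p;
create table p as select json_array_elements(json) foo from g;

drop table if exists p;
create table p as
select json->x foo
from g
join lateral (select generate_series(0, json_array_length(g.json)-1) x) xxx on true;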

  • Yup, that's weird. If you can reproduce this over repeated runs, and you have a data set you can share (or can construct one), please post to the pgsql-performance list to mention this. Commented Feb 2, 2014 at 13:45
  • @CraigRinger I have added a test case to the question. I will wait some days before going to pgsql-performance, if only to have a more complete case. Commented Feb 2, 2014 at 18:14
  • BTW, your lateral query can be simplified to create table q as select json->x foo from g, generate_series(0,json_array_length(g.json)-1) x; Commented Feb 3, 2014 at 0:53

1 Answer


The test case is certainly conclusive, and perf top -p $the_backend_pid helps show why:

 96.92%  postgres      [.] MemoryContextReset
  0.15%  [kernel]      [k] cpuacct_account_field
  0.09%  [kernel]      [k] update_cfs_rq_blocked_load
  0.09%  postgres      [.] AllocSetAlloc
  0.09%  libc-2.17.so  [.] __memcpy_ssse3_back
  0.07%  postgres      [.] parse_array
  0.07%  [kernel]      [k] trigger_load_balance
  0.07%  [kernel]      [k] rcu_check_callbacks
  0.06%  [kernel]      [k] apic_timer_interrupt
  0.05%  [kernel]      [k] do_timer
  0.05%  [kernel]      [k] update_cfs_shares
  0.05%  libc-2.17.so  [.] malloc

It's spending a huge amount of time in MemoryContextReset, which is especially striking given that the profile above was captured at roughly the 47 billion event mark.
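
(For anyone reproducing this: the pid to pass to perf top is just the backend pid of the session running the statement, which you can grab beforehand, e.g.:)

-- run in the same psql session before starting the slow CREATE TABLE,
-- then attach from a shell with: perf top -p <that pid>
select pg_backend_pid();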

Backtraces are always like:

#0  0x000000000072dd7d in MemoryContextReset (context=0x2a02dc90) at mcxt.c:130
#1  0x000000000072dd90 in MemoryContextResetChildren (context=<optimized out>) at mcxt.c:155
#2  MemoryContextReset (context=0x1651220) at mcxt.c:131
#3  0x00000000005817f9 in ExecScan (node=node@entry=0x164e1a0, accessMtd=accessMtd@entry=0x592040 <SeqNext>, recheckMtd=recheckMtd@entry=0x592030 <SeqRecheck>)
    at execScan.c:155

with varying locations within MemoryContextReset, usually at a branch.

Runtime was 836904.371 ms, vs 903.202 ms for the lateral join at 200k input rows (10x your test).

So I'd say you've certainly found a performance problem that needs attention.
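
Until a patched build is in place, one reformulation that may be worth benchmarking (I haven't verified that it sidesteps this particular slowdown) is to move the set-returning function out of the SELECT list and into FROM, which 9.3 accepts with an implicit LATERAL; the p2 name below is just for illustration:

create table p2 as
select foo
from g, json_array_elements(g.json) foo;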

Update: here's a patch that applies against git master or against 9.3. If you're using deb/rpm packages of PostgreSQL, it's easy enough to grab the source package / srpm and rebuild it with the patch applied; there's no need to switch to an unpackaged build just to apply a patch.


6 Comments

Thanks for the confirmation. I was afraid I was going to need to submit a hundred configuration data files to the list. BTW, no need to wait for me if you want to report it :-) I will visit the list anyway to see if someone can find a workaround in the configuration or some quick patch.
@arivero I'm posting to -hackers with a detailed report now, I'll send you the message-id. If it's quick I might be able to patch it but I'm pretty busy with other work - going to have a quick look once I've finished the report.
@arivero postgresql.org/message-id/[email protected] . Will take a look at the source, see if it's an easy fix now.
@arivero ... and it was. Patch. postgresql.org/message-id/[email protected] . If you're running RPM / deb builds, it's not overly hard to rebuild them with a patch applied - no need to switch wholly to unpackaged PostgreSQL.
Thanks very much! That should be enough json speed to keep avoiding "document databases" for some time :-D
