
EDIT: I changed the accepted answer to my own after a lot of follow-up research showed that there isn't a simple answer to my question. See below!

So, as a follow-up to my last question, I'm trying to get a better handle on JavaScript best practices for optimizing performance. For the following example, I'm testing in Chrome 28.0.1500.70 using the in-browser profiler.

I've got some math functions encapsulated in an object that are getting called a few hundred thousand times per second, and I'm trying to shave a bit off the execution time.

I've already done some optimization by copying the parent object's properties into locals in the called functions themselves, which got me a decent (~16%) performance boost. However, when I did the same for calling another function from the parent object, I got a huge (~100%) performance increase.

The original setup had calcNeighbors calling its fellow parent-object function cirInd via this.cirInd.

Making a local var copy of cirInd and calling that instead gave a huge performance gain: less than half the previous execution time for calcNeighbors.

However, making cirInd an inline function in calcNeighbors caused a return to the same slower performance as calling it from the parent object.

I'm really perplexed by this. I suppose that it could be a quirk in Chrome's profiler (cirInd doesn't show up at all in the second case) but there is definitely a noticeable performance gain in the application when I use case 2.

Can someone explain why case 2 is so much faster than case 1, and more importantly, why case 3 doesn't seem to give any performance gain?

The functions in question are here:

calling from parent object:

  window.bgVars = {
     <snip>
     "cirInd": function(index, mod){
        //returns modulus, array-wrapping value to implement circular array
        if(index<0){index+=mod;}
        return index%mod;
     },
     "calcNeighbors": function(rep){
        var foo = this.xBlocks;
        var grid = this.cGrid;
        var mod = grid.length;
        var neighbors = grid[this.cirInd(rep-foo-1, mod)] + grid[this.cirInd(rep-foo, mod)] + grid[this.cirInd(rep-foo+1, mod)] + grid[this.cirInd(rep-1, mod)] + grid[this.cirInd(rep+1, mod)] + grid[this.cirInd(rep+foo-1, mod)] + grid[this.cirInd(rep+foo, mod)] + grid[this.cirInd(rep+foo+1, mod)];
        return neighbors;
     },
     <snip>
  }

calling via local variable:

  window.bgVars = {
     <snip>
     "cirInd": function(index, mod){
        //returns modulus, array-wrapping value to implement circular array
        if(index<0){index+=mod;}
        return index%mod;
     },
     "calcNeighbors": function(rep){
        var foo = this.xBlocks;
        var grid = this.cGrid;
        var mod = grid.length;
        var cirInd = this.cirInd;
        var neighbors = grid[cirInd(rep-foo-1, mod)] + grid[cirInd(rep-foo, mod)] + grid[cirInd(rep-foo+1, mod)] + grid[cirInd(rep-1, mod)] + grid[cirInd(rep+1, mod)] + grid[cirInd(rep+foo-1, mod)] + grid[cirInd(rep+foo, mod)] + grid[cirInd(rep+foo+1, mod)];
        return neighbors;
     },
     <snip>
  }

calling inline:

  window.bgVars = {
     <snip>
     "calcNeighbors": function(rep){
        var foo = this.xBlocks;
        var grid = this.cGrid;
        var mod = grid.length;
        function cirInd(index, mod){
          //returns modulus, array-wrapping value to implement circular array
          if(index<0){index+=mod;}
          return index%mod;
        }
        var neighbors = grid[cirInd(rep-foo-1, mod)] + grid[cirInd(rep-foo, mod)] + grid[cirInd(rep-foo+1, mod)] + grid[cirInd(rep-1, mod)] + grid[cirInd(rep+1, mod)] + grid[cirInd(rep+foo-1, mod)] + grid[cirInd(rep+foo, mod)] + grid[cirInd(rep+foo+1, mod)];
        return neighbors;
     },
     <snip>
  }
  • "why case 3 seems to not give any performance gain" --- why should moving a function to the same scope make some performance gain? Commented Jul 11, 2013 at 2:41
  • Not 100% relevant but still useful: developers.google.com/v8/design#prop_access Commented Jul 11, 2013 at 2:44
  • #3 generates a new function cirInd() on each call, while #2 reuses the same one every call. Fewer activation creations means a faster runtime and less trash to clean up. Commented Jul 11, 2013 at 2:44

3 Answers


Perhaps seeing #2 and #3 in a simplified view will help illustrate the object-creation side effects.

I believe this should make it obvious:

alls1=[];
alls2=[];

function inner1(){}
function outer1(){
     if(alls1.indexOf(inner1)===-1){ alls1.push(inner1); }
}


function outer2(){
   function inner2(){}
   if(alls2.indexOf(inner2)===-1){ alls2.push(inner2); }
}

for(i=0;i<10;i++){
   outer1();
   outer2();
}

alert([ alls1.length, alls2.length  ]); // shows: 1, 10

Functions are objects, and making new objects is never free.
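To connect that back to the question's case 3: here is a hedged sketch of restructuring it so the helper is created exactly once. The IIFE wrapper and the demo values for xBlocks and cGrid are my own illustration, not from the original code:

```javascript
// Sketch (my own, not from the question): define cirInd once in a
// closure, so calcNeighbors never re-creates it per call.
// xBlocks and cGrid hold made-up demo data.
var bgVars = (function () {
   function cirInd(index, mod) {
      // returns modulus, array-wrapping value to implement circular array
      if (index < 0) { index += mod; }
      return index % mod;
   }
   return {
      xBlocks: 3,
      cGrid: [0, 1, 2, 3, 4, 5, 6, 7, 8],
      calcNeighbors: function (rep) {
         var foo = this.xBlocks;
         var grid = this.cGrid;
         var mod = grid.length;
         // cirInd is a plain lexical reference: no `this` lookup and
         // no per-call function allocation
         return grid[cirInd(rep - foo - 1, mod)] + grid[cirInd(rep - foo, mod)] +
                grid[cirInd(rep - foo + 1, mod)] + grid[cirInd(rep - 1, mod)] +
                grid[cirInd(rep + 1, mod)] + grid[cirInd(rep + foo - 1, mod)] +
                grid[cirInd(rep + foo, mod)] + grid[cirInd(rep + foo + 1, mod)];
      }
   };
}());
```

This keeps case 2's call pattern while hiding the helper, at the cost of cirInd no longer being reachable as bgVars.cirInd from outside.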

EDIT: expanding on #1 vs #2

Again, a simplified example will help illustrate:

function y(a,b){return a+b;}
var out={y:y};
var ob={
   y:y, 
   x1: function(a){ return this.y(i,a);},
   x2: function(a){ return y(i,a);},
   x3: function(a){ return out.y(i,a);}
}

var mx=999999, times=[], d2,d3,d1=+new Date;
for(var i=0;i<mx;i++){ ob.x1(-i) }
times.push( (d2=+new Date)-d1 );

for(var i=0;i<mx;i++){ ob.x2(-i) }
times.push( (d3=+new Date)-d2 );

for(var i=0;i<mx;i++){ ob.x3(-i) }
times.push( (+new Date)-d3 );

alert(times); // my chrome's typical: [ 1000, 1149, 1151 ]

Understand that there is more noise in a simple example, and closure is a big chunk of the overhead in all three, but the differences between them are what's important.

In this demo you won't see the huge gain observed in your dynamic system, but you do see how closely y and out.y profile compared to this.y, all else being equal.

The main point is that it's not the extra dot resolution per se that slows things down, as some have alluded to; it's specifically the "this" keyword in V8 that matters, otherwise out.y() would profile closer to this.y().

Firefox is a different story.

Tracing allows this.whatever to be predicted, so all three profile within a bad dice roll of each other on the same machine as the Chrome numbers: [2548, 2532, 2545].


13 Comments

Derp, you're right, I don't know why I didn't spot that in #3. I'm still a bit confused about why #1 takes such a huge performance hit, though.
@DanHeidel: I would offer that the cirInd being invoked in #2 has no internal this binding, whereas it's the whole object in #1. The whole path token would normally be shortcut by the JIT, but using "this." means that the holder object has to be re-assessed at execution time. All else being equal, using (almost) any single identifier to the left of the dot besides "this" should profile about the same as a lexically named function ref.
Interesting. So if I'm following you correctly, a reference to bgVars.cirInd rather than this.cirInd would allow the JIT to avoid the runtime lookup hit?
I am not a low-level expert, but my understanding is this: x.y() would profile more like y() than this.y(), if y were the same in all three. I'll edit the answer to elaborate.
So, I tried the explicit reference vs using this and in Chrome, it gave no performance increase - it looked indistinguishable from #1. Even stranger, I set up a jsPerf to test all 4 cases and got completely different results. In that case, #1-3 are almost identical and #4 is over 10x slower. jsperf.com/different-nested-js-function-calls

The reason number 1 takes relatively more time should be obvious: you access the entire object scope, and then have to look up a property on it.

Numbers 2 and 3 are both direct references to a function, so there is no seeking.
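One caveat worth adding (my own note, with a hypothetical object for illustration): copying a method into a plain variable detaches it from its `this`. That is only safe in the question's code because cirInd never touches `this`:

```javascript
// `times` depends on `this`; `square` does not.
var obj = {
   scale: 10,
   times: function (n) { return n * this.scale; },
   square: function (n) { return n * n; }
};

var square = obj.square; // safe to detach: no `this` inside
var times = obj.times;   // unsafe: loses its receiver

obj.square(4); // 16
square(4);     // 16, same result when detached
obj.times(4);  // 40
// times(4) is NaN in sloppy mode (this.scale is undefined) and a
// TypeError in strict mode; bind the receiver if the method needs it:
var boundTimes = obj.times.bind(obj);
boundTimes(4); // 40
```

So the local-variable trick trades a `this` lookup for a lexical lookup, but only for methods that don't rely on their receiver (or that have been bound to it).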

A very good resource for testing these types of situations is jsPerf, and I would highly recommend recreating the scenario there and running the test to see the exact differences and whether or not they are significant to you.

11 Comments

  O(n)? ~~I'm sure it's a hash table not a list~~ stackoverflow.com/a/6602088/251311
@zerkms - So where does the time for accessing a property come from if it is a hash? I removed the time complexity assumption from my answer.
see the comment update. Even for a hash-alike structure it's still more expensive than O(1)
@zerkms - So for the hash, the calculation creates a key, and there is an iteration of the set of properties checking to see if the current key was the key generated for the hash. Once found, the value at the key's index is returned. Does that sound right to you? If so, how is that not O(n) where n is the number of keys in the hash? Worst case, every key is checked.
for the string keys the lookup cost of a proper hash table implementation is close to O(1) (which is the ideal). O(n) is the worst case, when all your keys fall into the same bucket (not possible for the V8 implementation). Some useful reading: lemire.me/blog/archives/2009/08/18/…

OK, I've been researching this issue for a while now and TL;DR - it's complicated.

Turns out that many performance questions really depend on the platform, the browser, and even the minor browser revision number. And not by a little, either. There are many examples on jsPerf that show things such as 'for vs. while' or 'typed arrays vs. standard arrays' wildly swinging back and forth in terms of favorable execution speed with different browser releases. Presumably this is due to JIT optimization trade-offs.

Short answer to the general performance questions - just test everything in jsPerf. None of the suggestions I got in this thread were helpful in all cases. The JIT makes things complicated. This is particularly important if you have a background like mine and are used to C programs having certain rule-of-thumb coding patterns that tend to speed things up. Don't assume anything - just test it.

NOTE: many of the weird issues I listed in the original question were due to using the default Chrome profiler (i.e. the one you get from the Ctrl+Shift+I panel). If you are doing a lot of really fast loops (such as in graphics rendering), DO NOT USE THIS PROFILER. It has a time resolution of 1 ms, which is much too coarse for proper performance debugging.
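As a workaround for that 1 ms granularity, sub-millisecond wall-clock timing is available from performance.now(). The helper below is my own sketch (it measures average per-call cost, not a full profile):

```javascript
// Sketch: average per-call time in microseconds, using the
// high-resolution performance.now() clock rather than Date.now()
// or the profiler's 1 ms buckets.
function avgMicros(fn, reps) {
   var t0 = performance.now();
   for (var i = 0; i < reps; i++) { fn(i); }
   var elapsedMs = performance.now() - t0;
   return (elapsedMs * 1000) / reps; // microseconds per call
}

// hypothetical usage:
var us = avgMicros(function (x) { return x % 7; }, 100000);
```

Averaging over many reps also smooths out JIT warm-up, though the first batch of calls will still include compilation time.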

In fact, the ENTIRE issue I had with case 2 being so much faster than the others is due to the profiler simply not 'seeing' many of the function calls and improperly reporting CPU percentages. In the heat map, I could clearly see huge stretches where inner-loop functions were firing but not being recorded by the profiler.

Solution: http://www.html5rocks.com/en/tutorials/games/abouttracing/# Chrome has a less obvious and much more powerful profiler built into about:tracing. It has microsecond resolution, the ability to read code tags for sub-function resolution, and is generally much more kickass. As soon as I started using this profiler, the results fell into line with what I saw on jsPerf and helped me reduce my rendering time by nearly half. How did I do that? Again, it wasn't simple. In some cases, calling out to subroutines helped; in others it didn't. Refactoring the whole rendering engine from an object literal to the module pattern seemed to help a bit. Precalcing any multiplication operations in for loops did seem to have big effects. Etc., etc.
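For the "precalcing multiplications in for loops" point, a minimal sketch of what that looks like (the function names and grid data here are my own illustration):

```javascript
// Naive version: the product y * w is recomputed on every inner iteration.
function sumGridSlow(grid, w, h) {
   var s = 0;
   for (var y = 0; y < h; y++) {
      for (var x = 0; x < w; x++) {
         s += grid[y * w + x];
      }
   }
   return s;
}

// Precalc'd version: hoist the loop-invariant multiplication so it
// runs once per row instead of once per cell.
function sumGridFast(grid, w, h) {
   var s = 0;
   for (var y = 0; y < h; y++) {
      var row = y * w; // computed w times less often
      for (var x = 0; x < w; x++) {
         s += grid[row + x];
      }
   }
   return s;
}
```

A good JIT can often hoist this on its own, which is part of why such tweaks help on some browser versions and do nothing on others; measuring is the only way to know.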

Quick notes about the about:tracing profiler: zooming and panning is done with the A, S, W, D keys on the keyboard - that took me a while to figure out. Also, it profiles all tabs and operates in a tab outside the page being analyzed, so minimize the number of extraneous tabs you have open, since they will clutter up the profiler view. Also, if testing canvas apps, be sure to switch tabs over to the app, since requestAnimationFrame generally doesn't fire when a tab is not active and visible.
