2

My environment is NodeJS, although this could be a web related problem as well. I have a large set of data from a database which I am attempting to enumerate over. However, for the sake of argument lets say that I have an array of 20,000 strings:

var y = 'strstrstrstrstrstrstrstrstrstr';
var x = [];
for(var i = 0; i < 20000; i++)
  x.push(y);

and I want to enumerate this list asynchronously, lets say using the async library, and lets say because I'm super cautious that I even limit my enumeration to 5 iterations at once:

var allDone = function() { console.log('done!') };
require('async').eachLimit(x, 5, function(item, cb){
  ...
  someAsyncCall(.., cb);
}, allDone);

The expectation is that 5 items of x would be iterated concurrently above and that eventually all 20,000 items would be iterated over and the console would print 'done!'. What actually happens is:

Uncaught exception: [RangeError: Maximum call stack size exceeded]

And at this point I assumed that this must be some sort of bug with the async library, so I wrote my own version of eachLimit which follows:

function eachLimit(data, limit, iterator, cb) {
    var consumed = 0;
    var consume;
    var finished = false;
    consume = function() {
        if(!finished && consumed >= data.length) {
            finished = true;
            cb();
        }else if(!finished) {
            return iterator(data[consumed++], consume);
        }
    };
    var concurrent = limit > data.length ? data.length : limit;
    for(var i = 0; i < concurrent; i++)
        consume();
}

and interestingly enough, this solved my problem. But then when I moved my experiment from nodeJS over to Chrome, even with my solution above I still receive a stack size exceeded.

Clearly, my method does not increase the stack as large as the eachLimit method contained within async. However, I still consider my approach to be bad because maybe not for 20k items, but for some sized array I can still exceed the stack size using my method. I feel like I need to design some sort of solution to this problem using tail recursion, but I'm not sure if v8 will even optimize for this case, or if it's possible given the problem.

10
  • What do you feel you are gaining by making this async? The reason you are running into issues is because async is designed to make working with already async code easier, it doesn't make synchronous things async. At the end of the day, you'll still need to iterate overall the items in the main thread, so why split that up? Commented Feb 22, 2015 at 7:47
  • The repetitious task I am doing is asynchronous. See the line: /* do some async stuff like db queries and eventually call cb() */ Commented Feb 22, 2015 at 14:08
  • The trivial explanation is that your someAsyncCall is actually not asynchronous (or not always asynchronous). Show us what exactly you are doing so that we can reproduce it. Commented Feb 22, 2015 at 14:20
  • Notice that your own eachLimit function has a bug, it might call cb even when not all iterator(…) callbacks have been called yet. Commented Feb 22, 2015 at 14:24
  • Bergi even if I just call cb() directly or do setTimeout(cb, 1), the problem persists. In your trivial scenario you have to consider that the stack will grow to a massive size after enough iteration regardless as to the async task, right? Also, you're right about the behaviour of my owneachLimit function, good point! Commented Feb 22, 2015 at 14:27

3 Answers 3

2

I feel like I need to design some sort of solution to this problem using tail recursion, but I'm not sure if v8 will even optimize for this case, or if it's possible given the problem.

The continuation-passing-style you are using is already tail recursive (or close to anyway). The problem is that most JS engines really tend to do stackoverflows in these sorts of situations.

There are two main ways to work around this issue:

1) Force the code to be async using setTimeout.

What is happening with your code is that you are calling the return callbacks before the original function returns. In some async libraries this will end up resulting in stackoverflow. One simple workaround is to force the callback to run only in the next iteration of the event handling loop, by wrapping it inside a setTimeout. Translate

//Turns out this was actually "someSyncCall"...
someAsyncCall(.., cb);

into

someAsyncCall(..., function(){
    setTimeout(cb, 0)
});

The main advantage here is that this is very simple to do. The disadvantage is that this add some latency to your loop because setTimeout is implemented so that there will always be some nonzero delay to the callback (even if you set it to zero). On the server you can use nextTick (or somethign like that, forgot the precise name) to do something similar as well.

That said, its already a bit weird to have a large loop of sequential async operations. If your operations are all actually async then its going to take years to complete due to the network latency.

2) Use trampolining to handle the sync code.

The only way to 100% avoid a stackoverflow is to use bona-fide while loops. With promises this would be a bit easier to write the pseudocode for:

//vastly incomplete pseudocode
function loopStartingFrom(array, i){
    for(;i<array.length; i++){
        var x = run_next_item(i);
        if(is_promise(x)){
            return x.then(function(){
                loopStartingFrom(array, i+1)
            });
        }
    }
}

Basically, you run your loop in an actual loop, with some way to detect if one of your iterations is returning immediately or deferring to an async computation. When things return immediately you keep the loop running and when you finally get a real async result you stop the loop and resume it when the async iteration result completes.

The downside of using trampolining is that its a bit more complicated. That said, there are some async libraries out there that guarantee that stackoverflow does not occur (by using one of the two tricks I mentioned under the hood).

Sign up to request clarification or add additional context in comments.

3 Comments

uh, in fact your "trampolining" solution is overflowing the stack if run_next_item continuously returns promises.
The idea is that run_next_item returns a promise if its a real async result and returns a regular value if not. That said, maybe it would be cleaner if I had writen the test to check if the promise has already resolved or not... I did warn that this was dirty pseudocode though :)
No, I mean the fact that you're not passing a callback to then.
2

To prevent a stack overflow, you need to avoid that consume recurses into itself. You can do that using a simple flag:

function eachLimit(data, limit, iterator, cb) {
    var consumed = 0,
        running = 0,
        isAsync = true;
    function consume() {
        running--;
        if (!isAsync)
            return;
        while (running < limit && consumed < data.length) {
            isAsync = false;
            running++;
            iterator(data[consumed++], consume);
            isAsync = true;
        }
        if (running == 0)
            cb();
    }
    running++;
    consume();
}

Comments

1

Have you considered using promises for this? They should resolve the issue of an ever-increasing stack (and also you get to use promises, which is a big plus in my book):

// Here, iterator() should take a single data value as input and return
// a promise for the asynchronous behavior (if it is asynchronous)
// or any value if it is synchronous
function eachLimit(data, limit, iterator) {
    return Promise(function (resolve, reject) {
        var i = 0;
        var failed = false;

        function handleFailure(error) {
            failed = true;
            reject(error);
        }

        function queueAction() {
            try {
                Promise.when(iterator(data[i]))
                .then(handleSuccess, handleFailure);
            } catch (error) {
                reject(error);
            }
        }

        function handleSuccess() {
            if (!failed) {
                if (i < data.length) {
                    queueAction();
                    i += 1;
                } else {
                    resolve();
                }
            }
        }

        for (; i < data.length && i < limit; i += 1) {
            queueAction();
        }
    });
}

3 Comments

There are promise implementations that overflow the stack as well, the concept of promises alone does not solve this problem.
@Bergi True, but as I understand it, most of the good ones (Promises/A+ compliant) avoid stack overflows. Would I be accurate in saying that much?
"the good ones", yes, but I don't think all Promises/A+ compliant ones are good.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.