Merge sort causing stack to overflow?

Question

I wrote mergesort() in C++ for linked lists. The issue is that my professor has provided test code with a very large list (length of 575,000). This causes a stack overflow error for my function since it is written recursively.

So it's possible my professor expects us to write it using iterations instead of recursion. I wanted to ask if there is anything wrong with my code that may be causing the stack to overflow?

My code:

typedef struct listnode {
    struct listnode * next;
    long value; 
} LNode;

LNode* mergesort(LNode* data) {
    if(data == NULL || data->next == NULL) {
        return data;
    }else {
        LNode* s = split(data);

        LNode* firstSortedHalf = mergesort(data);
        LNode* secondSortedHalf = mergesort(s);

        LNode* r = merge(firstSortedHalf, secondSortedHalf);
        return r;
    }
}

LNode* split(LNode* list) {
    if(list) {
        LNode* out = list->next;

        if(out) {
            list->next = out->next;
            out->next = split(out->next);
        }
        return out;
    }else {
        return NULL;
    }
}

LNode* merge(LNode* a, LNode* b) {
    if(a == NULL)
        return b;
    else if(b == NULL)
        return a;

    if(a->value < b->value) {
        a->next = merge(a->next,b);
        return a;
    }else {
        b->next = merge(a, b->next);
        return b;
    }
}

@BillLynch Well I had to make mergesort() for linked lists, so recursion only felt most natural. — aanrv
– aanrv, Commented Feb 22, 2015 at 21:10
There is a much faster approach for lists that uses an array of pointers to nodes (plus temp pointers). The array pointers are initialized to NULL, and then nodes are merged into the array one node at a time. Each pointer in the array is either NULL or it points to the first node of a list with 2^(index of array) nodes. array[0] = 1 node, array[1] = 2 nodes, array[2] = 4 nodes, ... . The last pointer points to a list of unlimited size. An array size of 24 to 32 would be enough. After the nodes are merged into the array, then the array is merged to form a single sorted list. — rcgldr
– rcgldr, Commented Feb 22, 2015 at 22:46
I'm not sure if the array approach would be allowed in a class room situation. — rcgldr
– rcgldr, Commented Feb 22, 2015 at 22:47
@rcgldr: this approach looks like a simulation of recursion using an explicit stack. This brings no benefit in this context and complicates the coding. — user1196549
– user1196549, Commented Jan 3, 2017 at 0:09
@YvesDaoust - The "array" of pointers to nodes approach, which is a bottom up merge, eliminates the need to repeatedly split the array by scanning of lists. The actual code is simpler and faster. Wiki pseudo-code example , working c example . — rcgldr
– rcgldr, Commented Jan 3, 2017 at 1:22

Bill Lynch · Accepted Answer · 2015-02-22 21:20:34Z

3

So you have three recursive functions. Let's look at the maximum depth of each with the worst case of a list of 575000 elements:

merge(): This looks to iterate over the entire list. So 575000 stack frames.
split(): This looks to iterate over the entire list in pairs. So ~250000 stack frames.
mergesort(): This looks to iterate in a splitting fashion. So log_2(575000) or about 20 stack frames.

So, when we run our programs, we're given a limited amount of stack space to fit all of our stack frames. On my computer, the default limit is about 10 megabytes.

A rough estimate would be that each of your stack frames takes up 32 bytes. For the case of merge(), this means that it would take up about 18 megabytes of space, which is well beyond our limit.

The mergesort() call itself though, is only 20 iterations. That should fit under any reasonable limit.

Therefore, my takeaway is that merge() and split() should not be implemented in a recursive manner (unless that manner is tail recursive and optimizations are on).

edited Feb 22, 2015 at 21:20

answered Feb 22, 2015 at 21:11

Bill Lynch

82.3k16 gold badges132 silver badges181 bronze badges

Sign up to request clarification or add additional context in comments.

4 Comments

aanrv Over a year ago

I looked around and it seems g++ does not do tail call optimization. If I were to rewrite this in C and compile it using GCC, would I be able to keep the functions recursive, or would I have to rewrite them anyway?

Bill Lynch Over a year ago

Both gcc and g++ implement tail recursive optimizations. None of your functions are tail recursive.

rcgldr Over a year ago

merge() should be iterative, a function that merges two already sorted lists and returns a pointer to the first node of the merged list. split() should probably be implemented directly in mergesort(), using iteration to find the mid point of lists.

user1196549 Over a year ago

@aanrv: BillLynch is right and, sorry to be direct, implementing the merge and the split this recursive way is insane. You can do with recursion only in mergesort (hence not exceeding 20 stack frames).

rcgldr · Accepted Answer · 2017-01-03 01:29:48Z

0

A bit late, but it's the recursive merge() that is causing stack overflow. The recursive split() is not an issue, because its maximum depth is log2(n).

So only merge() needs to be converted to iteration.

As commented a long time ago, a bottom up approach using a small (25 to 32) array of pointers is simpler and faster, but I wasn't sure it this would be an issue with getting too much help for the assignment. Link to wiki pseudo-code:

http://en.wikipedia.org/wiki/Merge_sort#Bottom-up_implementation_using_lists

Link to working C example:

http://code.geeksforgeeks.org/Mcr1Bf

answered Jan 3, 2017 at 1:29

rcgldr

29.4k3 gold badges41 silver badges70 bronze badges

Collectives™ on Stack Overflow

Merge sort causing stack to overflow?

2 Answers 2

4 Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

4 Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related