0

I wrote mergesort() in C++ for linked lists. The issue is that my professor has provided test code with a very large list (length of 575,000). This causes a stack overflow error for my function since it is written recursively.

So it's possible my professor expects us to write it using iterations instead of recursion. I wanted to ask if there is anything wrong with my code that may be causing the stack to overflow?

My code:

typedef struct listnode {
    struct listnode * next;
    long value; 
} LNode;

LNode* mergesort(LNode* data) {
    if(data == NULL || data->next == NULL) {
        return data;
    }else {
        LNode* s = split(data);

        LNode* firstSortedHalf = mergesort(data);
        LNode* secondSortedHalf = mergesort(s);

        LNode* r = merge(firstSortedHalf, secondSortedHalf);
        return r;
    }
}

LNode* split(LNode* list) {
    if(list) {
        LNode* out = list->next;

        if(out) {
            list->next = out->next;
            out->next = split(out->next);
        }
        return out;
    }else {
        return NULL;
    }
}

LNode* merge(LNode* a, LNode* b) {
    if(a == NULL)
        return b;
    else if(b == NULL)
        return a;

    if(a->value < b->value) {
        a->next = merge(a->next,b);
        return a;
    }else {
        b->next = merge(a, b->next);
        return b;
    }
}
7
  • @BillLynch Well I had to make mergesort() for linked lists, so recursion only felt most natural. Commented Feb 22, 2015 at 21:10
  • There is a much faster approach for lists that uses an array of pointers to nodes (plus temp pointers). The array pointers are initialized to NULL, and then nodes are merged into the array one node at a time. Each pointer in the array is either NULL or it points to the first node of a list with 2^(index of array) nodes. array[0] = 1 node, array[1] = 2 nodes, array[2] = 4 nodes, ... . The last pointer points to a list of unlimited size. An array size of 24 to 32 would be enough. After the nodes are merged into the array, then the array is merged to form a single sorted list. Commented Feb 22, 2015 at 22:46
  • I'm not sure if the array approach would be allowed in a class room situation. Commented Feb 22, 2015 at 22:47
  • @rcgldr: this approach looks like a simulation of recursion using an explicit stack. This brings no benefit in this context and complicates the coding. Commented Jan 3, 2017 at 0:09
  • @YvesDaoust - The "array" of pointers to nodes approach, which is a bottom up merge, eliminates the need to repeatedly split the array by scanning of lists. The actual code is simpler and faster. Wiki pseudo-code example , working c example . Commented Jan 3, 2017 at 1:22

2 Answers 2

3

So you have three recursive functions. Let's look at the maximum depth of each with the worst case of a list of 575000 elements:

  • merge(): This looks to iterate over the entire list. So 575000 stack frames.
  • split(): This looks to iterate over the entire list in pairs. So ~250000 stack frames.
  • mergesort(): This looks to iterate in a splitting fashion. So log_2(575000) or about 20 stack frames.

So, when we run our programs, we're given a limited amount of stack space to fit all of our stack frames. On my computer, the default limit is about 10 megabytes.

A rough estimate would be that each of your stack frames takes up 32 bytes. For the case of merge(), this means that it would take up about 18 megabytes of space, which is well beyond our limit.

The mergesort() call itself though, is only 20 iterations. That should fit under any reasonable limit.

Therefore, my takeaway is that merge() and split() should not be implemented in a recursive manner (unless that manner is tail recursive and optimizations are on).

Sign up to request clarification or add additional context in comments.

4 Comments

I looked around and it seems g++ does not do tail call optimization. If I were to rewrite this in C and compile it using GCC, would I be able to keep the functions recursive, or would I have to rewrite them anyway?
Both gcc and g++ implement tail recursive optimizations. None of your functions are tail recursive.
merge() should be iterative, a function that merges two already sorted lists and returns a pointer to the first node of the merged list. split() should probably be implemented directly in mergesort(), using iteration to find the mid point of lists.
@aanrv: BillLynch is right and, sorry to be direct, implementing the merge and the split this recursive way is insane. You can do with recursion only in mergesort (hence not exceeding 20 stack frames).
0

A bit late, but it's the recursive merge() that is causing stack overflow. The recursive split() is not an issue, because its maximum depth is log2(n).

So only merge() needs to be converted to iteration.

As commented a long time ago, a bottom up approach using a small (25 to 32) array of pointers is simpler and faster, but I wasn't sure it this would be an issue with getting too much help for the assignment. Link to wiki pseudo-code:

http://en.wikipedia.org/wiki/Merge_sort#Bottom-up_implementation_using_lists

Link to working C example:

http://code.geeksforgeeks.org/Mcr1Bf

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.