4

I'm doing this basic dp (Dynamic Programming) problem on trees (https://cses.fi/problemset/task/1674/). Given the structure of a company (hierarchy is a tree), the task is to calculate for each employee the number of their subordinates.

This:

import sys
from functools import lru_cache  # noqa
sys.setrecursionlimit(2 * 10 ** 9)

if __name__ == "__main__":
    n: int = 200000
    boss: list[int] = list(range(1, 200001))  
    # so in my example it will be a tree with every parent having one child
    graph: list[list[int]] = [[] for _ in range(n)]

    for i in range(n-1):
        graph[boss[i] - 1].append(i+1)  # directed so neighbours of a node are only its children

    @lru_cache(None)
    def dfs(v: int) -> int:
        if len(graph[v]) == 0:
            return 0
        else:
            s: int = 0
            for u in graph[v]:
                s += dfs(u) + 1
            return s

    dfs(0)

    print(*(dfs(i) for i in range(n)))

crashes (I googled the error message and it means stack overflow)

Process finished with exit code -1073741571 (0xC00000FD)

HOWEVER

import sys
sys.setrecursionlimit(2 * 10 ** 9)

if __name__ == "__main__":
    n: int = 200000
    boss: list[int] = list(range(1, 200001))
    # so in my example it will be a tree with every parent having one child
    graph: list[list[int]] = [[] for _ in range(n)]

    for i in range(n-1):
        graph[boss[i] - 1].append(i+1)  # directed so neighbours of a node are only its children

    dp: list[int] = [0 for _ in range(n)]
    def dfs(v: int) -> None:
        if len(graph[v]) == 0:
            dp[v] = 0
        else:
            for u in graph[v]:
                dfs(u)
                dp[v] += dp[u] + 1

    dfs(0)

    print(*dp)

doesn't and it's exactly the same complexity right? The dfs goes exactly as deep in both situations too? I tried to make the two pieces of code as similar as I could.

I tried 20000000 instead of 200000 (i.e. graph 100 times deeper) and it still doesn't stackoverflow for the second option. Obviously I could do an iterative version of it but I'm trying to understand the underlying reason why there are such a big difference between those two recursive options so that I can learn more about Python and its underlying functionning.

I'm using Python 3.11.1.

12
  • I don’t understand @chepner. Using dynamic programming I compute F(1000) by doing F(999) which uses F(998) etc… in my code. Commented Feb 24, 2024 at 22:14
  • DP would look more like f = {} (using a dict just to simplify code in the comment), f[0] = 1; f[1] = 1; for i in range(2, 1000): f[i] = f[i-1] + f[i-2]. That's not recursion; that's using a lookup table that you fill in "just in time", as it were. The stack never grows, unlike a call to def F(n): return 1 if n == 0 else 1 if n == 1 else F(n-1) + F(n-2), which uses O(n) stack space whether or not you use a cache to reduce the complexity from O(2**N) to O(n). Commented Feb 24, 2024 at 23:25
  • 1
    @chepner You seem to be saying that only bottom-up DP is DP, but top-down DP (memoization) is DP, too. (I'm not sure what to call the question's second program. It's a rather odd mix. But it suffices to demonstrate what the question is about.) Commented Feb 25, 2024 at 0:21
  • The recursion limit is there specifically to prevent a stack overflow. You disabled that protection by setting a ridiculously high recursion limit, then doing deep recursion. Why?!? Commented Feb 25, 2024 at 0:44
  • 1
    @ShadowRanger That's specifically what setrecursionlimit is for. Quote from its doc: "A user may need to set the limit higher when they have a program that requires deep recursion and a platform that supports a higher limit." Commented Feb 25, 2024 at 1:07

1 Answer 1

7
+50

lru_cache is implemented in C, its calls are interleaved with your function's calls, and your C code recursion is too deep and crashes. Your second program only has deep Python code recursion, not deep C code recursion, avoiding the issue.

In Python 3.11 I get a similar bad crash:

[Execution complete with exit code -11]

In Python 3.12 I just get an error:

Traceback (most recent call last):
  File "/ATO/code", line 34, in <module>
    dfs(0)
  File "/ATO/code", line 31, in dfs
    s += dfs(u) + 1
         ^^^^^^
  File "/ATO/code", line 31, in dfs
    s += dfs(u) + 1
         ^^^^^^
  File "/ATO/code", line 31, in dfs
    s += dfs(u) + 1
         ^^^^^^
  [Previous line repeated 496 more times]
RecursionError: maximum recursion depth exceeded

That's despite your sys.setrecursionlimit(2 * 10 ** 9).

What’s New In Python 3.12 explains:

sys.setrecursionlimit() and sys.getrecursionlimit(). The recursion limit now applies only to Python code. Builtin functions do not use the recursion limit, but are protected by a different mechanism that prevents recursion from causing a virtual machine crash

So in 3.11, your huge limit is applied to C recursion as well, Python obediently attempts your deep recursion, its C stack overflows, and the program crashes. Whereas in 3.12 the limit doesn't apply to C recursion, Python protects itself with that different mechanism at a relatively shallow recursion depth, producing that error instead.

Let's avoid that C recursion. If I use a (simplified) Python version of lru_cache, your first program works fine in both 3.11 and 3.12 without any other change:

def lru_cache(_):
    def deco(f):
        memo = {}
        def wrap(x):
            if x not in memo:
                memo[x] = f(x)
            return memo[x]
        return wrap
    return deco

See CPython's GitHub Issue 3.12 setrecursionlimit is ignored in connection with @functools.cache for more details and the current progress on this. There are efforts to increase the C recursion limit, but it looks like it'll remain only in the thousands. Not enough for your super deep recursion.

Miscellaneous further information I might expand on later:

If you want the Python version of the regular lru_cache, you could try the hack of importing and deleting the C version of its wrapper before you import the Python version (Technique from this answer, which did that with heapq):

import _functools
del _functools._lru_cache_wrapper
from functools import lru_cache

Or you could remove the import that replaces the Python implementation with the C implementation. But that's an even worse hack and I wouldn't do that. Maybe I would copy&rename the functools module and modify the copy. But I mostly mention these hacks to illustrate what's happening, or as a last resort.

Sign up to request clarification or add additional context in comments.

8 Comments

So the C stack overflows, but the Python stack doesn’t? Even though traditionally C has better memory management?
C doesn't have "better" memory management; it has manual memory management. But this isn't about memory management, but stack growth. Python monitors the stack and raises an exception if the stack gets "too big"; C simply lets the stack grow until you run out of memory for it.
@FluidMechanicsPotentialFlows Yes. The C stack limit appears to usually be ~8 MB nowadays and to overflow with C recursion depth in the thousands or tens of thousands. Whereas pure Python recursion since Python 3.11 now mostly doesn't involve the C stack and is only limited by the limit you can control with sys.setrecursionlimit and by how much (heap) memory you have available. Which can be much higher. If you have Linux, you might be able to increase your C stack with ulimit -s sizeInKB.
@FluidMechanicsPotentialFlows See the added "Miscellaneous" section for more references/explanation.
@FluidMechanicsPotentialFlows Yes. Python's devguide talks about The call stack of its interpreter: up to Python 3.10, "each call would require a heap allocation for the stack frame" (though with some reuse optimization). It also describes the new mechanism, but isn't quite as clear about that, but it surely still uses the heap. That's slower than a single fixed preallocated stack like C's would be, but as Python is relatively slow anyway, the extra cost of using the heap is much less significant than it would be for C.
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.