
This pertains to Emily's answer here: https://stackoverflow.com/a/13850560/2026752

import qualified Data.Map as M
import Data.List (maximumBy)
import Data.Ord (comparing)

ansMap :: M.Map Integer Int
ansMap = M.fromAscList [(i, collatz i) | i <- [1..1000000]]
  where collatz 1 = 0
        collatz x = if x' <= 1000000 then 1 + ansMap M.! x'
                                     else 1 + collatz x'
          where x' = if even x then x `div` 2 else x*3 + 1

-- this code is really fast
fst $ maximumBy (comparing snd) $ M.toList ansMap 

This seemed like a reasonable strategy, so I decided to take 1000000 and feed it as a variable to the function, so I could compute the Collatz sequence for even more numbers:

ansMap :: Integer -> M.Map Integer Int
ansMap n = M.fromAscList [(i, collatz i) | i <- [1..n]]
  where collatz 1 = 0
        collatz x = if x' <= n then 1 + ansMap n M.! x'
                               else 1 + collatz x'
          where x' = if even x then x `div` 2 else x*3 + 1

-- but then suddenly this is slow
fst $ maximumBy (comparing snd) $ M.toList ansMap 1000000

This confuses me, since all I did was take n out and pass it back in! I don't know much about the Haskell runtime. Please help me understand! Thank you in advance.

2 Answers

In Emily's answer, ansMap is a M.Map Integer Int which is recursively defined in terms of itself. In your modified code, ansMap is a function, and ansMap n returns a M.Map Integer Int that is recursively defined in terms of other calls to ansMap. But those recursive calls to ansMap themselves construct and return a (distinct) M.Map Integer Int, which ends up making tons of different M.Map Integer Ints until the recursion bottoms out.
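The blowup is easier to see on a small example. Here is a sketch (not from the thread; Fibonacci instead of Collatz, with made-up names) showing the same two shapes side by side:

```haskell
import qualified Data.Map as M

-- One map, recursively defined in terms of itself: each entry is
-- computed at most once, because every lookup hits the same map.
fibsShared :: M.Map Int Integer
fibsShared = M.fromAscList [(i, fib i) | i <- [0 .. 30]]
  where fib 0 = 0
        fib 1 = 1
        fib i = fibsShared M.! (i - 1) + fibsShared M.! (i - 2)

-- A function: every recursive call to fibsPer builds a brand-new
-- map, so nothing is memoized and the work explodes.
fibsPer :: Int -> M.Map Int Integer
fibsPer n = M.fromAscList [(i, fib i) | i <- [0 .. n]]
  where fib 0 = 0
        fib 1 = 1
        fib i = fibsPer n M.! (i - 1) + fibsPer n M.! (i - 2)

main :: IO ()
main = do
  print (fibsShared M.! 30)  -- fast
  print (fibsPer 15 M.! 15)  -- noticeably more work for a smaller index
```

`fibsShared` builds one map whose value thunks all point back at that same map; each lookup in `fibsPer` constructs a fresh map, whose entries construct more fresh maps, and so on down the recursion.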

You can fix this simply by making it so that there is once again a M.Map Integer Int that is recursively defined in terms of itself, rather than having a recursively defined function.

ansMap :: Integer -> M.Map Integer Int
ansMap n = mapping
  where mapping = M.fromAscList [(i, collatz i) | i <- [1..n]]
        collatz 1 = 0
        collatz x = if x' <= n then 1 + mapping M.! x'
                               else 1 + collatz x'
          where x' = if even x then x `div` 2 else x*3 + 1

Note that collatz uses mapping, and does not make a recursive call to ansMap -- so ansMap is not recursively defined (though mapping is).

That was an asymptotic speedup; there are also constant-factor speedups available. For example, switching Integer to Int, switching Map to Vector, and doing bit-twiddly operations in the definition of x' makes it about an order of magnitude faster. If switching to Int makes you nervous, staying with Integer is only about 50% slower.

import qualified Data.Vector as V
import Data.Bits ((.&.), shiftR)

ansMap :: Int -> V.Vector Int
ansMap n = mapping
  where mapping = V.generate (n+1) collatz
        collatz 0 = -1  -- index 0 is unused; any dummy value works
        collatz 1 = 0
        collatz 1 = 0
        collatz x = if x' <= n then 1 + mapping V.! x'
                               else 1 + collatz x'
          where x' = if x .&. 1 == 0 then shiftR x 1 else x*3 + 1

V.maxIndex $ ansMap 1000000
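If you want to check the speed claims yourself, here is a rough timing harness (a sketch, not part of the answer; System.CPUTime from base is coarse, and the criterion package is the proper tool for real benchmarking):

```haskell
import Data.Bits ((.&.), shiftR)
import qualified Data.Vector as V
import System.CPUTime (getCPUTime)
import Text.Printf (printf)

-- The vector version from the answer above.
ansMap :: Int -> V.Vector Int
ansMap n = mapping
  where mapping = V.generate (n + 1) collatz
        collatz 0 = -1  -- index 0 is unused; any dummy value works
        collatz 1 = 0
        collatz x = if x' <= n then 1 + mapping V.! x'
                               else 1 + collatz x'
          where x' = if x .&. 1 == 0 then shiftR x 1 else x * 3 + 1

main :: IO ()
main = do
  t0 <- getCPUTime
  let best = V.maxIndex (ansMap 1000000)
  t1 <- best `seq` getCPUTime  -- force the result before reading the clock
  printf "longest chain starts at %d (%.2f s CPU)\n" best
         (fromIntegral (t1 - t0) * 1e-12 :: Double)
```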

3 Comments

quick q -- how did you develop an eye for recognizing where to 'pull out' the recursion?
Your question is probably too specific. As asked, the answer is "there's only one place to look" -- namely, at function definitions -- so it doesn't take long to "develop an eye"; you only need two or three examples to see the pattern. More generally, there's a broad class of things I watch for and code smells I've picked up. Unfortunately there's no single answer for how: I've accumulated tricks by answering questions (and reading others' answers) here and on the #haskell IRC channel, and through many happy hours spent making, finding, and fixing mistakes in my own Haskell programs.
It's a fairly standard pattern in Haskell (and other functional languages). For a different example, consider f = \n -> let a = bigData in e: this allocates bigData only after having taken n as input, and does so on each call. By contrast, f = let a = bigData in \n -> e allocates bigData only once, and reuses the data structure across many ns. When making many calls (e.g., via recursion) this matters. Memoization, as you discovered, is the most common scenario where this shows up.
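That bigData example can be made runnable (a sketch with made-up names; run it in GHCi or without -O, since GHC's full-laziness transformation may float the binding out and restore sharing on its own):

```haskell
import Debug.Trace (trace)

-- The let sits inside the lambda: 'a' is rebuilt on every call.
fPerCall :: Int -> Int
fPerCall = \n -> let a = trace "alloc" (sum [1 .. 10000]) in a + n

-- The let sits outside the lambda: 'a' is a single shared thunk,
-- evaluated at most once across all calls.
fShared :: Int -> Int
fShared = let a = trace "alloc" (sum [1 .. 10000]) in \n -> a + n

main :: IO ()
main = do
  print (fPerCall 1, fPerCall 2)  -- "alloc" on stderr per call (without -O)
  print (fShared 1, fShared 2)    -- "alloc" at most once
```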

This seems to be a question about sharing, as is covered in Chapter 27 of "Haskell Programming from First Principles".

Being a function with explicit, named arguments also prevents sharing. Haskell is not fully lazy; it is merely non-strict, so it is not required to remember the result of every function application for a given set of arguments, nor would it be desirable given memory constraints. (pp. 1053-1054)

The book recommends using Debug.Trace to identify what is or is not being shared.

If you wrap each version's map in a trace (try them one at a time -- the two definitions can't coexist in one module, and collatz is as defined in the question):

import Debug.Trace

-- first version: a shared top-level map
ansMap = trace "eval" (M.fromAscList [(i, collatz i) | i <- [1..1000000]])

-- second version: a function with a named argument
ansMap n = trace "eval" (M.fromAscList [(i, collatz i) | i <- [1..n]])

you will notice that in the first version "eval" is printed only once, but in the version with the named argument it is printed many times. In the first case the map can be shared; in the second, the named argument prevents sharing.

(At least this is how I understand it.)
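Filled out into a self-contained sketch (small n, and made-up names sharedMap/perCallMap so both versions fit in one module; the trace output goes to stderr):

```haskell
import Debug.Trace (trace)
import qualified Data.Map as M

-- Shared: the map is a top-level value, built at most once.
sharedMap :: M.Map Integer Int
sharedMap = trace "building map" (M.fromAscList [(i, collatz i) | i <- [1..100]])
  where collatz 1 = 0
        collatz x = if x' <= 100 then 1 + sharedMap M.! x'
                                 else 1 + collatz x'
          where x' = if even x then x `div` 2 else x*3 + 1

-- Not shared: each memoized lookup forces a freshly built map.
perCallMap :: Integer -> M.Map Integer Int
perCallMap n = trace "building map" (M.fromAscList [(i, collatz i) | i <- [1..n]])
  where collatz 1 = 0
        collatz x = if x' <= n then 1 + perCallMap n M.! x'
                               else 1 + collatz x'
          where x' = if even x then x `div` 2 else x*3 + 1

main :: IO ()
main = do
  print (sharedMap M.! 27)       -- "building map" printed once
  print (perCallMap 100 M.! 27)  -- "building map" printed once per memo hit
```

Both versions return the same answer (the Collatz chain from 27 takes 111 steps); only the number of times the map is constructed differs.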

2 Comments

For the purposes of this question, it's a great answer. I'm not sure I love everything in the quote, though. While it's true that the Haskell language specification only requires non-strict evaluation, in practice GHC is fully lazy by default; and the distinction between non-strict and lazy is not the reason the Haskell spec doesn't require an implementation to memoize.
Thanks! I had a feeling that you were saying roughly the same thing but with greater precision. I will look at your answer again to understand the issue better.
