
This pertains to Emily's answer here: https://stackoverflow.com/a/13850560/2026752

import qualified Data.Map as M
import Data.List (maximumBy)
import Data.Ord (comparing)

ansMap :: M.Map Integer Int
ansMap = M.fromAscList [(i, collatz i) | i <- [1..1000000]]
  where collatz 1 = 0
        collatz x = if x' <= 1000000 then 1 + ansMap M.! x'
                                     else 1 + collatz x'
          where x' = if even x then x `div` 2 else x*3 + 1

-- this code is really fast
fst $ maximumBy (comparing snd) $ M.toList ansMap 

This seemed like a reasonable strategy, so I decided to take 1000000 and feed it as a variable to the function, so I could compute the Collatz sequence for even more numbers:

ansMap :: Integer -> M.Map Integer Int
ansMap n = M.fromAscList [(i, collatz i) | i <- [1..n]]
  where collatz 1 = 0
        collatz x = if x' <= n then 1 + ansMap n M.! x'
                               else 1 + collatz x'
          where x' = if even x then x `div` 2 else x*3 + 1

-- but then suddenly this is slow
fst $ maximumBy (comparing snd) $ M.toList ansMap 1000000

This confuses me, since all I did was take n out and pass it back in! I don't know much about the Haskell runtime. Please help me understand! Thank you in advance.

2 Answers

In Emily's answer, ansMap is a M.Map Integer Int which is recursively defined in terms of itself. In your modified code, ansMap is a function, and ansMap n returns a M.Map Integer Int that is recursively defined in terms of other calls to ansMap. But those recursive calls to ansMap themselves construct and return a (distinct) M.Map Integer Int, which ends up making tons of different M.Map Integer Ints until the recursion bottoms out.
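The blowup is easier to see on a small example. Here is a sketch (not from the thread; Fibonacci instead of Collatz, with made-up names) showing the same two shapes side by side:

```haskell
import qualified Data.Map as M

-- One map, recursively defined in terms of itself: each entry is
-- computed at most once, because every lookup hits the same map.
fibsShared :: M.Map Int Integer
fibsShared = M.fromAscList [(i, fib i) | i <- [0 .. 30]]
  where fib 0 = 0
        fib 1 = 1
        fib i = fibsShared M.! (i - 1) + fibsShared M.! (i - 2)

-- A function: every recursive call to fibsPer builds a brand-new
-- map, so nothing is memoized and the work explodes.
fibsPer :: Int -> M.Map Int Integer
fibsPer n = M.fromAscList [(i, fib i) | i <- [0 .. n]]
  where fib 0 = 0
        fib 1 = 1
        fib i = fibsPer n M.! (i - 1) + fibsPer n M.! (i - 2)

main :: IO ()
main = do
  print (fibsShared M.! 30)  -- fast
  print (fibsPer 15 M.! 15)  -- noticeably more work for a smaller index
```

`fibsShared` builds one map whose value thunks all point back at that same map; each lookup in `fibsPer` constructs a fresh map, whose entries construct more fresh maps, and so on down the recursion.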

You can fix this simply by making it so that there is once again a M.Map Integer Int that is recursively defined in terms of itself, rather than having a recursively defined function.

ansMap :: Integer -> M.Map Integer Int
ansMap n = mapping
  where mapping = M.fromAscList [(i, collatz i) | i <- [1..n]]
        collatz 1 = 0
        collatz x = if x' <= n then 1 + mapping M.! x'
                               else 1 + collatz x'
          where x' = if even x then x `div` 2 else x*3 + 1

Note that collatz uses mapping, and does not make a recursive call to ansMap -- so ansMap is not recursively defined (though mapping is).

That was an asymptotic speedup; there are also constant-factor speedups available. For example, switching Integer to Int, switching Map to Vector, and doing bit-twiddly operations in the definition of x' makes it about an order of magnitude faster. If switching to Int makes you nervous, staying with Integer is only about 50% slower.

import qualified Data.Vector as V
import Data.Bits ((.&.), shiftR)

ansMap :: Int -> V.Vector Int
ansMap n = mapping
  where mapping = V.generate (n+1) collatz
        collatz 0 = -1  -- index 0 is unused; any dummy value works
        collatz 1 = 0
        collatz 1 = 0
        collatz x = if x' <= n then 1 + mapping V.! x'
                               else 1 + collatz x'
          where x' = if x .&. 1 == 0 then shiftR x 1 else x*3 + 1

V.maxIndex $ ansMap 1000000
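If you want to check the speed claims yourself, here is a rough timing harness (a sketch, not part of the answer; System.CPUTime from base is coarse, and the criterion package is the proper tool for real benchmarking):

```haskell
import Data.Bits ((.&.), shiftR)
import qualified Data.Vector as V
import System.CPUTime (getCPUTime)
import Text.Printf (printf)

-- The vector version from the answer above.
ansMap :: Int -> V.Vector Int
ansMap n = mapping
  where mapping = V.generate (n + 1) collatz
        collatz 0 = -1  -- index 0 is unused; any dummy value works
        collatz 1 = 0
        collatz x = if x' <= n then 1 + mapping V.! x'
                               else 1 + collatz x'
          where x' = if x .&. 1 == 0 then shiftR x 1 else x * 3 + 1

main :: IO ()
main = do
  t0 <- getCPUTime
  let best = V.maxIndex (ansMap 1000000)
  t1 <- best `seq` getCPUTime  -- force the result before reading the clock
  printf "longest chain starts at %d (%.2f s CPU)\n" best
         (fromIntegral (t1 - t0) * 1e-12 :: Double)
```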

3 Comments

quick q -- how did you develop an eye for recognizing where to 'pull out' the recursion?
Your question is probably too specific. As asked, the answer is "there's only one place to look" -- namely, at function definitions -- so it doesn't take long to "develop an eye"; you only need two or three examples to see the pattern. More generally, there's a broad class of things I watch for and code smells I've picked up. Unfortunately there's no single answer for how: I've accumulated tricks by answering questions (and reading others' answers) here and on the #haskell IRC channel, and through many happy hours spent making, finding, and fixing mistakes in my own Haskell programs.
It's a fairly standard pattern in Haskell (and other functional languages). For a different example, consider f = \n -> let a = bigData in e: this allocates bigData only after having taken n as input, and does so on each call. By contrast, f = let a = bigData in \n -> e allocates bigData only once, and reuses the data structure across many ns. When making many calls (e.g., via recursion) this matters. Memoization, as you discovered, is the most common scenario where this shows up.
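That bigData example can be made runnable (a sketch with made-up names; run it in GHCi or without -O, since GHC's full-laziness transformation may float the binding out and restore sharing on its own):

```haskell
import Debug.Trace (trace)

-- The let sits inside the lambda: 'a' is rebuilt on every call.
fPerCall :: Int -> Int
fPerCall = \n -> let a = trace "alloc" (sum [1 .. 10000]) in a + n

-- The let sits outside the lambda: 'a' is a single shared thunk,
-- evaluated at most once across all calls.
fShared :: Int -> Int
fShared = let a = trace "alloc" (sum [1 .. 10000]) in \n -> a + n

main :: IO ()
main = do
  print (fPerCall 1, fPerCall 2)  -- "alloc" on stderr per call (without -O)
  print (fShared 1, fShared 2)    -- "alloc" at most once
```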

This seems to be a question about sharing, as is covered in Chapter 27 of "Haskell Programming from First Principles".

Being a function with explicit, named arguments also prevents sharing. Haskell is not fully lazy; it is merely non-strict, so it is not required to remember the result of every function application for a given set of arguments, nor would it be desirable given memory constraints. (pp. 1053-1054)

The book recommends using Debug.Trace to identify what is or is not being shared.

If you wrap each version's map in a trace (try them one at a time -- the two definitions can't coexist in one module, and collatz is as defined in the question):

import Debug.Trace

-- first version: a shared top-level map
ansMap = trace "eval" (M.fromAscList [(i, collatz i) | i <- [1..1000000]])

-- second version: a function with a named argument
ansMap n = trace "eval" (M.fromAscList [(i, collatz i) | i <- [1..n]])

you will notice that in the first version "eval" is printed only once, but in the version with the named argument it is printed many times. In the first case the map can be shared; in the second, the named argument prevents sharing.

(At least this is how I understand it.)
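Filled out into a self-contained sketch (small n, and made-up names sharedMap/perCallMap so both versions fit in one module; the trace output goes to stderr):

```haskell
import Debug.Trace (trace)
import qualified Data.Map as M

-- Shared: the map is a top-level value, built at most once.
sharedMap :: M.Map Integer Int
sharedMap = trace "building map" (M.fromAscList [(i, collatz i) | i <- [1..100]])
  where collatz 1 = 0
        collatz x = if x' <= 100 then 1 + sharedMap M.! x'
                                 else 1 + collatz x'
          where x' = if even x then x `div` 2 else x*3 + 1

-- Not shared: each memoized lookup forces a freshly built map.
perCallMap :: Integer -> M.Map Integer Int
perCallMap n = trace "building map" (M.fromAscList [(i, collatz i) | i <- [1..n]])
  where collatz 1 = 0
        collatz x = if x' <= n then 1 + perCallMap n M.! x'
                               else 1 + collatz x'
          where x' = if even x then x `div` 2 else x*3 + 1

main :: IO ()
main = do
  print (sharedMap M.! 27)       -- "building map" printed once
  print (perCallMap 100 M.! 27)  -- "building map" printed once per memo hit
```

Both versions return the same answer (the Collatz chain from 27 takes 111 steps); only the number of times the map is constructed differs.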

2 Comments

For the purposes of this question, it's a great answer. I'm not sure I love everything in the quote, though. While it's true that the Haskell language specification only requires non-strict evaluation, in practice GHC is fully lazy by default; and the distinction between non-strict and lazy is not the reason the Haskell spec doesn't require an implementation to memoize.
Thanks! I had a feeling that you were saying roughly the same thing but with greater precision. I will look at your answer again to understand the issue better.
