Using Heap View
There are some patterns in Haskell code that often lead to space leaks. Here is one you might run into very quickly when learning Haskell.
-- | Sum the numbers from 1 to i
sumRange :: Int -> Int -> IO Int
sumRange acc 0 = return acc
sumRange acc i = sumRange (acc + i) (i - 1)
(I’ve made the return type IO Int
so that I can use getClosureData
. You could theoretically wrap it in an unsafePerformIO
like the trace
function, but for now I’ll just do this.)
Running sumRange 0 1000000
on my computer (with a mostly unoptimized GHC), I get these performance stats:
$ ./test +RTS -s
121,344,040 bytes allocated in the heap
57,823,520 bytes maximum residency
Total time 0.419s
Productivity 34.8% of total elapsed
Looking at our program, these numbers are not good. We shouldn’t be building any big structures of data. In fact, we should have roughly two Int
s worth of memory at any given time. One for acc
, one for i
.
At this point, we have a very specific problem: there is too much data being held on to. The “maximum residency” tells us that the garbage collector found 58MB of live data on the heap when it ran. That doesn’t seem right. If we lower i
to 100
, then only 44KB
are used. So our program memory grows with i
even though it should be roughly constant.
We’ve profiled and found a memory leak, so it is time to get rid of it. At this point, there are basically two recommendations. One, look at core. You can dump the intermediate stages of GHC compilation, which look a little closer to imperative programs. This is often useful, but somewhat heavy-handed. It also can become quite hard to look through in larger projects. Two, add strictness annotations everywhere and hope for the best. This is in fact the solution to our problem, but it would be nice to have a target for our !-gun. In this case, there are only two options (!acc
and !i
) so we could just compile-profile-loop until we found the solution.
With heap-view
however, we have another option. Let’s look at the values on the heap during each loop. For the sake of screen space, we’ll limit the loop to only 3 iterations. Here is the updated code and output, again trimmed for clarity:
import GHC.Exts.Heap
-- | Sum the numbers from 1 to i
sumRange :: Int -> Int -> IO Int
sumRange acc 0 = return acc
sumRange acc i = do
ca <- getClosureData acc
putStr "acc: "
print ca
ci <- getClosureData i
putStr " i: "
print ci
sumRange (acc + i) (i - 1)
$ ./test
i: ConstrClosure { info = StgInfoTable {tipe = CONSTR_0_1}
, ptrArgs = [], dataArgs = [3], name = "I#"}
acc: ConstrClosure { info = StgInfoTable {tipe = CONSTR_0_1}
, ptrArgs = [], dataArgs = [0], name = "I#"}
i: ConstrClosure { info = StgInfoTable {tipe = CONSTR_0_1}
, ptrArgs = [], dataArgs = [2], name = "I#"}
acc: ThunkClosure { info = StgInfoTable {tipe = THUNK_2_0 }
, ptrArgs = [0x00c66518/1,0x00c66548/1], dataArgs = []}
i: ConstrClosure { info = StgInfoTable {tipe = CONSTR_0_1}
, ptrArgs = [], dataArgs = [1], name = "I#"}
acc: ThunkClosure { info = StgInfoTable {tipe = THUNK_2_0 }
, ptrArgs = [0x00420000fe50,0x00420000fe70/1], dataArgs = []}
If we look at i
, we see that at every iteration of the loop, it is just our Int
constructor from last time. So i
stays constant in space usage. Looking at acc
though, it is a thunk. Every time we go through the loop, we are simply adding another thunk to the chain. In memory, acc
looks something like this (assuming we start with acc == 0
):
+---+---------+---+
| + | old_acc | 0 |
+---+---------+---+
| +---+-----------+---+
->| + | older_acc | 1 |
+---+-----------+---+
|
...
| +---+---+---+
->| + | 0 | i |
+---+---+---+
Or, in a perhaps more understandable format:
(...(((0 + i) + (i-1)) + (i-2)) + ... + 1)
It seems like making acc
strict should solve our problem.
{-# LANGUAGE BangPatterns #-}
-- | Sum the numbers from 1 to i
sumRange :: Int -> Int -> IO Int
sumRange !acc 0 = return acc
sumRange acc i = sumRange (acc + i) (i - 1)
Running our program with 1 million again, our profile looks much better.
$ ./test +RTS -s
88,051,752 bytes allocated in the heap
44,576 bytes maximum residency
Total time 0.141s
Productivity 99.3% of total elapsed
If you include the closure data output, you’ll see that both i
and acc
are now always Int
s. A single !
made our code four times faster! This is definitely not an ideal situation, but it is the state of things right now.
For me, heap-view
sits at a nice place between guessing and reading core. If you have a hypothesis, this makes it reasonable to test it without changing build flags. More than anything, I think it removes a lot of the guess-work from exploring Haskell’s evaluation. You can crawl all over the heap if you want to. I hope people try it out and find interesting use-cases for it.