Reducing the Impact of Garbage Collection

Although the allocation change reduced the amount of memory being allocated, it did not help with the run-time/copying problem. This benchmark in particular highlighted the issue.

In theory these functions should perform almost identically; the only difference is the amount of live memory on the heap while they run. A sketch of such a pair of benchmarks follows the output below. The actual results, however, looked like this:

benchmarking small-fast
measurement took 5.03 s
...

benchmarking large-slow
measurement took 69.20 s
...

                            Tot Time (elapsed)
 Gen  1      2430 colls,     68.188s  68.226s

 Total   time   75.000s ( 75.117s elapsed)
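
For concreteness, here is a rough sketch of the kind of benchmark pair being compared. The names, the sizes, and the choice of a Map as the large live structure are illustrative assumptions on my part, not the original benchmark code:

    import Criterion.Main (bench, defaultMain, env, whnf)
    import qualified Data.Map.Strict as Map

    -- The work being timed is identical in both benchmarks.
    addSome :: Int -> Int
    addSome n = n + 42

    main :: IO ()
    main = defaultMain
      [ -- "small-fast": almost no live data on the heap while sampling.
        bench "small-fast" (whnf addSome 1)

        -- "large-slow": the same work, but with a large Map kept live for
        -- the whole benchmark, so every collection has far more data to copy.
      , env (pure bigMap) $ \m ->
          bench "large-slow" (whnf (\n -> addSome (n + Map.size m)) 1)
      ]
      where
        bigMap :: Map.Map Int Int
        bigMap = Map.fromList [(i, i) | i <- [1 .. 1000000]]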

So we need to somehow reduce the amount of time spent collecting the first generation. Thinking back to our measure loop, this is somewhat self-inflicted.

We have two calls to performGC, which manually triggers the garbage collector. There were also two calls to performGC in the parent function, meaning we were paying that cost four times for every sample taken. The parent ones can be removed almost immediately: they were copied to the new measure location when they should have been moved there permanently.

That cuts the time in half, but we still have the two pesky ones in measure. The issue is that sometimes you need to work on a large data structure, like the Map above.
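
For orientation, the loop in question is shaped roughly like the sketch below. This is not criterion's actual code; measureSketch and its timing details are simplified stand-ins meant only to show where the two performGC calls sit around a sample.

    import Control.DeepSeq (NFData, force)
    import Control.Exception (evaluate)
    import GHC.Clock (getMonotonicTime)
    import System.Mem (performGC)

    -- Time one sample of an action, bracketed by the two manual collections:
    -- one to settle the heap before the clock starts, and one afterwards so
    -- the RTS statistics reflect the run that just finished.
    measureSketch :: NFData a => IO a -> IO Double
    measureSketch action = do
      performGC
      start <- getMonotonicTime
      _ <- evaluate . force =<< action
      end <- getMonotonicTime
      performGC
      pure (end - start)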

A Quick Primer on the Garbage Collector

If you aren’t familiar with how the garbage collector works, here’s a quick review. The GC can be tuned, but I’ll just cover the defaults.

When GHC runs your program, it maintains the heap so you don’t have to. But because you never call free as you would in a language like C, GHC has to figure out when to reclaim the “garbage” memory no longer in use. It uses a generational garbage collector to keep track of your memory.

It works on the hypothesis that most memory does not live very long, which is often a successful model for functional languages. When you try to allocate some new memory, it looks in Gen 0 for space. If there isn’t any, it goes through all of Gen 0, kicking out dead blocks and moving live data to Gen 1. That is a “minor” garbage collection.

If it tries to move data to Gen 1 and there is no space, it performs a similar collection. However, instead of moving live data up another generation, it allocates a new space twice the size of the current one and copies the live data over. This space grows with the program and can become quite costly to collect. This is a “major” collection.

performGC asks for a major collection. We can request just a minor collection with performMinorGC.
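
For reference, all three entry points live in System.Mem in base:

    import System.Mem (performGC, performMajorGC, performMinorGC)

    main :: IO ()
    main = do
      performGC       -- requests a major collection (all generations)
      performMajorGC  -- the same request under its explicit name
      performMinorGC  -- requests a collection of the youngest generation only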

Two wonderful writeups can be found here and here if you are looking for more information.

Getting Stats

As I just hinted, we want to move to performMinorGC, but the question is: why are we calling the GC here at all? Each time the garbage collector runs, it updates statistics about the program. Fields like max live bytes and allocated bytes are only accurate immediately after a garbage collection.

After a bit of reading in GHC’s Runtime System (which you can follow here), I was able to confirm that these stats are updated in both major and minor collections. So the stats reported are at least up to date.
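
Here is a small sketch of reading those fields through GHC.Stats, assuming the program is run with +RTS -T so the RTS records statistics at all; reportStats and the printed labels are just illustrative:

    import GHC.Stats (RTSStats (allocated_bytes, max_live_bytes), getRTSStats, getRTSStatsEnabled)
    import System.Mem (performMinorGC)

    -- Force a minor collection, then read the freshly updated counters.
    reportStats :: IO ()
    reportStats = do
      enabled <- getRTSStatsEnabled
      if not enabled
        then putStrLn "run with +RTS -T to enable RTS statistics"
        else do
          performMinorGC
          stats <- getRTSStats
          putStrLn ("allocated bytes: " ++ show (allocated_bytes stats))
          putStrLn ("max live bytes:  " ++ show (max_live_bytes stats))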

It is hard to quantify the impact that major vs. minor collections have on the actual sample, though. The state of the heap affects the speed of a program more than one would like. The trade-off in a language like Haskell is that you don’t have to manage the heap in your code, but you also can’t.

This is my way of writing a disclaimer: from all my tests, it would seem that using performGC vs. performMinorGC does not greatly affect the reported times. However, if you come across any measurements that differ significantly between Criterion versions, please report them. I certainly did not think of every edge case while designing and testing this code.
