Improve interaction between mutable arrays and GC
This is the result of a discussion between Simon PJ and me a few weeks ago, inspired by recent discoveries of poor performance with mutable arrays (e.g. see the Data.Hash discussion on glasgow-haskell-users around October 2005).
Note that all this applies to mutable arrays of pointers, i.e. IOArray and STArray, not to unboxed arrays, i.e. IOUArray and STUArray.
Current implementation
There are two primitive types: MutArr# and Array#. We convert between them with:
unsafeFreezeArray# :: MutArr# s a -> State# s -> (# State# s, Array# a #)
unsafeThawArray# :: Array# a -> State# s -> (# State# s, MutArr# s a #)
An Array# is not normally on the old-generation mutable list, unless (a) it has pointers to young-generation objects, or (b) it has been recently frozen. The implementation of unsafeFreezeArray# is a single write to the header word of the array. The implementation of unsafeThawArray# is slightly more complex: if the array is not already on the mutable list (indicated by the value of the header), we add it. We also change the header word to indicate that the array is now mutable.
A MutArr# is always on the mutable list.
Objects pointed to by an Array# are eagerly promoted to the generation in which the Array# resides, so that the Array# can then be removed from the mutable list.
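The header-word bookkeeping above can be illustrated with a toy Haskell model. Everything here is hypothetical: the constructors Mutable, FrozenOnMutList and FrozenOffMutList stand in for the real header values, the mutable list is modelled as an IORef holding array ids, and minorGC assumes eager promotion has already made every referent of a frozen array at least as old as the array. It is a sketch of the logic, not the RTS code.

```haskell
import Control.Monad (filterM)
import Data.IORef

-- Hypothetical header states (not GHC's real info-table names).
data Header
  = Mutable            -- MutArr#: always on the mutable list
  | FrozenOnMutList    -- Array# frozen recently, still on the mutable list
  | FrozenOffMutList   -- Array# the GC has already dropped from the list
  deriving (Eq, Show)

data ArrObj = ArrObj { arrId :: Int, header :: IORef Header }

-- The old-generation mutable list, modelled as a list of array ids.
type MutList = IORef [Int]

-- unsafeFreezeArray#: a single write to the header word.
freeze :: ArrObj -> IO ()
freeze a = writeIORef (header a) FrozenOnMutList

-- unsafeThawArray#: add the array to the mutable list only if the header
-- says it is not already there, then mark the header mutable.
thaw :: MutList -> ArrObj -> IO ()
thaw ml a = do
  h <- readIORef (header a)
  case h of
    FrozenOffMutList -> modifyIORef ml (arrId a :)
    _                -> return ()
  writeIORef (header a) Mutable

-- A minor GC: mutable arrays stay on the list; frozen ones are dropped,
-- assuming eager promotion has moved everything they point to into the
-- array's own generation (or an older one).
minorGC :: MutList -> [ArrObj] -> IO ()
minorGC ml arrs = do
  ids <- readIORef ml
  kept <- filterM keep [a | a <- arrs, arrId a `elem` ids]
  writeIORef ml (map arrId kept)
  where
    keep a = do
      h <- readIORef (header a)
      if h == Mutable
        then return True
        else do
          writeIORef (header a) FrozenOffMutList
          return False
```

A frozen array therefore costs one list entry until the next minor GC, after which thawing it must pay to re-register it.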
It is only safe to write to a MutArr#, so if multiple threads are accessing an array, they should not use thaw/freeze tricks without extra locking around the array (doing so can crash the GC).
The Problem
The problem is that mutable arrays are always completely traversed on every GC. To get around this, we can keep an array in a frozen state and thaw it just before writing, then freeze it again afterward. This is a bit inconvenient, not to mention unsafe with multiple threads unless extra locking is used.
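At the library level, the thaw/write/freeze workaround might look like the sketch below, using unsafeThaw and unsafeFreeze from the array package's Data.Array.Unsafe module. The helper name writeFrozen is hypothetical. Whether the thaw is O(1) rather than a copy depends on GHC's rewrite rules for IOArray firing (typically only with optimisation on), so treat this as an illustration of the pattern rather than a performance recipe.

```haskell
import Data.Array (Array, listArray, (!))
import Data.Array.IO (IOArray, writeArray)
import Data.Array.Unsafe (unsafeFreeze, unsafeThaw)

-- Write one element of an array that is normally kept frozen, so that
-- between writes it is an immutable array the GC need not keep rescanning.
-- UNSAFE: the argument array may be mutated in place and must not be used
-- afterwards, and concurrent callers need their own locking.
writeFrozen :: Array Int Int -> Int -> Int -> IO (Array Int Int)
writeFrozen arr i x = do
  marr <- unsafeThaw arr :: IO (IOArray Int Int)
  writeArray marr i x
  unsafeFreeze marr
```

For example, writeFrozen (listArray (0, 4) (replicate 5 0)) 2 42 yields an array whose element at index 2 is 42; the caller must then use only the returned array.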
Furthermore, a modified array is completely scanned, whereas for larger arrays it would be much better to just scan the part of the array that had been modified (known in the GC literature as "card-marking").
The benefit of the current approach is that writing to a mutable array is a single write instruction, whereas card-marking or any similar scheme requires a write barrier. The unsafeThaw/write/unsafeFreeze sequence amounts to a write barrier, so if this is a common technique we should provide an easy way to do it, possibly making it the default.
Solutions
Leaving aside card-marking for now, let's think about incorporating the write barrier in the write operation.
Suppose that mutable arrays are always kept on the mutable list, but the header word indicates whether the array needs to be scanned or not (e.g. we have MUT_ARR_DIRTY and MUT_ARR_CLEAN). The array write op should (a) set the header to MUT_ARR_DIRTY, and (b) do the write. The GC turns MUT_ARR_DIRTY into MUT_ARR_CLEAN when everything the array points to is in the same generation (or older).
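The dirty/clean scheme can be sketched as a small Haskell model. The names Hdr, MutArrClean, MutArrDirty, GCArray, writeBarriered and minorGC are hypothetical, an IORef stands in for the header word, and reading the elements stands in for scavenging them. The model also assumes that one scan promotes everything the array points to, so a scanned array can immediately be marked clean.

```haskell
import Data.Array.IO (IOArray, getElems, newArray, writeArray)
import Data.IORef

-- Hypothetical header values mirroring MUT_ARR_CLEAN / MUT_ARR_DIRTY.
data Hdr = MutArrClean | MutArrDirty deriving (Eq, Show)

-- An array that stays on the mutable list permanently.
data GCArray = GCArray
  { hdr     :: IORef Hdr
  , payload :: IOArray Int Int  -- stand-in for the array of pointers
  }

newGCArray :: Int -> IO GCArray
newGCArray n = GCArray <$> newIORef MutArrClean <*> newArray (0, n - 1) 0

-- The write barrier: (a) mark the header dirty, (b) do the write.
writeBarriered :: GCArray -> Int -> Int -> IO ()
writeBarriered a i x = do
  writeIORef (hdr a) MutArrDirty
  writeArray (payload a) i x

-- A minor GC walks the mutable list but scans only dirty arrays, marking
-- each clean afterwards. Returns the number of arrays actually scanned.
minorGC :: [GCArray] -> IO Int
minorGC arrs = length . filter id <$> mapM scan arrs
  where
    scan a = do
      h <- readIORef (hdr a)
      case h of
        MutArrClean -> return False
        MutArrDirty -> do
          _ <- getElems (payload a)  -- stand-in for scavenging each element
          writeIORef (hdr a) MutArrClean
          return True
```

The mutable list is still traversed in full, but an untouched array costs only a header check rather than a scan of every element.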
Downsides to this:
- initialising a mutable array, or doing block writes, will be more painful, because every write pays for the write barrier (perhaps not too painful)
How does freezing/thawing interact with this? We currently create immutable arrays by starting with a MutArr#, initialising it, and then freezing it to make an Array#. We can still do this, exactly as now (and with the same thread-unsafety), but initialisation will be a bit slower due to the write barrier.
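At the library level, this initialise-then-freeze pattern is what runSTArray from the array package's Data.Array.ST does: it runs an ST computation that fills an STArray and returns the result as an immutable Array without copying. The function squares below is just an illustrative example. Under the proposed scheme, each writeArray in the loop would be an individual element write and so would pay the write barrier.

```haskell
import Control.Monad (forM_)
import Control.Monad.ST (ST)
import Data.Array (Array, elems)
import Data.Array.ST (STArray, newArray, runSTArray, writeArray)

-- Allocate a mutable array, initialise it element by element, then
-- freeze it (runSTArray freezes in place rather than copying).
squares :: Int -> Array Int Int
squares n = runSTArray build
  where
    build :: ST s (STArray s Int Int)
    build = do
      marr <- newArray (0, n - 1) 0
      forM_ [0 .. n - 1] $ \i -> writeArray marr i (i * i)
      return marr
```

For example, elems (squares 5) is [0, 1, 4, 9, 16].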
Block writes
We could try to provide for "block writes", by allowing a thread to "open" the array for modification, and then "close" it again after it had finished writing, with all writes in between being done without a write barrier. This would replace unsafeThaw/unsafeFreeze.
To do this safely, we would have to use some kind of synchronisation on the open/close; techniques that we came up with were to increment (atomically) a counter in the array header, or to allocate a new heap object pointing to the array in the current thread's allocation area.
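The counter-based variant of open/close can be modelled as follows. All names (BlockArr, openArr, closeArr, withOpen, needsScan, afterScan) are hypothetical, IORefs stand in for fields of the array header, and the barrier-free writes made while the array is open are not shown. The model assumes a stop-the-world collector, so mutators are paused while the GC calls needsScan and afterScan. Marking the array dirty before decrementing the counter means there is no moment when the count is zero but the writes have not yet been recorded.

```haskell
import Control.Exception (bracket_)
import Data.IORef

-- Hypothetical header fields for "block writes".
data BlockArr = BlockArr
  { openCount :: IORef Int   -- writers currently holding the array open
  , dirty     :: IORef Bool  -- set on close so the next GC scans the array
  }

-- Atomically register a writer; while openCount > 0 the GC must scan it.
openArr :: BlockArr -> IO ()
openArr a = atomicModifyIORef' (openCount a) (\n -> (n + 1, ()))

-- Record the block of writes with one dirty mark, then deregister.
closeArr :: BlockArr -> IO ()
closeArr a = do
  writeIORef (dirty a) True
  atomicModifyIORef' (openCount a) (\n -> (n - 1, ()))

-- Keep the array open for the duration of an action, even on exceptions.
withOpen :: BlockArr -> IO r -> IO r
withOpen a = bracket_ (openArr a) (closeArr a)

-- GC side: an array must be scanned if it is open or dirty.
needsScan :: BlockArr -> IO Bool
needsScan a = do
  n <- readIORef (openCount a)
  d <- readIORef (dirty a)
  return (n > 0 || d)

-- After scanning, clear the dirty flag unless a writer still has it open.
afterScan :: BlockArr -> IO ()
afterScan a = do
  n <- readIORef (openCount a)
  if n == 0 then writeIORef (dirty a) False else return ()
```

The cost is moved from every write to the open/close pair, at the price of two atomic operations per block of writes.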
Card marking
We could refine the write barrier so that it marks just part of the array as dirty, instead of the whole array. The natural choice is to put the mark bit in the block descriptor for the current block, giving us a granularity of 4k or 8k, which is possibly a bit large, but other solutions are much more expensive. Even this would significantly increase the cost of the write barrier, so we may want a different kind of array type for this (LargeMutArr#?). Furthermore, not all arrays currently have their own block descriptors (only "large objects" in GHC's storage manager do); the small ones are allocated in movable memory. To do this, we would have to ensure that every array had its own block (or check in the write barrier, which adds even more expense).
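A rough model of card marking at block granularity follows. It assumes 4 KiB cards (one per block) and 8-byte pointers, so one card covers 512 slots; a separate card table is used here as a simplification of keeping the mark bit in each block descriptor. The names CardArr, newCardArr, writeCard and dirtyRanges are hypothetical. The extra work in the barrier is visible: a division (a shift, for power-of-two sizes) and a second store on every write.

```haskell
import Control.Monad (forM)
import Data.Array.IO (IOArray, IOUArray, getBounds, newArray, readArray, writeArray)

-- Assumed parameters: 4 KiB cards and 8-byte pointers.
cardBytes, wordBytes, slotsPerCard :: Int
cardBytes = 4096
wordBytes = 8
slotsPerCard = cardBytes `div` wordBytes  -- 512 slots per card

data CardArr = CardArr
  { slots :: IOArray Int Int    -- the array payload (stand-in for pointers)
  , cards :: IOUArray Int Bool  -- one dirty bit per card
  }

newCardArr :: Int -> IO CardArr
newCardArr n =
  CardArr <$> newArray (0, n - 1) 0 <*> newArray (0, nCards - 1) False
  where
    nCards = (n + slotsPerCard - 1) `div` slotsPerCard

-- Card-marking write barrier: mark only the card containing slot i.
writeCard :: CardArr -> Int -> Int -> IO ()
writeCard a i x = do
  writeArray (cards a) (i `div` slotsPerCard) True
  writeArray (slots a) i x

-- GC side: list the slot ranges that need rescanning and clear their marks.
dirtyRanges :: CardArr -> IO [(Int, Int)]
dirtyRanges a = do
  (_, lastCard) <- getBounds (cards a)
  (_, lastSlot) <- getBounds (slots a)
  fmap concat $ forM [0 .. lastCard] $ \c -> do
    d <- readArray (cards a) c
    if d
      then do
        writeArray (cards a) c False
        return [(c * slotsPerCard, min lastSlot ((c + 1) * slotsPerCard - 1))]
      else return []
```

For a 2000-slot array with writes to slots 600 and 1999, dirtyRanges returns [(512, 1023), (1536, 1999)], so the GC rescans 1000 slots instead of all 2000.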
IORefs
This also affects IORefs, which are essentially single-element IOArrays. I just noticed that GHC often has a large number of IORefs hanging around in the heap from the typechecker, and the cost of traversing the mutable list can dominate minor GCs.