Add strict versions of modifyIORef and atomicModifyIORef

It is easy to misuse modifyIORef and atomicModifyIORef due to their lack of strictness. For example, if you use an IORef as a counter, use modifyIORef to increment it, and never evaluate the value until the very end, your program will leak memory, and you may even get a stack overflow:

ref <- newIORef 0
replicateM_ 1000000 $ modifyIORef ref (+1)
readIORef ref >>= print

Today, I found a space leak in the tls package. Repeatedly calling sendData would leak memory. It didn't take me long to find the cause once I noticed a module named "Measurement" used for gathering connection statistics. It used modifyIORef to update the Measurement structure. When I changed it to this:

x <- readIORef (ctxMeasurement ctx)
writeIORef (ctxMeasurement ctx) $! f x

the space leak went away.

A more subtle mistake can be made using atomicModifyIORef. Can you spot the problem with this code?

atomicModifyIORef ref (\_ -> (new_value, ()))

It's not incrementing anything, it's just replacing the value. However, it's still deferring evaluation of the function. This mistake was pointed out in The Monad.Reader Issue 19, where they suggested the following idiom:

x <- atomicModifyIORef ref (\_ -> (new_value, ()))
x `seq` return ()

Thus, I believe there should be strict variants of modifyIORef and atomicModifyIORef, if only to warn programmers of these pitfalls.

modifyIORef' is pretty straightforward: force the result of applying the function:

modifyIORef' ref f = do
    x <- readIORef ref
    let x' = f x
    x' `seq` writeIORef ref x'

The only question is: would it be better to force x' after writeIORef instead of before it, in case a caller is trying to share the IORef among threads (which they shouldn't be)?

atomicModifyIORef is less straightforward. Should we force the values themselves? It is possible to avoid the space leak above without forcing either the new value or the return value:

atomicModifyIORef' :: IORef a -> (a -> (a,b)) -> IO b
atomicModifyIORef' ref f = do
    p <- atomicModifyIORef ref (\a -> let p = f a in (fst p, p))
    p `seq` return (snd p)

It also allows f to decide what to force. For example, with this definition of atomicModifyIORef', the following program prints 10000000 and does not leak memory:

ref <- newIORef 0
replicateM_ 10000000 $
    atomicModifyIORef' ref
        (\n -> let n' = n + 1
                in n' `seq` (n', undefined))
readIORef ref >>= print

In the attached patch, I didn't implement atomicModifyIORef' this way. Instead, I made it force both the old and new values, and added a separate function called atomicWriteIORef that has the same signature as writeIORef, but is based on atomicModifyIORef.

I believe the real value of such functions is to warn programmers about these pitfalls.

Trac metadata

Trac field	Value
Version	7.4.1
Type	FeatureRequest
TypeOfFailure	OtherFailure
Priority	normal
Resolution	Unresolved
Component	libraries/base
Test case
Differential revisions
BlockedBy
Related
Blocking
CC
Operating system
Architecture

To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information