Ticket #5926 (closed feature request: fixed)
Add strict versions of modifyIORef and atomicModifyIORef
|Reported by:||joeyadams||Owned by:||simonmar|
|Type of failure:||Runtime performance bug||Difficulty:||Unknown|
|Test Case:||Blocked By:|
It is easy to misuse modifyIORef and atomicModifyIORef due to their lack of strictness. For example, if you use an IORef as a counter, use modifyIORef to increment it, and never evaluate the value until the very end, your program will leak memory, and you may even get a stack overflow:
ref <- newIORef 0 replicateM_ 1000000 $ modifyIORef ref (+1) readIORef ref >>= print
Today, I found a space leak in the tls package. Repeatedly calling sendData would leak memory. It didn't take me long to find the cause once I noticed a module named "Measurement" used for gathering connection statistics. It used modifyIORef to update the Measurement structure. When I changed it to this:
x <- readIORef (ctxMeasurement ctx) writeIORef (ctxMeasurement ctx) $! f x
the space leak went away.
A more subtle mistake can be made using atomicModifyIORef. Can you spot the problem with this code?
atomicModifyIORef ref (\_ -> (new_value, ()))
It's not incrementing anything, it's just replacing the value. However, it's still deferring evaluation of the function. This mistake was pointed out in The Monad.Reader Issue 19, where they suggested the following idiom:
x <- atomicModifyIORef ref (\_ -> (new_value, ())) x `seq` return ()
Thus, I believe there should be strict variants of modifyIORef and atomicModifyIORef, if only to warn programmers of these pitfalls.
modifyIORef' is pretty straightforward: force the result of applying the function:
modifyIORef' ref f = do x <- readIORef ref let x' = f x x' `seq` writeIORef ref x'
The only question is: would it be better to force x' after writeIORef instead of before it, in case a caller is trying to share the IORef among threads (which they shouldn't be)?
atomicModifyIORef is less straightforward. Should we force the values themselves? It is possible to avoid the space leak above without forcing either the new value or the return value:
atomicModifyIORef' :: IORef a -> (a -> (a,b)) -> IO b atomicModifyIORef' ref f = do p <- atomicModifyIORef ref (\a -> let p = f a in (fst p, p)) p `seq` return (snd p)
It also allows f to decide what to force. For example, with this definition of atomicModifyIORef', the following program prints 10000000 and does not leak memory:
ref <- newIORef 0 replicateM_ 10000000 $ atomicModifyIORef' ref (\n -> let n' = n + 1 in n' `seq` (n', undefined)) readIORef ref >>= print
In the attached patch, I didn't implement atomicModifyIORef' this way. Instead, I made it force both the old and new values, and added a separate function called atomicWriteIORef that has the same signature as writeIORef, but is based on atomicModifyIORef.
I believe the real value of such functions is to warn programmers about these pitfalls.