Ticket #3748 (closed merge: fixed)
Bug in rts/Win32/Ticker.c can cause process hang on exit in Win32
Reported by: sgf
Owned by: igloo
Type of failure: Runtime crash
On Win32, the ticker thread is shut down by signalling it to exit, waiting 20ms, and then calling TerminateThread if it has not terminated by then. On a heavily loaded system the 20ms timeout can be exceeded, so TerminateThread is called, and that opens a race. In our tests, in about 1 in 100 calls to TerminateThread the ticker thread is holding the Windows loader lock (which is held during parts of thread shutdown) at the moment it is killed. The lock is then never released, so no other thread can acquire it. Therefore no other thread can exit, and the whole process hangs on exit.
Isn't the Win32 API lovely?
The timeout is hit about 1 in 100 times on a heavily loaded box (more runnable threads than cores), and the hang then occurs on about 1 in 100 timeouts, so roughly 1 exit in 10,000 hangs, which makes it a slightly tedious bug to reproduce. My test case was 4 shell windows, each running a small GHC executable in an infinite loop; the executable forces the ticker thread to be started by using System.Timeout.timeout.
As the code suggests, if the ticker is compiled into a DLL, the thread must die before the DLL can be safely unloaded; otherwise an unhandled exception will wreak havoc in the process. The best suggestion I can think of is to increase the timeout, say to 200ms. After all, the timeout should rarely be reached except on heavily loaded systems, few people need a guaranteed quick response time on DLL unload, and every time the timeout is hit we risk breaking the process by calling TerminateThread.
For the runtime compiled into an executable, there is no need to call TerminateThread. The executable will remain mapped until after the ticker thread is forcibly terminated by the Windows system. Instead, we can simply attempt to signal the thread to close, wait for a timeout, and then shrug our shoulders and return, giving the least-unclean program shutdown possible in that case.
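To make the two cases concrete, here is a minimal sketch of the shutdown policy described above. It is not the code in rts/Win32/Ticker.c and not a patch: every name in it (ticker_shutdown_action, stop_ticker, the enums, the rts_in_dll flag, the stop_event and tick_thread handles) is hypothetical, and it assumes the ticker is signalled through an event handle. The decision is kept in a plain function so it can be checked without Windows; the Win32 part only compiles when _WIN32 is defined.

```c
#include <stdbool.h>

/* Hypothetical names throughout; this is not the code in rts/Win32/Ticker.c. */

/* Grace period for the ticker thread to exit (the 200ms suggested above). */
#define TICKER_EXIT_TIMEOUT_MS 200

/* Outcome of waiting for the ticker thread after signalling it. */
typedef enum { TICKER_WAIT_EXITED, TICKER_WAIT_TIMED_OUT } TickerWait;

/* What to do once the wait has finished. */
typedef enum {
    TICKER_ACTION_NONE,       /* thread exited on its own */
    TICKER_ACTION_TERMINATE,  /* DLL: thread must be gone before unload */
    TICKER_ACTION_ABANDON     /* EXE: leave the thread for Windows to kill */
} TickerAction;

/* Pure decision function: only the DLL case ever asks for TerminateThread. */
static TickerAction ticker_shutdown_action(TickerWait wait, bool rts_in_dll)
{
    if (wait == TICKER_WAIT_EXITED)
        return TICKER_ACTION_NONE;
    return rts_in_dll ? TICKER_ACTION_TERMINATE : TICKER_ACTION_ABANDON;
}

#ifdef _WIN32
#include <windows.h>

/* Assumes stop_event and tick_thread are handles created when the ticker
   was started; both parameters are illustrative. */
static void stop_ticker(HANDLE stop_event, HANDLE tick_thread, bool rts_in_dll)
{
    SetEvent(stop_event);  /* ask the ticker thread to exit */

    /* Anything other than WAIT_OBJECT_0 (timeout or failure) is treated
       as "thread still running". */
    DWORD r = WaitForSingleObject(tick_thread, TICKER_EXIT_TIMEOUT_MS);
    TickerWait wait =
        (r == WAIT_OBJECT_0) ? TICKER_WAIT_EXITED : TICKER_WAIT_TIMED_OUT;

    /* Last resort, DLL only: this can kill the thread while it holds
       the loader lock, which is the hang described in this ticket. */
    if (ticker_shutdown_action(wait, rts_in_dll) == TICKER_ACTION_TERMINATE)
        TerminateThread(tick_thread, 0);
}
#endif
```

With this split, an executable never reaches TerminateThread at all, and a DLL reaches it only after the longer timeout has expired.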
If people are happy with my proposed changes, I can make a patch.