id,summary,reporter,owner,description,type,status,priority,milestone,component,version,resolution,keywords,cc,os,architecture,failure,difficulty,testcase,blockedby,blocking,related
7602,Threaded RTS performing badly on recent OS X (10.8?),simonmar,thoughtpolice,"This ticket is a reminder about the following problem: OS X now uses llvm-gcc as its C compiler, and as a result GHC's garbage collector in the threaded RTS (`-threaded`) is much slower than it should be, costing roughly 30% in overall runtime. Some results are here: [http://www.haskell.org/pipermail/cvs-ghc/2011-July/063552.html]

This is because the GC code relies on fast access to thread-local state. It uses one of two methods: either a global register variable (which only a real gcc supports, not llvm-gcc) or `__thread` variables (which aren't supported on OS X). On OS X we therefore fall back to calling `pthread_getspecific` (see #5634), which is quite slow even though it compiles to inline assembly.
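
The two access patterns can be sketched in C roughly as follows. This is a hedged illustration, not the RTS source: the `gc_thread` struct and every function name (`set_gct_tls`, `get_gct_key`, `bump_tls`, and so on) are hypothetical, and the gcc register-variable method is left out. The `__thread` variant needs a toolchain and platform with TLS support, which per this ticket OS X lacks; the `pthread_getspecific` variant is the portable fallback. Because GC code looks up its thread state on hot paths, the per-lookup cost adds up quickly.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-GC-thread state (not the real RTS type). */
typedef struct gc_thread_ {
    unsigned long copied;   /* e.g. a per-thread counter the GC updates */
} gc_thread;

/* Method A: compiler-supported TLS.  With __thread, reading the pointer is
   typically an ordinary load relative to the thread pointer. */
static __thread gc_thread *gct_tls;

static void set_gct_tls(gc_thread *t) { assert(t != NULL); gct_tls = t; }
static gc_thread *get_gct_tls(void) { return gct_tls; }

/* Method B: POSIX thread-specific data, the fallback where __thread is not
   available.  Every read goes through pthread_getspecific. */
static pthread_key_t gct_key;

static int init_gct_key(void) { return pthread_key_create(&gct_key, NULL); }
static int set_gct_key(gc_thread *t) {
    assert(t != NULL);
    return pthread_setspecific(gct_key, t);
}
static gc_thread *get_gct_key(void) { return pthread_getspecific(gct_key); }

/* A toy hot loop: it fetches the thread-local state on every iteration,
   as GC code that refers to the current thread's state often does. */
static unsigned long bump_tls(int n) {
    for (int i = 0; i < n; i++) get_gct_tls()->copied++;
    return get_gct_tls()->copied;
}
static unsigned long bump_key(int n) {
    for (int i = 0; i < n; i++) get_gct_key()->copied++;
    return get_gct_key()->copied;
}
```

A caller would install the state once per GC thread (with `set_gct_tls` or `set_gct_key`) and then run its hot loop; timing `bump_tls` against `bump_key` with the affected compiler is one way to look for the gap described in this ticket.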

I don't recall which OS X / Xcode versions are affected; perhaps a Mac expert could fill in the details.

We have tried other fixes, such as passing the thread-local state around as extra function arguments, but performance was poor. Ideally Apple will implement TLS (`__thread`) in OS X at some point, and we can then start using it.

A workaround is to install a real gcc (perhaps via Homebrew?) and use it to compile GHC. Whoever builds the GHC binary distributions for OS X should probably do it that way, so that everyone benefits.
",bug,new,high,7.8.1,Runtime System,7.7,,,johan.tibell@… chak@…,MacOS X,x86_64 (amd64),None/Unknown,Unknown,,7678,,
