id	summary	reporter	owner	description	type	status	priority	milestone	component	version	resolution	keywords	cc	os	architecture	failure	difficulty	testcase	blockedby	blocking	related
3969	Poor performance of generated code on x86.	milan		"When implementing a hash function for !ByteStrings, I found the Haskell implementation to be 2.5 times slower than the equivalent C implementation.

The code of both functions is attached in [attachment:Hash.hs] and [attachment:c_hash.c]. You can compile using
{{{
ghc -O --make Hash.hs c_hash.c
}}}
and run the C implementation as {{{./Hash c bstr_len}}} and the Haskell implementation as {{{./Hash h bstr_len}}}.

There is no apparent problem in the Haskell implementation -- both the {{{foldl'}}} and the {{{addWord8}}} are inlined and everything in the main loop is unboxed.

I believe the performance loss is caused by poor register allocation. On x86_64, the Haskell implementation is only ~1.2 times slower.

The comparison on Intel Xeon E5520 32-bit, Windows 7, GHC 6.12.1 is in [attachment:res-32bit.txt]. Both the C and the Haskell implementations are run three times on strings of length 10, 50 and 100. All times are in seconds. The file also contains the assembly code of the relevant functions.

The same comparison on Intel Xeon E5320 64-bit, Fedora, GHC 6.12.1 is in [attachment:res-64bit.txt]."	bug	closed	normal		Compiler	6.12.1	wontfix	x86, runtime performance		Unknown/Multiple	x86	Runtime performance bug					
