id	summary	reporter	owner	description	type	status	priority	milestone	component	version	resolution	keywords	cc	os	architecture	failure	difficulty	testcase	blockedby	blocking	related
5444	Slow 64-bit primops on 32 bit system	Khudyakov		"On 32-bit systems, GHC primops for 64-bit arithmetic are implemented as FFI calls. This leads to a serious performance penalty for 32-bit code that makes heavy use of 64-bit arithmetic.

I found this while investigating the poor performance of mwc-random on 32-bit systems. A 32-bit build runs 3-4 times slower than a 64-bit build on the same hardware. It is difficult to estimate how much faster an optimal implementation would run, since none exists, but the slowdown is probably at least 2x.
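
For reference, the arithmetic that an inline 64-bit multiply has to perform on a 32-bit target can be modelled as below. This is only a sketch of the decomposition, not GHC's code generator or the existing hs_timesInt64 implementation; the names wideMul and mul64ViaHalves are made up for illustration, and wideMul stands in for a single widening 32x32 -> 64 hardware multiply.

```haskell
import Data.Bits (shiftL, shiftR)
import Data.Word (Word32, Word64)

-- Hypothetical model of a widening 32x32 -> 64 multiply
-- (a single instruction on x86).
wideMul :: Word32 -> Word32 -> Word64
wideMul a b = fromIntegral a * fromIntegral b

-- 64-bit multiply (modulo 2^64) built from 32-bit halves:
--   a * b = aLo*bLo + ((aHi*bLo + aLo*bHi) mod 2^32) * 2^32
mul64ViaHalves :: Word64 -> Word64 -> Word64
mul64ViaHalves a b = wideMul aLo bLo + (fromIntegral cross `shiftL` 32)
  where
    aLo, aHi, bLo, bHi, cross :: Word32
    aLo = fromIntegral a                -- low 32 bits (truncating)
    aHi = fromIntegral (a `shiftR` 32)  -- high 32 bits
    bLo = fromIntegral b
    bHi = fromIntegral (b `shiftR` 32)
    cross = aHi * bLo + aLo * bHi       -- wraps modulo 2^32, as required

main :: IO ()
main = do
  let x = 0x123456789ABCDEF0
      y = 0x0FEDCBA987654321
  -- Compare against the built-in 64-bit multiply.
  print (mul64ViaHalves x y == x * y)
```

Three 32-bit multiplies and two additions are enough here, which is why a call through the FFI for each operation looks expensive by comparison.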


Here is a simple program that demonstrates the issue:
{{{
import Data.Int (Int32, Int64)

sqr64 :: Int32 -> Int64
sqr64 x = y * y where y = fromIntegral x
}}}
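
To time the difference between a 32-bit and a 64-bit build, the function above can be wrapped in a small driver. The following is a minimal sketch and not the benchmark used for the numbers above: sumSquares, the NOINLINE pragma, and the iteration count are assumptions chosen for illustration.

```haskell
import Data.Int (Int32, Int64)
import Data.List (foldl')

-- Same function as in the report. NOINLINE (an assumption of this
-- sketch) keeps it a separate function so it is easy to find in Core.
sqr64 :: Int32 -> Int64
sqr64 x = y * y where y = fromIntegral x
{-# NOINLINE sqr64 #-}

-- Hypothetical driver: strict sum of squares over [1 .. n], so each
-- iteration performs one 64-bit multiply and one 64-bit add.
sumSquares :: Int32 -> Int64
sumSquares n = foldl' (\acc i -> acc + sqr64 i) 0 [1 .. n]

main :: IO ()
main = print (sumSquares 10000000)
```

Building this with -O2 for both targets and comparing wall-clock time should give a rough idea of the cost of the out-of-line primops.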

Here is the optimized Core:
{{{
$wsqr64 :: Int# -> Int64
$wsqr64 =
  \ (ww_sGO :: Int#) ->
    case {__pkg_ccall ghc-prim hs_intToInt64 Int#
                                    -> State# RealWorld -> (# State# RealWorld, Int64# #)}_aFY
           ww_sGO realWorld#
    of _ { (# _, ds2_aG2 #) ->
    case {__pkg_ccall ghc-prim hs_timesInt64 Int64#
                                    -> Int64# -> State# RealWorld -> (# State# RealWorld, Int64# #)}_aGc
           ds2_aG2 ds2_aG2 realWorld#
    of _ { (# _, ds4_aGi #) ->
    I64# ds4_aGi
    }
    }

sqr64 :: Int32 -> Int64
sqr64 = \ (w_sGM :: Int32) ->
    case w_sGM of _ { I32# ww_sGO -> $wsqr64 ww_sGO }
}}}"	bug	new	normal	7.6.2	Compiler	7.2.1			bos@… dterei	Unknown/Multiple	x86	Runtime performance bug					
