id	summary	reporter	owner	description	type	status	priority	milestone	component	version	resolution	keywords	cc	os	architecture	failure	difficulty	testcase	blockedby	blocking	related
4470	Loop optimization: identical counters	choenerzs		"Consider the small program below, where 'f' has to counters 'i' and 'j'. Both are completely identical; the only difference is that 'i' is used to change 's', while 'j' changes 'm'. It would be beneficial to have GHC transform 'f' into something like 'ff' so that one register less is required.

Neither GHC nor LLVM perform this optimization.

Code of this kind occurs when one uses the ""vector library"". See this discussion: [http://www.haskell.org/pipermail/glasgow-haskell-users/2010-November/019446.html]

{{{
{-# LANGUAGE BangPatterns #-}

module Main where

import Criterion.Main

f :: Int -> Int -> Int -> Int -> Int
f !i !j !s !m
  | i == 0    = s+m
  | otherwise = f (i-1) (j-1) (s + i+1) (m + j*5)

g :: Int -> Int
g !k = f k k 0 0


ff :: Int -> Int -> Int -> Int
ff !i !s !m
  | i == 0    = s+m
  | otherwise = ff (i-1) (s + i+1) (m + i*5)

gg :: Int -> Int
gg !k = ff k 0 0



{-
main = do
  print $ g 20
  print $ gg 20
-}

main = defaultMain
  [ bench "" g"" $ whnf g  20 -- 67.9ns
  , bench ""gg"" $ whnf gg 20 -- 64.5ns
  ]
}}}

Function 'f' produces this core:
{{{
$wf =
  \ (ww_s1uU :: Int#)
    (ww1_s1uY :: Int#)
    (ww2_s1v2 :: Int#)
    (ww3_s1v6 :: Int#) ->
    case ww_s1uU of wild_B1 {
      __DEFAULT ->
        $wf
          (-# wild_B1 1)
          (-# ww1_s1uY 1)
          (+# (+# ww2_s1v2 wild_B1) 1)
          (+# ww3_s1v6 (*# ww1_s1uY 5));
      0 -> +# ww2_s1v2 ww3_s1v6
    }
}}}

'wild_B1' and 'ww1_s1uY' should be merged in this case.

The attached source is above program."	feature request	new	normal	7.6.2	Compiler			loop optimization	choener@… pumpkingod@… michal.terepeta@… dterei	Unknown/Multiple	Unknown/Multiple	None/Unknown					
