Ticket #5173 (closed feature request: fixed)

Opened 2 years ago

Last modified 2 years ago

Implement forward substitution of constants in the Cmm mini-inliner

Reported by: tibbe Owned by: simonmar
Priority: high Milestone: 7.2.1
Component: Compiler Version: 7.0.3
Keywords: Cc: johan.tibell@…, dterei
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: None/Unknown Difficulty:
Test Case: Blocked By:
Blocking: Related Tickets:

Description

While working on a new set of primops for copying arrays, I ran into a Cmm optimizer issue where constants weren't being inlined if the temporary that the constant was assigned to was used more than once. Example:

fn
{
    bits64 a, b;

    a = 1;
    b = a + a;
    RET_N(b);
}

is optimized as

fn()    { []
        }
    ca: _cb::I64 = 1;
        R1 = _cb::I64 + _cb::I64;
        jump (I64[Sp + 0]) ();
}

Attachments

Change History

Changed 2 years ago by tibbe

The attached patch implement forward substitution of constants, with the drawback that the original assignment isn't removed. We could either leave the assignment to a later dead-assignment elimination pass or try to improve this pass to remove it as well.

Changed 2 years ago by tibbe

  • status changed from new to patch

I've updated the implementation to also remove the dead binding. Please review.

Changed 2 years ago by tibbe

Here are the results for validate:

OVERALL SUMMARY for test run started at Fri May  6 16:03:28 CEST 2011
    2756 total tests, which gave rise to
    9978 test cases, of which
       0 caused framework failures
    7533 were skipped

    2362 expected passes
      80 expected failures
       0 unexpected passes
       3 unexpected failures

Unexpected failures:
   3816(normal)
   T5084(normal)
   user001(normal)

The three failures don't seem related. I believer if seen user001 and 3816 before, when working on the I/O manager.

Changed 2 years ago by tibbe

Here are the nofib results:

            Min          -0.0%     -0.0%     -2.0%     -2.0%     -2.4%
            Max          +0.0%     +0.1%     +0.4%     +0.8%     +0.7%
 Geometric Mean          -0.0%     +0.0%     -0.3%     -0.2%     -0.0%

A tiny decrease in runtime and allocation which is probably just noise.

Note that I added this optimization in order to enable other optimizations, that depend on calls at the Cmm level have constant arguments whenever possible, so the lack of any improvements on nofib isn't a problem.

Changed 2 years ago by tibbe

For completeness, here's some real code that benefits from this optimization:

stH_ret()
        { update_frame: <none>
          has static closure: False type: 0
          desc: 0
          tag: 32
          stack: [Just _ssK::I64]
          srt: _no_srt_
        }
    cv1:
        _cv5::I64 = I64[R1 + 7];
        _cv7::I64 = 1;
        _cv9::I64 = I64[Sp + 8];
        _cvb::I64 = 1;
        _cvd::I64 = 8;
        _cvf::I64 = _cvd::I64 * 8;
        _cvh::I64 = (_cv5::I64 + 24) + (_cv7::I64 << 3);
        _cvj::I64 = (_cv9::I64 + 24) + (_cvb::I64 << 3);
        foreign "ccall"
          MO_Memcpy((_cvj::I64, PtrHint), (_cvh::I64, PtrHint), (_cvf::I64,),
                    (8,))[_unsafe_call_];
        I64[_cv9::I64] = stg_MUT_ARR_PTRS_DIRTY_info;
        _cvl::I64 = (_cv9::I64 + 24) + (I64[_cv9::I64 + 8] << 3);
        _cvn::I64 = _cvb::I64 >> 7;
        foreign "ccall"
          MO_Memset((_cvl::I64 + _cvn::I64, PtrHint), (1,),
                    ((_cvb::I64 + _cvd::I64 >> 7) - _cvn::I64 + 1,),
                    (8,))[_unsafe_call_];
        R1 = ghczmprim_GHCziUnit_Z0T_closure+1;
        Sp = Sp + 16;
        jump (I64[Sp + 0]) ();
}

(MO_Memcpy and MO_Memset are CallishMachOps that I'm currently adding. These MachOps will generate unrolled loops in the backend, if the number of iterations is statically known.)

Note how _cvb and _cvd are both constant and used more than once. Here's the optimized version, without forward substitution of constants:

stH_ret()
        { [const 1;, const 32;]
        }
    cv1:
        _cv9::I64 = I64[Sp + 8];
        _cvb::I64 = 1;
        _cvd::I64 = 8;
        _cvh::I64 = I64[R1 + 7] + 32;
        foreign "ccall"
          MO_Memcpy(((_cv9::I64 + 24) + (_cvb::I64 << 3), PtrHint),
                    (_cvh::I64, PtrHint), (_cvd::I64 << 3,), (8,))[_unsafe_call_];
        I64[_cv9::I64] = stg_MUT_ARR_PTRS_DIRTY_info;
        _cvn::I64 = _cvb::I64 >> 7;
        foreign "ccall"
          MO_Memset(((_cv9::I64 + 24) + ((I64[_cv9::I64 + 8] << 3) + _cvn::I64),
                     PtrHint),
                    (1,), ((_cvb::I64 + _cvd::I64 >> 7) + (1 - _cvn::I64),),
                    (8,))[_unsafe_call_];
        R1 = GHC.Unit.()_closure+1;
        Sp = Sp + 16;
        jump (I64[Sp + 0]) ();
}

_cv7 has been inlined, as it was used exactly once, but _cvb and _cvd haven't.

Here's the optimized version with forward substitution of constants:

stH_ret()
        { [const 1;, const 32;]
        }
    cv1:
        _cv9::I64 = I64[Sp + 8];
        _cvh::I64 = I64[R1 + 7] + 32;
        foreign "ccall"
          MO_Memcpy((_cv9::I64 + 32, PtrHint), (_cvh::I64, PtrHint), (64,),
                    (8,))[_unsafe_call_];
        I64[_cv9::I64] = stg_MUT_ARR_PTRS_DIRTY_info;
        _cvn::I64 = 0;
        foreign "ccall"
          MO_Memset(((_cv9::I64 + 24) + ((I64[_cv9::I64 + 8] << 3) + _cvn::I64),
                     PtrHint),
                    (1,), (0 - _cvn::I64 + 1,), (8,))[_unsafe_call_];
        R1 = GHC.Unit.()_closure+1;
        Sp = Sp + 16;
        jump (I64[Sp + 0]) ();
}

The code is better, but not perfect. The remaining reference to a register with a constant value is due to the mini-inliner only inlining constants, not expressions that could be folded into a constant.

I'll leave this as future work for now.

Changed 2 years ago by tibbe

I added the constant folding as well, the last example in the previous comment now looks great:

stH_ret()
        { [const 1;, const 32;]
        }
    cv1:
        _cv9::I64 = I64[Sp + 8];
        _cvh::I64 = I64[R1 + 7] + 32;
        foreign "ccall"
          MO_Memcpy((_cv9::I64 + 32, PtrHint), (_cvh::I64, PtrHint), (64,),
                    (8,))[_unsafe_call_];
        I64[_cv9::I64] = stg_MUT_ARR_PTRS_DIRTY_info;
        foreign "ccall"
          MO_Memset(((_cv9::I64 + 24) + (I64[_cv9::I64 + 8] << 3), PtrHint),
                    (1,), (1,), (8,))[_unsafe_call_];
        R1 = GHC.Unit.()_closure+1;
        Sp = Sp + 16;
        jump (I64[Sp + 0]) ();
}

Changed 2 years ago by dterei

  • cc dterei added

Changed 2 years ago by simonmar

  • owner set to simonmar
  • priority changed from normal to high
  • milestone set to 7.2.1

Thanks - I will review for 7.2.1.

Changed 2 years ago by tibbe

  • type changed from bug to feature request

Changed 2 years ago by tibbe

I've updated the patches to apply constant folding to the source expression rather to the target expression after inlining. This allows assignments like this one to be inlined:

_ccZ::I64 = 0 << 7;
_cd3::I64 = _ccZ;

Previously the forward substitution would only take place if _ccZ had already been substituted "in to", like in this example:

_cd1::I64 = 0;
_ccZ::I64 = 0 << _cd1;
_cd3::I64 = _ccZ;

Changed 2 years ago by simonmar

  • status changed from patch to closed
  • resolution set to fixed

committed, thanks!

  • changeset:e97f29804abdbf9b374aeb3661af340714ea1b60
  • changeset:ea44eadfb9d269d06b889fbfe41286bf0c7a730d
Note: See TracTickets for help on using tickets.