Ticket #2110 (new feature request)

Opened 5 years ago

Last modified 4 months ago

Rules to eliminate casted id's

Reported by: igloo Owned by:
Priority: lowest Milestone: 7.6.2
Component: Compiler Version: 6.8.2
Keywords: Cc: rl, dimitris@…, ryani.spam@…, mail@…, sweirich@…, hackage.haskell.org@…
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: None/Unknown Difficulty: Unknown
Test Case: Blocked By:
Blocking: Related Tickets:

Description

Some people (e.g. jedbrown, who originally brought this up) have/use C libraries that deal with arrays of doubles (i.e. CDouble) that they would like to use from Haskell code, with the Double type (perhaps there are instances of some classes for Double, but not CDouble). In jedbrown's case he has a CArray datastructure that wraps the C array and is an instance of Haskell's IArray.

However, people doing this sort of thing tend to care a lot about performance, so don't want to spend O(n) time doing O(n) allocation. Thus the temptation is to make the assumption that CDouble == Double, and to create the CArray i Double directly.

The more correct way to do it would be to make a CArray i CDouble and then use amap realToFrac to convert it to a CArray i Double. However, I don't believe that GHC is currently smart enough to optimise away amap realToFrac.

Let's look at a simpler example:

cdoubles :: [CDouble]
cdoubles = read "foo"

doubles :: [Double]
doubles = map realToFrac cdoubles

(here map is playing the role of amap). The core for this looks like (very simplified)

Q.a1 = \ (ds_aMc :: Foreign.C.Types.CDouble) -> ds_aMc
Q.doubles = GHC.Base.map (Q.a1 `cast` <mumble>) Q.cdoubles

It looks to me that if we could produce

Q.doubles = GHC.Base.map (id `cast` <mumble>) Q.cdoubles

instead then we might be able to have a rule

forall mumble . map (id `cast` mumble) = id `cast` mumble'

The major problem, as far as I can see, is how to construct the mumble'.

Does that sound plausible?

Change History

  Changed 5 years ago by simonmar

  • architecture changed from Unknown to Unknown/Multiple

  Changed 5 years ago by simonmar

  • os changed from Unknown to Unknown/Multiple

  Changed 4 years ago by igloo

  • milestone changed from 6.10 branch to 6.12 branch

  Changed 4 years ago by ryani

  • cc ryani.spam@… added

follow-up: ↓ 21   Changed 3 years ago by simonpj

  • cc rl, dimitris@… added
  • failure set to None/Unknown

The issue here is how to write the rule in source syntax. The internal form is fine. Here's what it should look like in internal form, I think:

  RULE "map/cast" forall (a::*) (b::*) (c::*) 
                         (co :: (a->b ~ a->c)) 
                         (f::a->b)
                  map a c (f |> co) = map a b f |> [right co]

  RULE "map/id" forall (a::*) map a a id = id

I've split the rule into two. One lifts out the cast, and one spots map-of-identity. I think (but I have not tested) that these rules will work, if only we could write them in the source program. We can write the second but not the first.

This question is related to #2600, where we want to bind type variables in rules. Here we want to bind equality constraints too, and we might want to bind dictionaries too. How about this?

  RULE "map/cast" forall type a b c. (a~b) =>
                  forall (f :: a->b). 
                  map (f :: a->c) = map f :: [b]

Two things going on here:

  • I've used two foralls. The first has just the same syntax as a type-level forall, apart from the "type" keyword. It and binds type variables and constraints. The second is the one we have right now in RULES. Maybe we could omit the "type", but that might be hard to parse.
  • I've used type signatures to force the casts to appear in the "right" place. But this is a bit indirect.

Better suggestions on syntax welcome.

Simon

  Changed 3 years ago by igloo

  • milestone changed from 6.12 branch to 6.12.3

  Changed 3 years ago by igloo

  • priority changed from normal to low
  • milestone changed from 6.12.3 to 6.14.1

  Changed 2 years ago by igloo

  • milestone changed from 7.0.1 to 7.0.2

  Changed 2 years ago by igloo

  • milestone changed from 7.0.2 to 7.2.1

  Changed 20 months ago by igloo

  • milestone changed from 7.2.1 to 7.4.1

  Changed 16 months ago by igloo

  • priority changed from low to lowest
  • milestone changed from 7.4.1 to 7.6.1

  Changed 14 months ago by nomeata

  • cc mail@… added

  Changed 14 months ago by nomeata

I’m considering whether this might be a good project for a bachelor’s thesis for my students, but I’m not sure if I can estimate the required effort correctly. What do you think - would you welcome such a contribution, or rather not have to deal with a beginner’s contribution?

  Changed 14 months ago by simonpj

I'm not sure it makes sense for an undergrad thesis.

  • It's clear what FC (Core) code to generate. The trick is to find a nice source-language design that feels easy and intuitive, and from which we can generate the right FC. Undergrads aren't going to be able to do that.
  • Adding type-variable binders in rules is fiddly, but not very glamorous.

And to be fair I am a bit concerned about dealing with an undergrad patch to the typechecker, one of the more complicated bits of GHC.

There are lots of other open tickets at Status/SLPJ-Tickets if you are interested in a list of open questions. But most of them are tricky in some way, or I'd have done them :-)

  Changed 14 months ago by simonpj

See also #5974

  Changed 9 months ago by igloo

  • milestone changed from 7.6.1 to 7.6.2

  Changed 8 months ago by nomeata

Today after the Haskell Symposium, I again shocked someone by telling him that "map unT" or "map T, where T is a newtype constructor, is not optimized away, so I was reminded of this bug. There really ought to happen something here.

If the rule is representable internall, but not easily written in user code, how about a compiler pass that detects map-like functions and then automatically adds appropriate rules to the source?

follow-up: ↓ 20   Changed 8 months ago by simonpj

  • cc sweirich@… added

Shocking it may be, but the question remains about how to fix it. Here are the issues I see. (I'm assuming we don't want something that works only for lists.)

  • What (precisely) is a "map-like" function, for some arbitrary data type? I'm not keen on spotting such things by somehow matching on their particular form. Remember data types can be mutually recursive, and may have several type parameters.
  • We can reasonably lift casts for data types that don't have a map function. For example, if we have
    newtype Age = MkAge Int
    data T a b = ..blah..blah...
    
    Then the coercion T coAge Bool :: T Int Bool ~ T Age Bool, where coAge is the coercion arising from the Age newtype, even if T doesn't have a map function defined on its first type parameter. System FC is great.
  • But how should that be specified in the source program? Suppose e :: T Int Bool. Then I have wondered about syntax like this:
       newtype[T $MkAge Bool] e
    
    Here the newtype says "here comes a newtype coercion". The bit in square brackets specifies the coercion, in the form of a type with some spot(s) replaced by a newtype data constructor, signalled by $, in this case MkAge. So the degenerate case newtype[$MkAge] e would be identical to MkAge e.

I wonder if anyone has beter ideas for syntax?

in reply to: ↑ 19   Changed 8 months ago by nomeata

Replying to simonpj:

* What (precisely) is a "map-like" function, for some arbitrary data type? I'm not keen on spotting such things by somehow matching on their particular form. Remember data types can be mutually recursive, and may have several type parameters.

I guess what you are not keen on is something like that:

Look for a function f that returns a "T a b c", has one argument that is also a "T a b c", and all other arguments have function types. Check if f is defined by pattern match on the "T a b c" argument. Check for each constructor "C x y z", if the body for the pattern is "C (h x) (h' y) (f h h' h z)", where the h's are the other arguments of f, or recursive calls to f (with the same function arguments). If all this holds, generate a rule that matches on f if all the function arguments are coercions c, c', c and replaces it by the (existing, as I understand it) coercion "T c c' c" applied to the non-function argument.

This is not very elegant, but it would catch most hand-written maps, including Data.List.map and the fmap in the Functor instances of Maybe, Const, Either a, Id, (,) a, Digit, Node, Elem, ViewL, ViewR (judging from a quick grep over base and libraries).

But I guess the question of how we expose the feature to the user, and whether GHC should also generate rules automatically if he can on a best-effort basis are two independent and orthogonal issues.

in reply to: ↑ 5   Changed 8 months ago by nomeata

Looking at

Replying to simonpj:

{{{ RULE "map/cast" forall (a::*) (b::*) (c::*) (co :: (a->b ~ a->c)) (f::a->b) map a c (f |> co) = map a b f |> [right co] RULE "map/id" forall (a::*) map a a id = id }}}

I wonder whether it is realistic to have the types inferred here, e.g.

{-# RULES "map/cast" forall f co. map (f `cast` co) = map f `cast` right co
          "map/id" map id = id
  #-}

This way, the current issue can be solved independently from #2600, and the remaining question is how to expose the Core expressions cast and right to the RULES parser – it seems that they are currently not exported as Haskell values.

  Changed 8 months ago by nomeata

I wish I could edit my comments, at least for 10 minutes or so...

Anyways, I just noticed that I did not understand the right value correctly. Shoudln’t that be something of type "(α ~ β) -> ([α] ~ [β])"? Unless I am reading it wrongly, the typing rule Right in the TLDI'07 paper goes the other way around. I guess I am expecting something like

{-# RULES "map/cast" forall f co. map (f `cast` co) = map f `cast` [] co
          "map/id" map id = id
  #-}

where [] :: (α ~ β) -> ([α] ~ [β]). Not sure if that already exists, though, and again Syntax problems pop up.

Or what if we free the programmer from worrying about coercions at all:

{-# RULES "map/cast" forall f co. map (casted f) = casted (map f)

"map/id" map id = id

#-}

On the LHS, casted x matches any x cast co. On the right hand side, casted y introduces a new coercion axiom and uses it. This may be less safe, but then, rewrite rules are already very unsafe.

  Changed 8 months ago by simonpj

I'm sorry, but I can't follow your line of thought.

In any case, I'm convinced that rewrite rules are not the way to solve this problem. As a programmer I absolutely do not want to rely on some relatively-complicated optimisation machinery kicking in to avoid a complete traversal of my data structure, when all I'm doing is changing the type. And I want to be able to change the type even if there isn't a map function.

Really the only difficult thing is surface syntax. Say I have

newtype Age = MkAge { unAge :: Int }

Then if I have

   e :: Tree (Either Int Int)

I'd like to be able to get these things eaily:

   e1 :: Tree (Either Age Int)
   e2 :: Tree (Either Int Age)
   e3 :: Tree (Either Age Age)

One possible syntax might be this:

   e1 = e |> Tree (Either MkAge Int)
   e2 = e |> Tree (Either Int MkAge)
   e3 = e |> Tree (Either MkAge MkAge)

Here I am using the data constructor MkAge in the middle of a type to say "make the change here!". It's less convenient in the other direction

   e4 :: Tree (Either Int Age)
   e4 = e3 |> Tree (Either unAge Age)

THere's a record selector unAge in the middle there. Maybe we need a more explicit signal that we are dropping from the type world to the term world, something like

  e4 = e3 |> Tree (Either |unAge| Age)

Simon

  Changed 8 months ago by nomeata

I find that I disagree. Consider a library that uses newtypes to hide the implementation:

newtype Nat = Nat { natToInteger :: Integer }

safeNat x = if x >= 0 then Nat x else error "negative"
...more operations on nat...

now the user of the library has a value l :: [Nat] around, but wants to go back to Integers. The natural way of doing it is map natToInteger l. And I find it reasonable to have the compiler (with the help of the library writer) to optimize that to a cast.

The programmer cannot be the one to have to think about it, because he might not even know that Nat ist but a newtype. (Although it might be documented somewhere, so he might be expecting zero-overhead over Ints, and hence expecting the map to be free.)

With my proposition, the burdon of writing the rule is not even with the library author, but the author of the data structure, in this case the list. I find it reasonable to expect him to worry about such things, it is no less tricky than using RULES for list fusion.

If I read your suggestion in comment:23 correctly, you want to introduce a type cast operator to the Haskell syntax that checked whether the types really have the same representation? Might also be useful, but isn’t this risky? We would not want the user to be able to write

let x :: Map Int = ....
    y = x |> Map (Down Int)

would we? How would that be prevented?

  Changed 8 months ago by nomeata

Thanks for bearing with my spam. I found another reason why a programmer might already expect this to work via RULES, and an indication to a possible implementation. Consider this code:

newtype X = X Int
b :: [Int] -> [X]
b = map X
c :: [Int] -> [X]
c = map unsafeCoerce#

Both functions generate almost identical core code, they even share the same „identitiy“ function:

Test.b1 :: GHC.Types.Int -> GHC.Types.Int
[GblId,
 Arity=1,
 Caf=NoCafRefs,
 Str=DmdType S,
 Unf=Unf{Src=<vanilla>, TopLvl=True, Arity=1, Value=True,
         ConLike=True, Cheap=True, Expandable=True,
         Guidance=ALWAYS_IF(unsat_ok=True,boring_ok=True)}]
Test.b1 = \ (tpl_B1 :: GHC.Types.Int) -> tpl_B1

Test.b :: [GHC.Types.Int] -> [Test.X]
[GblId,
 Arity=1,
 Str=DmdType,
 Unf=Unf{Src=<vanilla>, TopLvl=True, Arity=0, Value=True,
         ConLike=True, Cheap=True, Expandable=True,
         Guidance=IF_ARGS [] 20 60}]
Test.b =
  GHC.Base.map
    @ GHC.Types.Int
    @ Test.X
    (Test.b1
     `cast` (<GHC.Types.Int> -> Sym (Test.NTCo:X)
             :: (GHC.Types.Int -> GHC.Types.Int) ~# (GHC.Types.Int -> Test.X)))

Test.c :: [GHC.Types.Int] -> [Test.X]
[GblId,
 Arity=1,
 Str=DmdType,
 Unf=Unf{Src=<vanilla>, TopLvl=True, Arity=0, Value=True,
         ConLike=True, Cheap=True, Expandable=True,
         Guidance=IF_ARGS [] 20 60}]
Test.c =
  GHC.Base.map
    @ GHC.Types.Int
    @ Test.X
    (Test.b1
     `cast` (<GHC.Types.Int> -> UnsafeCo GHC.Types.Int Test.X
             :: (GHC.Types.Int -> GHC.Types.Int) ~# (GHC.Types.Int -> Test.X)))

The only difference is how the coercion is being constructed. Now a the author of the list data type might have added this rule:

{-# RULES
"map/coerce" map unsafeCoerce# = unsafeCoerce#
  #-}

Then assuming the rule fires on c (and I could swear that it did just an hour ago, but I cannot reproduce it now, it seems that now the inlining of unsafeCoerce# happens too soon), then would it not make sense to have it also fire on b, giving us the desired result?

I tried to create a quick hack that would unfold unsafeCoerce# on the LHS of a rule so that the "map/coerce" rule would fire on both the inlined c as well as on b, but my GHC foo is not strong enough yet.

Nevertheless I think letting unsafeCoerce# in a RULE match also functions known to be just specializations of it (namely newtype constructors and deconstructors) seems to be a reasonably clean way to achieve this, without exposing any Core details (casts, equality types) to the surface syntax.

follow-up: ↓ 27   Changed 8 months ago by simonpj

Your previous comment ("I find I disagree"). But I don't see a good solution at the moment:

  • I really do not want anything relying on unsafeCorece. There is nothing ill-typed going on here, and it would be horrible to subert the type system to do something the (FC) type system can express perfectly well.
  • I really do not want to magically infer that a function is map-like. That seems terribly fragile to me (what about mutual recursion, types with multiple parameters, non-uniform recursion etc). At most one could make a special case for lists.

Generally the GHC story has been, lacking a good way forward, to wait until someone has a neat idea, rather than to implement a hack that we later regret. That's where I am on this one at the moment.

Simon

in reply to: ↑ 26   Changed 8 months ago by nomeata

Hi,

Replying to simonpj:

Your previous comment ("I find I disagree").

it seems that this sentence was truncated...

But I don't see a good solution at the moment: * I really do not want anything relying on unsafeCorece. There is nothing ill-typed going on here, and it would be horrible to subert the type system to do something the (FC) type system can express perfectly well.

The problems seems to me that FC is able to express it nicely, but Haskell is not. But RULES are being written by Haskell programmers. Now there are two alternatives:

  • expose FC features (cast, coercions) to the Haskell programer, even if only when specifying RULES, or
  • find a reasonable and – for the Haskell programmer – intuitive approximation to what should be there that is still Haskell.

I was currently exploring the second path. For that, there needs to be a way to match newtype constructors and deconstructors, without having to talk about type casts. A LHS map unsafeCoerce would fit somewhat, but maybe this placeholder is misleading. One could introduce a new name for this purpose, say map (casted id) or map thisIsANewtypeConstructorOrDeconstructor or any other new name. This is probably the easier part.

But what to put on the RHS? You want that in the FC cast that will be there, the equality is constructed from the equality on the LHS. I feel that this is not possible without exposing FC features to the Haskell level – hence my suggestion to use unsafeCoerce on the RHS. One could also try a different name there, but it would not gain much.

I am uncomfortable with the other approach of exposing FC casts and stuff to the Haskell user. Where else has this been done before? One may argue that people who write RULES already have to know how to read core, otherwise they will never know why their rules work or not... but I don’t believe this is wanted.

* I really do not want to magically infer that a function is map-like. That seems terribly fragile to me (what about mutual recursion, types with multiple parameters, non-uniform recursion etc). At most one could make a special case for lists.

I agree that this wasn’t a very good idea, and anyways such an automatic RULE generation would be independent of the issue at hand.

Generally the GHC story has been, lacking a good way forward, to wait until someone has a neat idea, rather than to implement a hack that we later regret. That's where I am on this one at the moment.

I agree. I will try to come up with ideas until I run out of them or a neat one appears, if you don’t mind :-)

follow-up: ↓ 29   Changed 8 months ago by simonpj

I meant "Your previous comment makes good points"!

in reply to: ↑ 28   Changed 8 months ago by sweirich

Sorry to jump in late... but I agree with Simon. Exposing the FC cast operation to the user---just for newtypes---seems to be the simplest solution to the problem. If a client of the module doesn't know that Age is a newtype for Int, why would the client expect "map MkAge?" to be an identity function? Newtypes *already* have special semantics, so enhancing that seems better than introducing something that unpredictably applies.

Furthermore, what Simon is proposing is just convenient syntax for what is already possible with generalized newtype deriving. (And so should be viewed with the same suspicion.) For example, we can cast a list of types thus:

{-# LANGUAGE GeneralizedNewtypeDeriving, MultiParamTypeClasses #-}

class Castable a b where
    cast  :: c a -> c b
    cast' :: c b -> c a
instance Castable Int Int where
    cast  = id
    cast' = id
newtype Age = MkAge Int deriving (Castable Int)

x :: [Int]
x = [1 .. 10]

y :: [Age]
y = cast x

However, when the type to cast is not the last argument to the data structure, the process requires the definition of yet another newtype.

newtype E1 b a = E1 { unE1 :: Either a b }

w :: Either Int Int -> Either Age Int 
w = unE1 . cast . E1

Having this cast around may be convenient for library writers who want to freely coerce between abstract and concrete types (although it is no substitute for a real module system :-).

  Changed 8 months ago by nomeata

No new idea on the syntax, just a small note: If this gets resolved, the fmap that DerivingFunctor? generates should probably also automatically get such a RULE.

  Changed 4 months ago by liyang

  • cc hackage.haskell.org@… added
Note: See TracTickets for help on using tickets.