Ticket #669 (closed bug: invalid)

Opened 7 years ago

Last modified 5 years ago

negative indentation in Text.PrettyPrint.HughesPJ

Reported by: waldmann@… Owned by: thorkilnaur
Priority: normal Milestone: 6.10 branch
Component: libraries/pretty Version: 6.4.1
Keywords: Cc:
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Difficulty: Unknown
Test Case: pp1 Blocked By:
Blocking: Related Tickets:

Description

The pretty printing library has a long-standing bug: it sometimes creates negative indentation. Currently, the symptom is cured, but the cause is unknown.

I tried to dig a bit into the code, and here's what I found.

short summary: the law

(n6)  x <> nest k y = x <> y, if x non-empty

seems broken by the implementation.

test case:

this seems OK:

*Main> text "foo" <> nest 20 (  nest 20 ( text "foo" ) $$ text "bar" )
Beside (TextBeside (Str "foo") 3 Empty) False
       (Nest 40
             (TextBeside (Str "foo") 3
                         (NilAbove (Nest -23 (TextBeside (Str "bar") 3 Empty)))))

but thes seems wrong (the Nest 40 is gone but the Nest -23 is kept, this will lead to negative indentation later on)

*Main> reduceDoc $ text "foo" <> nest 20 (  nest 20 ( text "foo" ) $$ text "bar" )
TextBeside (Str "foo") 3
           (TextBeside (Str "foo") 3
                       (NilAbove (Nest -23 (TextBeside (Str "bar") 3 Empty))))

Now the formatter thinks that he has to indent by 3 + 3 - 23 = -17 spaces.

In the source, I found these problematic lines:

618:nilBeside g (Nest _ p) = nilBeside g p
657:sepNB g (Nest _ p)  k ys  = sepNB g p k ys
700:fillNB g (Nest _ p)  k ys  = fillNB g p k ys
862:        lay2 k (Nest _ p)          = lay2 k p

I think that no function (that produces a document) is allowed to ignore the nesting level because this adds to the total (relative) indentation that must be kept in a consistent state.

At least that's my guess that these numbers represent relative indentation - the source is not exactly verbose on the meaning of all these numbers (which are not in the original paper by John Hughes).

Attachments

NewPP3.hs Download (3.0 KB) - added by thorkilnaur 6 years ago.
Draft repaired version of the original Hughes NewPP.hs/Pretty library
HughesPJIndentRepair.patch Download (50.3 KB) - added by thorkilnaur 6 years ago.
Patch for draft repair of the HughesPJ library
pp1IndentRepair.patch Download (56.9 KB) - added by thorkilnaur 6 years ago.
Patch reflecting that the test pp1 passes
Document_HughesPJ_negative_indentation_behaviour_669.patch.gz Download (17.3 KB) - added by thorkilnaur 5 years ago.
Old patch for inspiration only

Change History

Changed 6 years ago by igloo

  • milestone set to 6.8

Perhaps we should look for a new maintainer for this library?

Changed 6 years ago by simonpj

See #1176 for another example of negative indentation happening in GHC's version of this library.

It would be great if someone would fix this libarary! And when they do, propagate the fixes to GHC's Pretty module.

Simon

Changed 6 years ago by thorkilnaur

  • owner set to thorkilnaur

Changed 6 years ago by thorkilnaur

  • testcase set to pp1

See #1062 for yet another example of this problem. I have transferred the test case pp1 from #1062.

Changed 6 years ago by thorkilnaur

This problem is also present in the original Hughes library  http://www.cs.chalmers.se/~rjmh/Software/NewPP.hs. An example is

$ ghc -e 'putStr $ pretty 2 2 $ text "a" <> sep [ nest 4 (text "b"), text "c" ]' NewPP
ab
c
$ 

where the c should have been printed below the b and not below the a. Instrumenting slightly

$ diff Newpp.hs Newpp2.hs
99c99
< layout k x = [' ' | i<-[1..k]] ++ layout' k x
---
> layout k x = "<" ++ show k ++ ">" ++ layout' k x
$ ghc -e 'putStr $ pretty 2 2 $ text "a" <> sep [ nest 4 (text "b"), text "c" ]' NewPP2
<0>ab
<-3>c
$ 

we can observe that the pretty-printer actually intends to print the c 3 positions to the left of the left margin.

The problem can be traced further back in the paper "The Design of a Pretty-printing Library" by John Hughes ( http://www.cs.chalmers.se/~rjmh/Papers/pretty.ps) where the original pretty-printing library is developed: Using the law

x <> nest k y = x <> y

from section "7.2 The Algebra of Layouts" and the relation

sep (nest k x : xs) = nest k (sep (x : map (nest (-k)) xs ))

derived in section "8 Implementing Pretty-printing: A Term Representation", we can derive

z <> sep (nest k x : xs)
        = z <> (nest k (sep (x : map (nest (-k)) xs )))
        = z <> (sep (x : map (nest (-k)) xs ))

which seems problematic in general. There is a clear correspondence between this relation and the behaviour of the library.

I have worked out a draft repair of the original library: Instead of sometimes removing nests completely, they are retained in the documents and simply ignored when required. In this way, the indent amount can be calculated correctly. Additional adjustmenst are needed to ensure that line widths remain correctly calculated.

The differences between the original and the repaired library NewPP3.hs are:

$ diff NewPP.hs NewPP3.hs
41d40
< Nil <> (Nest k x) = Nil <> x
74c73
< best w r (s `TextBeside` x) = s `TextBeside` best' w r s x
---
> best w r (s `TextBeside` x) = s `TextBeside` best' (w,w) r s x
80c79
< best' w r s (NilAbove x) = NilAbove (best (w-len s) r x)
---
> best' (w1,_) r s (NilAbove x) = NilAbove (best (w1-len s) r x)
83,85c82,84
< best' w r s (Nest k x) = best' w r s x
< best' w r s (x `Union` y) = 
<   nicest' w r s (best' w r s x) (best' w r s y)
---
> best' (w1,w2) r s (Nest k x) = Nest k (best' (w1-k,w2) r s x)
> best' w@(_,w2) r s (x `Union` y) = 
>   nicest' w2 r s (best' w r s x) (best' w r s y)
95a95
>                Nest k' y -> fits n k y
103a104
> layout' k (Nest k' x) = layout' (k+k') x
$ 

NewPP3.hs is attached.

A similar repair can be carried out for the HughesPJ library. A patch for this draft change (HughesPJIndentRepair.patch) is attached. In addition, a patch for the pp1 test case (pp1IndentRepair.patch) is attached.

Please note that I have not carried out an exhaustive test of the repaired HughesPJ library. For example, with nests now being retained more often in the documents, there could well be cases left in the library where a nest will cause a pattern match failure. Besides pp1, I have not been able to find test material related to the HughesPJ library. I would be most interested to hear about such material.

Thorkil

Changed 6 years ago by thorkilnaur

Draft repaired version of the original Hughes NewPP.hs/Pretty library

Changed 6 years ago by thorkilnaur

Patch for draft repair of the HughesPJ library

Changed 6 years ago by thorkilnaur

Patch reflecting that the test pp1 passes

Changed 6 years ago by thorkilnaur

In this investigation, I had earlier asked John Hughes about his comments to the situation, since the problem was related to his original library. With his permission, I quote his response:

Actually, the point you bring up was the intended behaviour of the 
library... at least, intended by me! One of the properties I wanted the 
library to enjoy was that wherever a document fragment was placed, the 
layout of the fragment would be the same. Now, in its vertical form,

sep [nest 4 (text "a"), text "b"]

places the a 4 spaces to the right of the b:

    a
b

The *relative* placement of the a and the b should remain the same, 
whether or not this document is placed to the right of another. Thus, 
for example,

text "foobar: " <>  sep [nest 4 (text "a"), text "b"]

SHOULD be laid out as

foobar: a
    b

to preserve the relationship between a and b. Of course, it follows that 
indentations can be negative if the "foobar:" label is replaced by a 
shorter one. This, I see as an error in the use of the library. Of 
course, an intermediate document may well contain negative 
indentation--it can still be pretty-printed correctly by applying nest k 
to it at the top level, for a suitable k.

Of course, one might discuss whether this design it the right one. An 
important principle for me was to start from a simple formal 
specification--the set-based model in my paper--and then follow what it 
told me. So if one were to explore alternative designs, one should start 
not from the code, but from the formal model, and see whether a design 
with a similarly simple formal model can be found.

John

So I will have to say that my repair and patch is certainly not in accordance with the intentions of John Hughes.

I do not believe that I can contribute much further to this matter without input as to the direction it should take, so I will put it down for now.

Changed 6 years ago by thorkilnaur

  • owner thorkilnaur deleted

Changed 6 years ago by simonpj

That's interesting!

So a valid way forward would be

  • First, to document this point in the Haddock docs for 'nest', perhaps giving the examples that John does.
  • Second, to arrange that in the erroneous situation that a user asks for negative indentation, the library uses zero rather than looping. And to document this point.

Simon

Changed 6 years ago by thorkilnaur

  • owner set to thorkilnaur

Thanks. I'll see what I can do.

Changed 6 years ago by simonmar

Thorkil: any progress on this one? What should we milestone it as?

Changed 6 years ago by simonmar

  • component changed from hslibs/text to libraries/pretty

Changed 6 years ago by thorkilnaur

  • milestone changed from 6.8 branch to 6.10 branch

I'll do this (documentation change) after #1337.

Changed 5 years ago by benedikt

To summarize:

If we want to keep the law "If we compose two layouts a and b using a <g> b, then the layouts of a and b are preserved" this bug can be marked as invalid.

Demonstrated on torkil's example

-- move is the name of the DT constructor Nest (which behaves as 
-- if it would move the virtual typewriter's head)

let a = text "a"
let b = nest 4 (  nest 4 ( text "b" ) $$ text "bar" )

a ==> text "a"   --> 
|a|

b ==> nest 4 (nest 4 (text "b") $$ text "bar")
  ==> move 8; text "b"; nilabove; move -7; text "bar"; empty -->
|........b|
|....bar|

a <> b ==> text "a"; text "b"; nilabove; move -5; text "bar"; empty
      <==> (text "ab" $+$ (nest (-3) $ text bar))

nest 8 (a <> b)  -->
|........ab|
|.....bar|

(a <> b) -->
....|ab|
.bar|..|
    ^ left margin

So bar has to have a negative indentation of 3, when layouts should be preserved.

Suppose we want to introduce a combinator <#>, which has a non layout-preserving semantics. We would have to agree on a formal semantics.

One possibility would be to say that <#> is as $+$, but the last line of the first layout and the first line of the last layout are merged. So this definition of <#> "prepends text to a layout".

|a|
<#> (1)
|........b|
|....bar|
|baz.....|
-->
|ab|
|....bar|
|baz....|
------------------
|a|
<#> (2)
|b|
|.c|
-->
|ab|
|.c|

This could be useful sometimes.

Another possibility would be to state that <#> acts as <>, but ensures there is no negative indentation. This definition seems to have hard to predict effects.

let doc =
|a|
<#> (1)
|........b|
|....bar|
|baz.....|
-->
|ab|
|bar|
|baz|
------------------
nest 4 doc:
|....ab|
|....bar|
|....baz|

Neither of them seems really convincing to me. Any ideas ? A usecase for a non-layout preserving combinator would be great too.

best regards, benedikt

Changed 5 years ago by thorkilnaur

Thanks for taking this matter up again (#2393). For reference, I attach an old patch responding to the Simon PJ request for additional documentation of the nesting behaviour and some cleaning up of the comments related to negative indents. I don't expect this to be directly useful, but it may provide some inspiration.

With comments such as these added to the documentation, I agree that that ticket should be closed.

Best regards Thorkil

Changed 5 years ago by thorkilnaur

Old patch for inspiration only

Changed 5 years ago by benedikt

Thanks thorkil

I'll document another example using $$ (not present in the paper), which might be reported as a bug. The negative indentation causes what I would call "surprising behavior from the client's point of view".

show$ text "a" <> (nest 4 (text "b") $+$ text "c")) $$ text "d" ==>
|ab|
|c..d|

(text "a" <> (nest 4 (text "b") $+$ text "c")) $$ text "d" =
l1:   ....|ab...|
l2:   .c..|.....|
	      `overlap`
l3:   ....|d.....|
  ==    
l1:   ....|ab...|
l2:   .c..|d....|

The library behaves in a consistent way, but the client is probably surprised that (a $$ text "d") puts `d' in column 4.

The cleanest solution would be to make negative indentation a hard error - but this might break existing code, so I won't suggest it (maybe emit a warning ?)

Changed 5 years ago by igloo

  • status changed from new to closed
  • resolution set to invalid

Marking this as invalid, as per #2393.

Changed 5 years ago by simonmar

  • architecture changed from Unknown to Unknown/Multiple

Changed 5 years ago by simonmar

  • os changed from Unknown to Unknown/Multiple
Note: See TracTickets for help on using tickets.