Suggestion for how operator fixity should be specified ------------------------------------------------------ Nils Anders Danielsson (By fixity I mean associativity and precedence.) The current scheme is a mess. With Unicode symbols and mixfix operators users (such as myself) tend to define more operators than in Haskell, and then the Haskell fixity handling is too limited. It is very hard to get an overview over a total ordering which specifies how tight every operator binds in comparison to every other. This note describes a way to avoid these problems. The solution is not perfect--some limitations are discussed towards the end--but it is quite lightweight, so it should be relatively easy to implement and try out. New approach ------------ Associativity can be specified just as before. An operator is either left associative, right associative, or nonassociative. (Note that only infix operators can be left or right associative; pre- and postfix operators are always nonassociative.) The basic idea of the new approach to precedence handling is to abandon the current total order and instead have a partial order of precedences. This is an old idea, which is easy to understand. The basic difference compared to the current scheme is that two operators of noncomparable precedence cannot be used next to each other without inserting parentheses. The only crux is to find a good way of specifying the precedences. I believe that it is a good idea if the precedence of an operator can be understood locally, so I suggest that one should only be allowed to specify precedences at the binding site (in the defining module) of an operator, conservatively extended when new operators are defined. (This rules out having "first class precedences", where the precedences of an operator are free-standing entities which can be exported and imported separately from the operators themselves.) Precedences are defined for an operator • by relating it to previously defined operators. This can be done in three ways: * infix[lr ] • binds as ∘ This means that • (which is left, right or nonassociative) binds in exactly the same way as ∘. * infix[lr ] • binds tighter-than (op₁…) looser-than (op₂…) This means that • binds strictly tighter than op₁… and strictly looser than op₂…. The two parts tighter-than (…) and looser-than (…) can be given in any order, and one of them can be omitted. This declaration is only valid if it does not change the relations of any previously declared operators, i.e., if the precedence relation before this declaration is denoted by ⊰ and the one after this declaration by ⊰′, then the following property must hold: ∀ op₁ ≠ •. ∀ op₂ ≠ •. op₁ ⊰ op₂ ⇔ op₁ ⊰′ op₂. This property ensures some degree of locality for the precedences: To see if/how two operators are related it is enough to inspect the fixity declarations of these two operators, plus those of the operators referred to in these declarations (transitively). It is impossible for an unrelated fixity declaration to change this relation. * infix[lr ] • With this declaration • becomes unrelated to all other operators. * No fixity declaration for • is the same as specifying "infix •", i.e. • becomes nonassociative and unrelated to all other operators. It should also be possible to combine the fixity declarations of several operators, for instance as follows: infixl _op₁_ _op₂_ _op₃ binds looser-than (_+_) This is equivalent to the following three declarations: infixl _op₁_ binds looser-than (_+_) infixl _op₂_ binds as _op₁_ infixl _op₃ binds as _op₁_ (Note the use of _op₁_ in the last two declarations.) Some minor details ------------------ Some minor details (as compared to the current fixity handling in Agda): * Non-operator (function) symbols should still bind tighter than everything else. * Fixity declarations should of course be scope checked, and an error given if a fixity declaration is given for an operator which is not in scope. * It should be possible to give fixity declarations to record fields, for instance as follows: infix Setoid._≈_ binds as _≡_ infixl Ring._-_ binds as Ring._+_ * I do not like the fact that, for operators of the same precedence, the following sub-order of precedence is used: * postfix * prefix * infix right associative * infix left associative * infix nonassociative As an example, take the following fixity declarations: infixr 6 _∷_ infixl 6 _+_ infix 6 -_ _! Currently they result in 5 + 6 ∷ [], - 5 + 6 and - 2 ! parsing as (5 + 6) ∷ [], (- 5) + 6 and - (2 !), even though these expressions should not parse at all. Limitations ----------- The scheme outlined above has a limitation, demonstrated by the following example: Let us say that two libraries, one for sets and one for arithmetic, are developed independently. It is probably unreasonable (if one wants to keep unrelated code separate) for one of these libraries to depend on the other. Hence expressions such as the following won't parse: a + b ∈ c. Parentheses will have to be used: (a + b) ∈ c. To me this example is not very convincing, though. If the two libraries are really separate, then there should not be _any_ connection between them. If, on the other hand, it is a requirement that _+_ should really bind tighter than _∈_, then the libraries are not unrelated, but one should import the other. Another possible problem with the scheme outlined above is its implementation. Currently mixfix operator parsing is implemented in Agda (more or less) as follows: * Expressions are first parsed as if every operator was a function symbol. This yields parse trees (rose trees) which need to be post-processed. * Operator parsing is then done as part of scope-checking. For every symbol sequence (list in the rose tree) in a parsed expression a dedicated parser is generated based on which operator parts are present in the sequence. Scope information is needed for this step, since the relative precedences of the operators and also their associativity are used to construct the parser. The symbol sequence is then parsed using this dedicated parser. It is currently unclear whether this implementation method can be made efficient for the fixity handling scheme outlined above. Conclusion ---------- If the implementation can be made efficient, then I believe that the scheme outlined above is strictly better than the one we have. It is also easy to understand. In other words, I will start thinking about the implementation. ------------------------------------------------------------------------ Improved syntax ------------------------------------------------------------------------ infix [ε|left|right] [ε|binds [as |looser than |tighter than |looser than tighter than |tighter than looser than ]] ------------------------------------------------------------------------ Refinement ------------------------------------------------------------------------ Ulf commented that the scheme above is too inflexible. If (the already existing) library A defines _+_, and library B defines _&_ (which is unrelated to _+_), then it is impossible to define _==_ in library C in such a way that _+_ binds tighter than _==_, which in turn binds tighter than _&_. In order to accommodate this, let us drop transitivity. Details (based on discussions with Ulf): Precedence relations are DAGs, whose nodes are annotated with sets of operators. Let node(•) be the node of operator • (if any), and let n₁ ⊰ n₂ mean that there is an edge from node n₁ to node n₂. Fixity declarations get the following meanings: • binds as ∘: • is added to node(∘). • binds looser than ∘₁ tighter than ∘₂: A new node annotated with {•} is added, plus an edge from node(∘₁) and an edge from node n for all n with n ⊰ node(∘₁), plus an edge to node(∘₂) and an edge to node n for all n with node(∘₂) ⊰ n. Note that this does not create any new dependencies between ∘₁ and ∘₂, but • inherits earlier dependencies. A precedence relation now gives rise to a context free grammar in the following way: * The top-level production is as follows: expr ∷= | ⋁ {n | n is a node in the graph} * For every node n the following productions are added: n ∷= prefix-op⁺ n↑ | n↑ postfix-op⁺ | n↑ infix-non-assoc-op n↑ | (n↑ infix-left-assoc-op)⁺ n↑ | n↑ (infix-right-assoc-op n↑)⁺ n↑ ∷= | ⋁ {n' | n ⊰ n'} x-op ∷= ⋁ {op-prod | op is an "x" operator annotating n} op-prod ∷= op₁ expr op₂ expr op₃ … op_n (where op_i are the name parts of the mixfix operator op) Note that if all operator name parts are unique, and s don't introduce any ambiguity, then the grammar is unambiguous. However, we don't want to require all operator name parts to be unique, since this can be rather inflexible. (Consider a DSEL containing both if_then_ and if_then_else_, for instance. Or the two operators ⟦_⟧_ and ⟦_⟧'_.) All ambiguous parses will be rejected, in many cases with an error message listing all possible parses: Ambiguous parse. Could mean any of the following: if x then (if y then a) else b if x then (if y then a else b) We expect there to be rather few cases of ambiguity. A large number of potentially ambiguous operators will make it harder to write syntactically correct programs, and programmers will presumably be reluctant to subject themselves to this situation. ------------------------------------------------------------------------ Sections ------------------------------------------------------------------------ We can also support sections. Some examples will outline how this can be accomplished: If we have _+_ : ... then 5 +_ and _+ 3 are sections. They stand for \x -> 5 + x and \x -> x + 3, respectively. Note that +_ becomes a postfix operator, and _+ a prefix operator. Note also that _+_ can be viewed as a section, and does not need to be treated as a special case. (The qualified variant M._+_ still needs special treatment, though.) All mixfix operators can be sectioned. For instance, if we have if_then_else_ : ... then if_then x else y stands for \b -> if b then x else y. Parsing of sections is accomplished by letting the lexer distinguish different uses of '_': * As a wildcard. * At the beginning of an operator. * In the middle of an operator. * At the end of an operator. The different uses can be distinguished by examining surrounding white space. ------------------------------------------------------------------------ Open questions ------------------------------------------------------------------------ * What is the sub-class of DAGs that the declarations introduced above can give rise to? Not all DAGs can be constructed in this way. Take •⟶•⟶•⟶•, for instance. Could this be overly limiting? * Does the order of the declarations matter? If it does, then the scheme should be changed, since otherwise we would have a non-declarative language for specifying fixities. (It would not be very nice if the relative precedence of two operators depended on in which order two modules were imported, for instance.) Order does not matter for this simple example: infix _≡_ infix left _+_ binds tighter than _≡_ infix ¬_ binds looser than _≡_ The declarations give rise to the following precedence graph: ╭─────╮ │ _+_ ├⟵╮ ╰──┬──╯ │ ↑ │ ╭──┴──╮ │ │ _≡_ │ │ ╰──┬──╯ │ ↑ │ ╭──┴──╮ │ │ ¬_ ├─╯ ╰─────╯ If the order of the declarations is changed to infix _≡_ infix ¬_ binds looser than _≡_ infix left _+_ binds tighter than _≡_ we still get the same graph. Is this generally true? ------------------------------------------------------------------------ Summary of important "correctness" criteria ------------------------------------------------------------------------ • Adding a new declaration should not change the relations between previously declared operators. • If declarations can be reordered, then the semantics must be independent of their order. ------------------------------------------------------------------------ A possible problem with the scheme above ------------------------------------------------------------------------ Consider the following (Agda-like) modules: module A where infix _*_ module B where import A infix _^_ binds tighter-than (_*_) module C where import A infix _+_ binds looser-than (_*_) module D where import B; import C In D, do we have node(_+_) ⊰ node(_^_)? If not, then order of declarations does (in some sense) matter, since putting the two declarations in the same module would lead to a different result. However, if we do have node(_+_) ⊰ node(_^_), then the relationship between the two operators is not fixed until they are brought into the same scope. Neither scenario feels appealing. ------------------------------------------------------------------------ Refinement of the refinement ------------------------------------------------------------------------ I am compelled to remove the transitivity emulation from fixity declarations. It is too hard to understand. To start with we can require the user to specify every relationship explicitly. If this should turn out to require too much work, then the following extensions seem promising: • One could invent some notation for specifying the fixity of several operator groups at once, for instance: infix (_+_ _-_) < (_*_ _/_) < (_^_) The different groups in this kind of declaration would be transitively related. • One could specify a /module/ in an operator list; this would stand for all the operators exported from the module (top-level, plus perhaps records). Note that this may be a bit coarse. If Agda's open public was more like Haskell's re-exports, then it would be easy to use the module system to package operators for inclusion in fixity declarations, though.