********************************* Type Directed Search ``:search`` ********************************* Idris' ``:search`` command searches for terms according to their approximate type signature (much like `Hoogle `__ for Haskell). For example:: Idris> :search e -> List e -> List e = Prelude.List.(::) : a -> List a -> List a Cons cell = Prelude.List.intersperse : a -> List a -> List a Insert some separator between the elements of a list. > Prelude.List.delete : Eq a => a -> List a -> List a < assert_smaller : a -> b -> b Assert to the totality checker than y is always structurally smaller than x (which is typically a pattern argument) < Prelude.Basics.const : a -> b -> a Constant function. Ignores its second argument. The best results are listed first. As we can see, ``(::)`` and ``intersperse`` are exact matches; the ``=`` symbol to the left of those results tells us the types of ``(::)`` and ``intersperse`` are effectively the same as the type that was searched. The next result is ``delete``, whose type is more specific than the type that was searched; that's indicated by the ``>`` symbol. If we had a function with the signature ``e -> List e -> List e``, we could have given it the type ``Eq a => a -> List a -> List a``, but not necessarily the other way around. The final two results, ``assert_smaller`` and ``const``, have types more general than the type that was searched, and so they have ``<`` symbols to their left. For example, ``e -> List e -> List e`` would be a valid type for ``assert_smaller``. The correspondence for ``const`` is more complicated than any of the four previous results. ``:search`` shows this result because we could change the order of the arguments! That is, the following definition would be legal: .. code:: idris f : e -> List e -> List e f x xs = const xs x About :search results --------------------- :search's functionality is based on the notion of type isomorphism. Informally, two types are isomorphic if we can identify terms of one type exactly with terms of the other. For example, we can consider the types ``Nat -> a -> List a`` and ``a -> Nat -> List a`` to be isomorphic, because if we have ``f : Nat -> a -> List a``, then ``flip f : a -> Nat -> List a``. Similarly, if ``g : a -> Nat -> List a``, then ``flip g : Nat -> a -> List a``. With :search, we create a partial order on types; that is, given two types ``A`` and ``B``, we may choose to say that ``A <= B``, ``A >= B``, or both (in which case we say ``A == B``), or neither. For :search, we say that ``A >= B`` if all of the terms inhabiting ``A`` correspond to terms of ``B``, but it need not necessarily be the case that *all* the terms of ``B`` correspond to terms of ``A``. Here's an example: :: a -> a >= Nat -> Nat The left-hand type has just a single inhabitant, ``id``, which corresponds to the term ``id {a = Nat}``, which has the right-hand type. However, there are various terms inhabiting the right-hand type (such as ``S``) which cannot correspond with terms of type ``a -> a``. We can consider the partial order for ``:search`` to be, in some sense, inductively generated by several classes of "edits" which are described below. Possible edits ~~~~~~~~~~~~~~ Here is a simple approximate list of the edits that are possible in ``:search``. They are not entirely formal, and do not necessarily reflect the ``:search`` command's actual behavior. For example, the *argument application* rule may be used directly on arguments that are bound after other arguments, without using several applications of the *argument transposition* rule. - **Argument transposition** :: A : Type B : Type a : A, b : B |- M : Type ---------------------------------------------------------------------------- (x : A) -> (y : B) -> [x,y/a,b]M == (y : B) -> (x : A) -> [x,y/a,b]M Score: 1 point Example: :: a -> Vect n a -> Vect (S n) a == Vect n a -> a -> Vect (S n) a Note that in order for it to make sense to change the order of arguments, neither of the arguments' types may depend on the value bound by the other argument! - **Symmetry of equality** :: A = B : Type t : Type |- M : Type ---------------------------------------- [A = B/t]M == [B = A/t]M Score: 1 point Example: :: (x,y,z : Nat) -> x + (y + z) = (x + y) + z == (x,y,z : Nat) -> (x + y) + z = x + (y + z) Note that this rule means that we can flip equalities anywhere they occur (i.e., not only in the return type). - **Argument application** :: e : A |- M : Type y1 : T1, ..., yn : Tn |- x : A ----------------------------------------------------------------- (z : A) -> [z/e]M >= (y1 : T1) -> ... -> (yn :Tn) -> [x/e]M Score: <= : 3 points, >= : 9 points Examples: :: a -> a >= Nat -> Nat a -> a >= List e -> List e Vect k (Fin k) >= Vect 5 (Fin 5) Note that the ``n`` shown in the scheme above may be 0; that is, there are no Pi terms to be added on the right side. For example, that's the case for the first example shown above. This is probably the most important, and most widely used, rule of all. - **Type class application** :: C : Type -> TypeClass , y1 : T1, ..., yn : Tn |- A : Type, instance : C A , t : Type |- M : Type ----------------------------------------------------------------- C a => [a/t]M >= (y1 : T1) -> ... -> (yn : Tn) -> [A/t]M Score: <= : 4 points, >= : 12 points Examples :: Ord a => a >= Int Show (List e) => List e -> String >= Show a => List a -> String This rule is used by looking at the instances for a particular type class. While the scheme is shown only for single-parameter type classes, it naturally generalizes to multi-parameter type classes. This rule is particularly useful in conjunction with argument application. Again, note that the ``n`` in the scheme above may be 0. - **Type class introduction** :: t : Type |- M : Type C : Type -> TypeClass ----------------------------------------------- (t : Type) -> M >= C t => M Score: <= : 2 points, >= : 6 points Example: :: a -> a -> Bool >= Eq a => a -> a -> Bool Scoring and listing search results ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ When a type ``S`` is searched, the type is compared against the types of all of the terms which are currently in context. When :search compares two types ``S`` and ``T``, it essentially tries to find a chain of inequalities :: R1 R2 Rn Rn+1 S <= A1 <= ... <= An <= T using the edit rules listed above. It also tries to find chains going the other way (i.e., showing ``S >= T``) as well. Each rule has an associated score which indicates how drastic of a change the rule corresponds to. These scores are listed above. Note that for the rules which are not symmetric, the score depends on the direction in which the rule is used. Finding types which are more general that the searched typed (``S <= T``) is preferred to finding types which are less general. The score for the entire chain is, at minimum, the sum of the scores of the individual rules (some non-linear interactions may be added). The :search function tries to find the chain between ``S`` and ``T`` which results in the lowest score, and this is the score associated to the search result for ``T``. Search results are listed in order of ascending score. The symbol which is shown along with the search result reflects the type of the chain which resulted in the minimum score. Implementation of :search ------------------------- Practically, naive and undirected application of the rules enumerated above is not possible; not only is this obviously inefficient, but the two application rules (particularly *argument application*) are really impossible to use without context given by other types. Therefore, we use a heuristic algorithm that is meant to be practical, though it might not find ways to relate two types which may actually be related by the rules listed above. Suppose we wish to match two types, ``S`` and ``T``. We think of the problem as a non-deterministic state machine. There is a ``State`` datatype which keeps track of how well we've matched ``S`` and ``T`` so far. It contains: - Names of argument variables (Pi-bound variables) in either type which have yet to be matched - A directed acyclic graph (DAG) of arguments (Pi-bindings) for ``S`` and ``T`` which have yet to be matched - A list of typeclass constraints for ``S`` and ``T`` which have yet to be matched - A record of the rules which have been used so far to get to this point A function ``nextSteps : State -> [State]`` finds the next states which may follow from a given state. Some states, where everything has been matched, are considered final. The algorithm can be roughly broken down into multiple stages; if we start from having two types, ``S`` and ``T``, which we wish to match, they are as follows: 1. For each of ``S`` and ``T``, split the types up into their return types and directed acyclic graphs of the arguments, where there is an edge from argument A to argument B if the term bound in A appears in the type of B. The topological sorts of the DAG represent all the possible ways in which the arguments may be permuted. 2. For type ``T``, recursively find (saturated) uses of the ``=`` type constructor and produce a list of modified versions of ``T`` containing all possible flips of the ``=`` constructor (this corresponds to the *symmetry of equality rule*). 3. For each modified type for ``T``, try to unify the return type of the modified ``T`` with ``S``, considering arguments from both ``S`` and ``T`` to be holes, so that the unifier may match pieces of the two types. For each modified version of ``T`` where this succeeds, an initial ``State`` can be made. The arguments and typeclasses are updated accordingly with the results of unification. The remainder of the algorithm involves applying ``nextSteps`` to these states until either no states remain (corresponding to no path from ``S`` to ``T``) or a final state is found. ``nextSteps`` also has several stages: 4. Try to unify arguments of ``S`` with arguments of ``T``, much like is done with the return types. We work "backwards" through the arguments: we try matching all remaining arguments of ``S`` which lack outgoing edges in the DAG of remaining arguments (that is, the bound value doesn't appear in the type of any other remaining arguments) with the all of the corresponding remaining arguments of ``T``. This is done recursively until no arguments remain for both ``S`` and ``T``; otherwise, we give up at this point. This step corresponds to application of the *argument application rule*, as well as the *argument transposition* rule. 5. Now, we try to match the type classes. First, we take all possible subsets of type class constraints for ``S`` and ``T``. So if ``S`` and ``T`` have a total of ``n`` type class constraints, this produces ``2^n`` states for every state, and this quickly becomes infeasible as ``n`` grows large. This is probably the biggest bottleneck of the algorithm at the moment. This step corresponds to applications of the *type class introduction* rule. 6. Try to match type class constraints for ``S`` with those for ``T``. We attempt to unify each type class constraint for ``S`` with each constraint for ``T``. This may result in applications of the *type class application* rule. Once we are unable to match any more type class constraints between ``S`` and ``T``, we proceed to the final step. 7. Try instantiating type classes with their instances (in either ``S`` or ``T``). This corresponds to applications of the *type class application* rule. After instantiating a type class, we hopefully open up more opportunities to match typeclass constraints of ``S`` with those of ``T``, so we return to the previous step. The code for :search is located in the `Idris.TypeSearch module `__. Aggregating results ~~~~~~~~~~~~~~~~~~~ The search for chains of rules/edits which relate two types can be viewed as a shortest path problem where nodes correspond to types and edges correspond to rules relating two types. The weights or distances on each edge correspond to the score of each rule. We then may imagine that we have a single start node, our search type ``S``, and several final nodes: all of the types for terms which are currently in context. The problem, then, is to find the shortest paths (where they exist) to all of the final nodes. In particular, we wish to find the "closest" types (those with the minimum score) first, as we'd like to display them first. This problem nicely maps to usage of Dijkstra's algorithm. We search for all types simultaneously so we can find the closest ones with the minimum amount of work. In practice, this results in using a priority queue of priority queues. We first ask "which goal type should we work on next?", and then ask "which state should we expand upon next?" By using this strategy, the best results can be shown quickly, even if it takes a bit of time to find worse results (or at least rule them out). Miscellaneous Notes ------------------- Whether arguments are explicit or implicit does not affect search results.