3 年之前 · 31e907178f
--- a/proposals/p2687.md
+++ b/proposals/p2687.md
@@ -0,0 +1,337 @@
 
				+# Termination algorithm for impl selection
			
 
				+
			
 
				+<!--
			
 
				+Part of the Carbon Language project, under the Apache License v2.0 with LLVM
			
 
				+Exceptions. See /LICENSE for license information.
			
 
				+SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
			
 
				+-->
			
 
				+
			
 
				+[Pull request](https://github.com/carbon-language/carbon-lang/pull/2687)
			
 
				+
			
 
				+<!-- toc -->
			
 
				+
			
 
				+## Table of contents
			
 
				+
			
 
				+-   [Abstract](#abstract)
			
 
				+-   [Problem](#problem)
			
 
				+-   [Background](#background)
			
 
				+-   [Proposal](#proposal)
			
 
				+-   [Details](#details)
			
 
				+    -   [Non-type arguments](#non-type-arguments)
			
 
				+    -   [Proof of termination](#proof-of-termination)
			
 
				+-   [Rationale](#rationale)
			
 
				+-   [Alternatives considered](#alternatives-considered)
			
 
				+    -   [Measure complexity using type tree depth](#measure-complexity-using-type-tree-depth)
			
 
				+    -   [Consider each type parameter in an `impl` declaration separately](#consider-each-type-parameter-in-an-impl-declaration-separately)
			
 
				+    -   [Consider types in the interface being implemented as distinct](#consider-types-in-the-interface-being-implemented-as-distinct)
			
 
				+    -   [Require some count to decrease](#require-some-count-to-decrease)
			
 
				+    -   [Require non-type values to stay the same](#require-non-type-values-to-stay-the-same)
			
 
				+
			
 
				+<!-- tocstop -->
			
 
				+
			
 
				+## Abstract
			
 
				+
			
 
				+This proposal replaces the termination algorithm for `impl` selection. The
			
 
				+previous algorithm relied on a recursion limit, which is counter to
			
 
				+[our goal for predictability](/docs/design/generics/goals.md#predictability).
			
 
				+
			
 
				+The replacement is to terminate if any `impl` lookup performed while considering
			
 
				+an `impl` declaration depends transitively on the same `impl` declaration with a
			
 
				+"strict superset" of the types in the query.
			
 
				+
			
 
				+## Problem
			
 
				+
			
 
				+Consider this `impl` declaration:
			
 
				+
			
 
				+```
			
 
				+interface I;
			
 
				+impl forall [T:! type where Optional(.Self) is I] T as I;
			
 
				+```
			
 
				+
			
 
				+A type like `i32` is a valid value of `T`, and so implements `I`, if
			
 
				+`Optional(i32)` implements `I`. This `impl` declaration could also possibly be
			
 
				+used to give an implementation of `Optional(i32)` for interface `I`, but only if
			
 
				+there was an implementation of `Optional(Optional(i32))` for interface `I`. The
			
 
				+job of the termination rule is to report an error instead of being caught in an
			
 
				+infinite loop in this situation.
			
 
				+
			
 
				+Ideally, a termination rule would identify the loop in a minimal way. This has a
			
 
				+few benefits, including reducing compile times and making error messages as
			
 
				+short and understandable as possible. One downside of the original recursion
			
 
				+limit rule is its tendency to only detect a problem after the loop had been
			
 
				+repeated many times. This problem is worse if the recursion limit is large.
			
 
				+
			
 
				+Another concern with using a recursion limit is that refactorings that would
			
 
				+otherwise be legal can increase the depth of recursion, causing spurious
			
 
				+failures. The workaround for this and other spurious failures is to increase the
			
 
				+recursion limit. This makes the other problems with using a recursion limit
			
 
				+worse.
			
 
				+
			
 
				+Note that determining whether a particular set of `impl` declarations terminates
			
 
				+is
			
 
				+[equivalent to the halting problem](https://sdleffler.github.io/RustTypeSystemTuringComplete/)
			
 
				+(content warning: contains many instances of an obscene word as part of a
			
 
				+programming language name), and so is undecidable in general. So any termination
			
 
				+rule will have some false positives or spurious failures, where it reports an
			
 
				+error even though it would in fact complete if allowed to continue running. We
			
 
				+would like a criteria that correctly classifies the examples that arise in
			
 
				+practice.
			
 
				+
			
 
				+## Background
			
 
				+
			
 
				+The first termination rule was introduced in proposal
			
 
				+[#920: Generic parameterized impls (details 5)](https://github.com/carbon-language/carbon-lang/pull/920),
			
 
				+following Rust and C++. The problems with using a recursion limit were
			
 
				+[recognized at the time that proposal was written](https://github.com/carbon-language/carbon-lang/blob/f282bca20e41e2f8dc05881d9d6b38213d6c6c87/docs/design/generics/details.md#termination-rule),
			
 
				+but no alternative was known.
			
 
				+
			
 
				+Alternatives termination rules have since been discussed:
			
 
				+
			
 
				+-   in open discussion on
			
 
				+    [2022-04-13](https://docs.google.com/document/d/1tEt4iM6vfcY0O0DG0uOEMIbaXcZXlNREc2ChNiEtn_w/edit#heading=h.cja3fkwzv9tr),
			
 
				+    prompted by a question on
			
 
				+    [#1088: Generic details 10: interface-implemented requirements](https://github.com/carbon-language/carbon-lang/pull/1088);
			
 
				+    and
			
 
				+-   in issue
			
 
				+    [#2458: Infinite recursion during impl selection](https://github.com/carbon-language/carbon-lang/issues/2458),
			
 
				+    which includes summaries of discussions including those on
			
 
				+    [2023-02-07](https://docs.google.com/document/d/1gnJBTfY81fZYvI_QXjwKk1uQHYBNHGqRLI2BS_cYYNQ/edit?resourcekey=0-ql1Q1WvTcDvhycf8LbA9DQ#heading=h.9u2u6078figt).
			
 
				+
			
 
				+PR
			
 
				+[#2602: Implement the termination algorithm for impl selection described in #2458](https://github.com/carbon-language/carbon-lang/pull/2602)
			
 
				+implements the termination rule of this proposal in Explorer.
			
 
				+
			
 
				+## Proposal
			
 
				+
			
 
				+We replace the termination criteria with a rule that the types in the `impl`
			
 
				+query must never get strictly more complicated when considering the same `impl`
			
 
				+declaration again. The way we measure the complexity of a set of types is by
			
 
				+counting how many of each base type appears. A base type is the name of a type
			
 
				+without its parameters. For example, the base types in this query
			
 
				+`Pair(Optional(i32), bool) impls AddWith(Optional(i32))` are:
			
 
				+
			
 
				+-   `Pair`
			
 
				+-   `Optional` twice
			
 
				+-   `i32` twice
			
 
				+-   `bool`
			
 
				+-   `AddWith`
			
 
				+
			
 
				+A query is strictly more complicated if at least one count increases, and no
			
 
				+count decreases. So `Optional(Optional(i32))` is strictly more complicated than
			
 
				+`Optional(i32)` but not strictly more complicated than `Optional(bool)`.
			
 
				+
			
 
				+This rule, when combined with
			
 
				+[the acyclic rule](/docs/design/generics/details.md#acyclic-rule) that a query
			
 
				+can't repeat exactly, [guarantees termination](#proof-of-termination). This rule
			
 
				+is expected to identify a problematic sequence of `impl` declaration
			
 
				+instantiations in a way that is easier for the user to understand. Consider the
			
 
				+example from before,
			
 
				+
			
 
				+```
			
 
				+interface I;
			
 
				+impl forall [T:! type where Optional(.Self) is I] T as I;
			
 
				+```
			
 
				+
			
 
				+This `impl` declaration matches the query `i32 impls I` as long as
			
 
				+`Optional(i32) impls I`. That is a strictly more complicated query, though,
			
 
				+since it contains all the base types of the starting query (`i32` and `I`), plus
			
 
				+one more (`Optional`). As a result, an error can be given after one step, rather
			
 
				+than after hitting a large recursion limit. And that error can state explicitly
			
 
				+what went wrong: we went from a query with no `Optional` to one with one,
			
 
				+without anything else decreasing.
			
 
				+
			
 
				+Note this only triggers a failure when the same `impl` declaration is considered
			
 
				+with the strictly more complicated query. For example, if the declaration is not
			
 
				+considered since there is a more specialized `impl` declaration that is
			
 
				+preferred by the
			
 
				+[type-structure overlap rule](/docs/design/generics/details.md#overlap-rule), as
			
 
				+in:
			
 
				+
			
 
				+```
			
 
				+impl forall [T:! type where Optional(.Self) is I] T as I;
			
 
				+impl Optional(bool) as I;
			
 
				+
			
 
				+// OK, because we never consider the first `impl`
			
 
				+// declaration when looking for `Optional(bool) impls I`.
			
 
				+let U:! I = bool;
			
 
				+
			
 
				+// Error: cycle with `i32 impls I` depending on
			
 
				+// `Optional(i32) impls I`, using the same `impl`
			
 
				+// declaration, as before.
			
 
				+let V:! I = i32;
			
 
				+```
			
 
				+
			
 
				+The rule is also robust in the face of refactoring:
			
 
				+
			
 
				+-   It does not depend on the specifics of how an `impl` declaration is
			
 
				+    parameterized, only on the query.
			
 
				+-   It does not depend on the length of the chain of queries.
			
 
				+-   It does not depend on a measure of type-expression complexity, like depth.
			
 
				+
			
 
				+## Details
			
 
				+
			
 
				+### Non-type arguments
			
 
				+
			
 
				+For non-type arguments we have to expand beyond base types to consider other
			
 
				+kinds of keys. These other keys are in a separate namespace from base types.
			
 
				+
			
 
				+-   Values with an integral type use the name of the type as the key and the
			
 
				+    absolute value as a count. This means integer arguments are considered more
			
 
				+    complicated if they increase in absolute value. For example, if the values
			
 
				+    `2` and `-3` are used as arguments to parameters with type `i32`, then the
			
 
				+    `i32` key will have count `5`.
			
 
				+-   Every option of a choice type is its own key, counting how many times a
			
 
				+    value using that option occurs. Any parameters to the option are recorded as
			
 
				+    separate keys. For example, the `Optional(i32)` value of `.Some(7)` is
			
 
				+    recorded as keys `.Some` (with a count of `1`) and `i32` (with a count of
			
 
				+    `7`).
			
 
				+-   Yet another namespace of keys is used to track counts of variadic arguments,
			
 
				+    under the base type. This is to defend against having a variadic type `V`
			
 
				+    that takes any number of `i32` arguments, with an infinite set of distinct
			
 
				+    instantiations: `V(0)`, `V(0, 0)`, `V(0, 0, 0)`, ...
			
 
				+    -   A `tuple` key in this namespace is used to track the total number of
			
 
				+        components of tuple values. The values of those elements will be tracked
			
 
				+        using their own keys.
			
 
				+
			
 
				+Non-type argument values not covered by these cases are deleted from the query
			
 
				+entirely for purposes of the termination algorithm. This requires that two
			
 
				+queries that only differ by non-type arguments are considered identical and
			
 
				+therefore are rejected by the acyclic rule. Otherwise, we could construct an
			
 
				+infinite family of non-type argument values that could be used to avoid
			
 
				+termination.
			
 
				+
			
 
				+### Proof of termination
			
 
				+
			
 
				+Let's call a (finite or infinite) sequence of type expressions "good" if no
			
 
				+later element is strictly more complex than an earlier element, and no type
			
 
				+expression is repeated. We would like to prove that any good sequence of type
			
 
				+expressions with a finite set of keys is finite.
			
 
				+
			
 
				+We can restrict to good sequences that don't repeat any multiset of keys, since
			
 
				+there are only a finite number of types with a given multiset of keys. Proof: If
			
 
				+none of the types have a variadic parameter list, then there is at most one type
			
 
				+for every distinct permutation of base types. If some types are variadic, then
			
 
				+we can get a conservative finite upper bound by multiplying the number of
			
 
				+distinct permutations by the number of different possible arity combinations.
			
 
				+The number of arity combinations is finite since, ignoring non-type arguments,
			
 
				+the total arity must equal the number of base types in the type minus 1.
			
 
				+
			
 
				+The proof of termination is by induction on the number `N` of distinct keys.
			
 
				+
			
 
				+-   If `N == 1`, then types map to a multiset of a single key, which can be
			
 
				+    represented by the count of the number of times that key appears. That
			
 
				+    number must be non-negative and decreasing in the sequence, and so the
			
 
				+    length of the sequence is bounded by the value of the first element. So good
			
 
				+    sequences with `N == 1` must be finite.
			
 
				+-   Assuming that good sequences with `N` distinct keys must be finite, consider
			
 
				+    a good sequence with `N+1` distinct keys. Its first element will be
			
 
				+    represented by a non-negative integer `(N+1)`-tuple, `(i_0, i_1, ..., i_N)`.
			
 
				+    Every element after that will be in at least one of the
			
 
				+    `i_0 + i_1 + ... + i_N` hyperplanes (co-dimension 1) given by these
			
 
				+    equations:
			
 
				+    -   `x_0 = 0`, `x_0 = 1`, ..., `x_0 = i_0 - 1` (`i_0` different equations,
			
 
				+        each defining a separate hyperplane)
			
 
				+    -   `x_1 = 0`, `x_1 = 1`, ..., `x_1 = i_1 - 1`
			
 
				+    -   ...
			
 
				+    -   `x_N = 0`, `x_N = 1`, ..., `x_N = i_N - 1`
			
 
				+-   Any point not in one of those hyperplanes has components all >= the first
			
 
				+    element, and so can't be in the sequence if it is good.
			
 
				+-   The restriction of the sequence to the subsequence in each of those
			
 
				+    hyperplanes is finite, by the induction hypothesis.
			
 
				+-   The sequence visits points in this finite union of finite sets without
			
 
				+    repetition, and so must be finite.
			
 
				+-   Conclusion: Any good sequence with `N+1` distinct keys is finite, completing
			
 
				+    the induction.
			
 
				+
			
 
				+This bound given by this construction is not completely tight, since there is
			
 
				+overlap between the hyperplanes. It is tight once that overlap is taken into
			
 
				+account, though. We can construct sequences that reach the upper bound by
			
 
				+visiting the points in the union of the hyperplanes in descending order of their
			
 
				+L1-norm (sum of the components).
			
 
				+
			
 
				+Note: The text of this argument was derived from comments on
			
 
				+[issue #2458: Infinite recursion during impl selection](https://github.com/carbon-language/carbon-lang/issues/2458).
			
 
				+
			
 
				+## Rationale
			
 
				+
			
 
				+This proposal advances these [Carbon goals](/docs/project/goals.md):
			
 
				+
			
 
				+-   [Language tools and ecosystem](/docs/project/goals.md#language-tools-and-ecosystem)
			
 
				+    by improving the quality of diagnostics.
			
 
				+-   [Software and language evolution](/docs/project/goals.md#software-and-language-evolution)
			
 
				+    since we've chosen alternatives that avoid introducing failures as the
			
 
				+    result of refactorings, particularly those outside the files changed in the
			
 
				+    refactoring.
			
 
				+-   [Code that is easy to read, understand, and write](/docs/project/goals.md#code-that-is-easy-to-read-understand-and-write)
			
 
				+    by having a relatively simple language rule, that is predictable when
			
 
				+    authoring code, and allows independent modules to compose without triggering
			
 
				+    errors.
			
 
				+
			
 
				+## Alternatives considered
			
 
				+
			
 
				+### Measure complexity using type tree depth
			
 
				+
			
 
				+We
			
 
				+[considered](https://github.com/carbon-language/carbon-lang/issues/2458#issuecomment-1371412985)
			
 
				+a rule which would ensure termination by forbidding the depth of the type tree
			
 
				+in the query from increasing. This depth could either be measured in the query
			
 
				+or in the values of types used to parameterize `impl` declaration. Either way,
			
 
				+this raises a concern that otherwise safe refactorings might trigger spurious
			
 
				+termination errors. Specifically, refactorings that replace a type, like
			
 
				+`String`, with an alias to a parameterized type, like `BasicString(Char8)`,
			
 
				+could change the tree depths in `impl` declarations in files that were not part
			
 
				+of the refactoring.
			
 
				+
			
 
				+### Consider each type parameter in an `impl` declaration separately
			
 
				+
			
 
				+Instead of measuring the complexity of the `impl` query as a whole, we
			
 
				+considered measuring the complexity of the argument values of parameters in an
			
 
				+individual `impl` declaration. The advantage of this would be fewer spurious
			
 
				+failures due to the termination rule.
			
 
				+
			
 
				+We decided against it because it is a more complex rule and it is sensitive to
			
 
				+the specifics of how `impl` declarations are parameterized. This raises concerns
			
 
				+about refactorings introducing termination rule failures. We did not want to
			
 
				+incorporate this change without evidence that those spurious failures would be a
			
 
				+problem in practice.
			
 
				+
			
 
				+### Consider types in the interface being implemented as distinct
			
 
				+
			
 
				+Instead of measuring the complexity of the entire `impl` query together, we
			
 
				+could consider keys in the type and interface parts of the query to be in
			
 
				+distinct namespaces. This would reduce the spurious failuers due to the
			
 
				+termination rule, but not as much as the previous alternative. It avoids the
			
 
				+problem of the previous alternative, since it is not sensitive to the specifics
			
 
				+of how `impl` declarations are parameterized.
			
 
				+
			
 
				+We are not choosing this alternative now since it is a more complicated rule to
			
 
				+explain. But we would consider this alternative in the future if we find it
			
 
				+beneficial in practice to support the additional cases this rule permits.
			
 
				+
			
 
				+### Require some count to decrease
			
 
				+
			
 
				+We considered a rule that would forbid repeating the multiset of types. This
			
 
				+would simplify the termination argument. However, we thought it important to
			
 
				+support `impl` declarations that effectively shuffled the terms around into some
			
 
				+canonical form, as in this example:
			
 
				+
			
 
				+```
			
 
				+impl forall [T:! I(Optional(.Self))] Optional(T) as I(T);
			
 
				+```
			
 
				+
			
 
				+Here, `Optional(bool) impls I(bool)` if `bool impls I(Optional(bool))`. This
			
 
				+rule can only be applied a finite number of times, and is something we imagine
			
 
				+might arise naturally, so it seemed good to support.
			
 
				+
			
 
				+### Require non-type values to stay the same
			
 
				+
			
 
				+We considered different handling for
			
 
				+[non-type argument values](#non-type-arguments) that did not have an integral or
			
 
				+choice type. The alternative rule required the value to stay constant. This led
			
 
				+to
			
 
				+[a number of edge cases](https://github.com/carbon-language/carbon-lang/pull/2687#discussion_r1151028867)
			
 
				+to consider, like how to identify the same argument when the type constructor
			
 
				+may be called a different number of times. The rule we chose to use instead has
			
 
				+the advantages of being simpler and also accepting more cases. With the current
			
 
				+rule, the value of those arguments may change freely, they just don't create
			
 
				+different type expressions for purposes of detecting termination.