瀏覽代碼

Termination algorithm for impl selection (#2687)

This proposal replaces the termination algorithm for `impl` selection. The previous algorithm relied on a recursion limit, which is counter to [our goal for predictability](/docs/design/generics/goals.md#predictability).

The replacement is to terminate if any `impl` declaration depends transitively on itself with a "strict superset" type as one of its parameters.

Co-authored-by: Richard Smith <richard@metafoo.co.uk>
Co-authored-by: Satakshi Garg <56358125+satakshigarg@users.noreply.github.com>
Co-authored-by: Chandler Carruth <chandlerc@gmail.com>
josh11b 3 年之前
父節點
當前提交
31e907178f
共有 1 個文件被更改,包括 337 次插入0 次删除
  1. 337 0
      proposals/p2687.md

+ 337 - 0
proposals/p2687.md

@@ -0,0 +1,337 @@
+# Termination algorithm for impl selection
+
+<!--
+Part of the Carbon Language project, under the Apache License v2.0 with LLVM
+Exceptions. See /LICENSE for license information.
+SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+-->
+
+[Pull request](https://github.com/carbon-language/carbon-lang/pull/2687)
+
+<!-- toc -->
+
+## Table of contents
+
+-   [Abstract](#abstract)
+-   [Problem](#problem)
+-   [Background](#background)
+-   [Proposal](#proposal)
+-   [Details](#details)
+    -   [Non-type arguments](#non-type-arguments)
+    -   [Proof of termination](#proof-of-termination)
+-   [Rationale](#rationale)
+-   [Alternatives considered](#alternatives-considered)
+    -   [Measure complexity using type tree depth](#measure-complexity-using-type-tree-depth)
+    -   [Consider each type parameter in an `impl` declaration separately](#consider-each-type-parameter-in-an-impl-declaration-separately)
+    -   [Consider types in the interface being implemented as distinct](#consider-types-in-the-interface-being-implemented-as-distinct)
+    -   [Require some count to decrease](#require-some-count-to-decrease)
+    -   [Require non-type values to stay the same](#require-non-type-values-to-stay-the-same)
+
+<!-- tocstop -->
+
+## Abstract
+
+This proposal replaces the termination algorithm for `impl` selection. The
+previous algorithm relied on a recursion limit, which is counter to
+[our goal for predictability](/docs/design/generics/goals.md#predictability).
+
+The replacement is to terminate if any `impl` lookup performed while considering
+an `impl` declaration depends transitively on the same `impl` declaration with a
+"strict superset" of the types in the query.
+
+## Problem
+
+Consider this `impl` declaration:
+
+```
+interface I;
+impl forall [T:! type where Optional(.Self) is I] T as I;
+```
+
+A type like `i32` is a valid value of `T`, and so implements `I`, if
+`Optional(i32)` implements `I`. This `impl` declaration could also possibly be
+used to give an implementation of `Optional(i32)` for interface `I`, but only if
+there was an implementation of `Optional(Optional(i32))` for interface `I`. The
+job of the termination rule is to report an error instead of being caught in an
+infinite loop in this situation.
+
+Ideally, a termination rule would identify the loop in a minimal way. This has a
+few benefits, including reducing compile times and making error messages as
+short and understandable as possible. One downside of the original recursion
+limit rule is its tendency to only detect a problem after the loop had been
+repeated many times. This problem is worse if the recursion limit is large.
+
+Another concern with using a recursion limit is that refactorings that would
+otherwise be legal can increase the depth of recursion, causing spurious
+failures. The workaround for this and other spurious failures is to increase the
+recursion limit. This makes the other problems with using a recursion limit
+worse.
+
+Note that determining whether a particular set of `impl` declarations terminates
+is
+[equivalent to the halting problem](https://sdleffler.github.io/RustTypeSystemTuringComplete/)
+(content warning: contains many instances of an obscene word as part of a
+programming language name), and so is undecidable in general. So any termination
+rule will have some false positives or spurious failures, where it reports an
+error even though it would in fact complete if allowed to continue running. We
+would like a criteria that correctly classifies the examples that arise in
+practice.
+
+## Background
+
+The first termination rule was introduced in proposal
+[#920: Generic parameterized impls (details 5)](https://github.com/carbon-language/carbon-lang/pull/920),
+following Rust and C++. The problems with using a recursion limit were
+[recognized at the time that proposal was written](https://github.com/carbon-language/carbon-lang/blob/f282bca20e41e2f8dc05881d9d6b38213d6c6c87/docs/design/generics/details.md#termination-rule),
+but no alternative was known.
+
+Alternatives termination rules have since been discussed:
+
+-   in open discussion on
+    [2022-04-13](https://docs.google.com/document/d/1tEt4iM6vfcY0O0DG0uOEMIbaXcZXlNREc2ChNiEtn_w/edit#heading=h.cja3fkwzv9tr),
+    prompted by a question on
+    [#1088: Generic details 10: interface-implemented requirements](https://github.com/carbon-language/carbon-lang/pull/1088);
+    and
+-   in issue
+    [#2458: Infinite recursion during impl selection](https://github.com/carbon-language/carbon-lang/issues/2458),
+    which includes summaries of discussions including those on
+    [2023-02-07](https://docs.google.com/document/d/1gnJBTfY81fZYvI_QXjwKk1uQHYBNHGqRLI2BS_cYYNQ/edit?resourcekey=0-ql1Q1WvTcDvhycf8LbA9DQ#heading=h.9u2u6078figt).
+
+PR
+[#2602: Implement the termination algorithm for impl selection described in #2458](https://github.com/carbon-language/carbon-lang/pull/2602)
+implements the termination rule of this proposal in Explorer.
+
+## Proposal
+
+We replace the termination criteria with a rule that the types in the `impl`
+query must never get strictly more complicated when considering the same `impl`
+declaration again. The way we measure the complexity of a set of types is by
+counting how many of each base type appears. A base type is the name of a type
+without its parameters. For example, the base types in this query
+`Pair(Optional(i32), bool) impls AddWith(Optional(i32))` are:
+
+-   `Pair`
+-   `Optional` twice
+-   `i32` twice
+-   `bool`
+-   `AddWith`
+
+A query is strictly more complicated if at least one count increases, and no
+count decreases. So `Optional(Optional(i32))` is strictly more complicated than
+`Optional(i32)` but not strictly more complicated than `Optional(bool)`.
+
+This rule, when combined with
+[the acyclic rule](/docs/design/generics/details.md#acyclic-rule) that a query
+can't repeat exactly, [guarantees termination](#proof-of-termination). This rule
+is expected to identify a problematic sequence of `impl` declaration
+instantiations in a way that is easier for the user to understand. Consider the
+example from before,
+
+```
+interface I;
+impl forall [T:! type where Optional(.Self) is I] T as I;
+```
+
+This `impl` declaration matches the query `i32 impls I` as long as
+`Optional(i32) impls I`. That is a strictly more complicated query, though,
+since it contains all the base types of the starting query (`i32` and `I`), plus
+one more (`Optional`). As a result, an error can be given after one step, rather
+than after hitting a large recursion limit. And that error can state explicitly
+what went wrong: we went from a query with no `Optional` to one with one,
+without anything else decreasing.
+
+Note this only triggers a failure when the same `impl` declaration is considered
+with the strictly more complicated query. For example, if the declaration is not
+considered since there is a more specialized `impl` declaration that is
+preferred by the
+[type-structure overlap rule](/docs/design/generics/details.md#overlap-rule), as
+in:
+
+```
+impl forall [T:! type where Optional(.Self) is I] T as I;
+impl Optional(bool) as I;
+
+// OK, because we never consider the first `impl`
+// declaration when looking for `Optional(bool) impls I`.
+let U:! I = bool;
+
+// Error: cycle with `i32 impls I` depending on
+// `Optional(i32) impls I`, using the same `impl`
+// declaration, as before.
+let V:! I = i32;
+```
+
+The rule is also robust in the face of refactoring:
+
+-   It does not depend on the specifics of how an `impl` declaration is
+    parameterized, only on the query.
+-   It does not depend on the length of the chain of queries.
+-   It does not depend on a measure of type-expression complexity, like depth.
+
+## Details
+
+### Non-type arguments
+
+For non-type arguments we have to expand beyond base types to consider other
+kinds of keys. These other keys are in a separate namespace from base types.
+
+-   Values with an integral type use the name of the type as the key and the
+    absolute value as a count. This means integer arguments are considered more
+    complicated if they increase in absolute value. For example, if the values
+    `2` and `-3` are used as arguments to parameters with type `i32`, then the
+    `i32` key will have count `5`.
+-   Every option of a choice type is its own key, counting how many times a
+    value using that option occurs. Any parameters to the option are recorded as
+    separate keys. For example, the `Optional(i32)` value of `.Some(7)` is
+    recorded as keys `.Some` (with a count of `1`) and `i32` (with a count of
+    `7`).
+-   Yet another namespace of keys is used to track counts of variadic arguments,
+    under the base type. This is to defend against having a variadic type `V`
+    that takes any number of `i32` arguments, with an infinite set of distinct
+    instantiations: `V(0)`, `V(0, 0)`, `V(0, 0, 0)`, ...
+    -   A `tuple` key in this namespace is used to track the total number of
+        components of tuple values. The values of those elements will be tracked
+        using their own keys.
+
+Non-type argument values not covered by these cases are deleted from the query
+entirely for purposes of the termination algorithm. This requires that two
+queries that only differ by non-type arguments are considered identical and
+therefore are rejected by the acyclic rule. Otherwise, we could construct an
+infinite family of non-type argument values that could be used to avoid
+termination.
+
+### Proof of termination
+
+Let's call a (finite or infinite) sequence of type expressions "good" if no
+later element is strictly more complex than an earlier element, and no type
+expression is repeated. We would like to prove that any good sequence of type
+expressions with a finite set of keys is finite.
+
+We can restrict to good sequences that don't repeat any multiset of keys, since
+there are only a finite number of types with a given multiset of keys. Proof: If
+none of the types have a variadic parameter list, then there is at most one type
+for every distinct permutation of base types. If some types are variadic, then
+we can get a conservative finite upper bound by multiplying the number of
+distinct permutations by the number of different possible arity combinations.
+The number of arity combinations is finite since, ignoring non-type arguments,
+the total arity must equal the number of base types in the type minus 1.
+
+The proof of termination is by induction on the number `N` of distinct keys.
+
+-   If `N == 1`, then types map to a multiset of a single key, which can be
+    represented by the count of the number of times that key appears. That
+    number must be non-negative and decreasing in the sequence, and so the
+    length of the sequence is bounded by the value of the first element. So good
+    sequences with `N == 1` must be finite.
+-   Assuming that good sequences with `N` distinct keys must be finite, consider
+    a good sequence with `N+1` distinct keys. Its first element will be
+    represented by a non-negative integer `(N+1)`-tuple, `(i_0, i_1, ..., i_N)`.
+    Every element after that will be in at least one of the
+    `i_0 + i_1 + ... + i_N` hyperplanes (co-dimension 1) given by these
+    equations:
+    -   `x_0 = 0`, `x_0 = 1`, ..., `x_0 = i_0 - 1` (`i_0` different equations,
+        each defining a separate hyperplane)
+    -   `x_1 = 0`, `x_1 = 1`, ..., `x_1 = i_1 - 1`
+    -   ...
+    -   `x_N = 0`, `x_N = 1`, ..., `x_N = i_N - 1`
+-   Any point not in one of those hyperplanes has components all >= the first
+    element, and so can't be in the sequence if it is good.
+-   The restriction of the sequence to the subsequence in each of those
+    hyperplanes is finite, by the induction hypothesis.
+-   The sequence visits points in this finite union of finite sets without
+    repetition, and so must be finite.
+-   Conclusion: Any good sequence with `N+1` distinct keys is finite, completing
+    the induction.
+
+This bound given by this construction is not completely tight, since there is
+overlap between the hyperplanes. It is tight once that overlap is taken into
+account, though. We can construct sequences that reach the upper bound by
+visiting the points in the union of the hyperplanes in descending order of their
+L1-norm (sum of the components).
+
+Note: The text of this argument was derived from comments on
+[issue #2458: Infinite recursion during impl selection](https://github.com/carbon-language/carbon-lang/issues/2458).
+
+## Rationale
+
+This proposal advances these [Carbon goals](/docs/project/goals.md):
+
+-   [Language tools and ecosystem](/docs/project/goals.md#language-tools-and-ecosystem)
+    by improving the quality of diagnostics.
+-   [Software and language evolution](/docs/project/goals.md#software-and-language-evolution)
+    since we've chosen alternatives that avoid introducing failures as the
+    result of refactorings, particularly those outside the files changed in the
+    refactoring.
+-   [Code that is easy to read, understand, and write](/docs/project/goals.md#code-that-is-easy-to-read-understand-and-write)
+    by having a relatively simple language rule, that is predictable when
+    authoring code, and allows independent modules to compose without triggering
+    errors.
+
+## Alternatives considered
+
+### Measure complexity using type tree depth
+
+We
+[considered](https://github.com/carbon-language/carbon-lang/issues/2458#issuecomment-1371412985)
+a rule which would ensure termination by forbidding the depth of the type tree
+in the query from increasing. This depth could either be measured in the query
+or in the values of types used to parameterize `impl` declaration. Either way,
+this raises a concern that otherwise safe refactorings might trigger spurious
+termination errors. Specifically, refactorings that replace a type, like
+`String`, with an alias to a parameterized type, like `BasicString(Char8)`,
+could change the tree depths in `impl` declarations in files that were not part
+of the refactoring.
+
+### Consider each type parameter in an `impl` declaration separately
+
+Instead of measuring the complexity of the `impl` query as a whole, we
+considered measuring the complexity of the argument values of parameters in an
+individual `impl` declaration. The advantage of this would be fewer spurious
+failures due to the termination rule.
+
+We decided against it because it is a more complex rule and it is sensitive to
+the specifics of how `impl` declarations are parameterized. This raises concerns
+about refactorings introducing termination rule failures. We did not want to
+incorporate this change without evidence that those spurious failures would be a
+problem in practice.
+
+### Consider types in the interface being implemented as distinct
+
+Instead of measuring the complexity of the entire `impl` query together, we
+could consider keys in the type and interface parts of the query to be in
+distinct namespaces. This would reduce the spurious failuers due to the
+termination rule, but not as much as the previous alternative. It avoids the
+problem of the previous alternative, since it is not sensitive to the specifics
+of how `impl` declarations are parameterized.
+
+We are not choosing this alternative now since it is a more complicated rule to
+explain. But we would consider this alternative in the future if we find it
+beneficial in practice to support the additional cases this rule permits.
+
+### Require some count to decrease
+
+We considered a rule that would forbid repeating the multiset of types. This
+would simplify the termination argument. However, we thought it important to
+support `impl` declarations that effectively shuffled the terms around into some
+canonical form, as in this example:
+
+```
+impl forall [T:! I(Optional(.Self))] Optional(T) as I(T);
+```
+
+Here, `Optional(bool) impls I(bool)` if `bool impls I(Optional(bool))`. This
+rule can only be applied a finite number of times, and is something we imagine
+might arise naturally, so it seemed good to support.
+
+### Require non-type values to stay the same
+
+We considered different handling for
+[non-type argument values](#non-type-arguments) that did not have an integral or
+choice type. The alternative rule required the value to stay constant. This led
+to
+[a number of edge cases](https://github.com/carbon-language/carbon-lang/pull/2687#discussion_r1151028867)
+to consider, like how to identify the same argument when the type constructor
+may be called a different number of times. The rule we chose to use instead has
+the advantages of being simpler and also accepting more cases. With the current
+rule, the value of those arguments may change freely, they just don't create
+different type expressions for purposes of detecting termination.