# `for` statement and user types [Pull request](https://github.com/carbon-language/carbon-lang/pull/1885) ## Table of contents - [Abstract](#abstract) - [Problem](#problem) - [Background](#background) - [Other languages](#other-languages) - [C++](#c) - [Python](#python) - [Rust](#rust) - [Typescript](#typescript) - [Go](#go) - [Proposal](#proposal) - [Details](#details) - [`for` with immutable values](#for-with-immutable-values) - [`Iterate` interface](#iterate-interface) - [`let` by default](#let-by-default) - [Lifetime of rvalues on right-hand side of `:`](#lifetime-of-rvalues-on-right-hand-side-of-) - [Use cases](#use-cases) - [Mutable and immutable elements](#mutable-and-immutable-elements) - [Random access containers](#random-access-containers) - [Maps, hash maps, and other containers](#maps-hash-maps-and-other-containers) - [R-value containers](#r-value-containers) - [Synthetic data](#synthetic-data) - [Large containers](#large-containers) - [Filtering, transforming, and reversing](#filtering-transforming-and-reversing) - [Interoperability with C++ types and containers](#interoperability-with-c-types-and-containers) - [Future work](#future-work) - [`for` with mutable values](#for-with-mutable-values) - [Speculative mixin usage](#speculative-mixin-usage) - [Mutable values](#mutable-values) - [Alternative views](#alternative-views) - [Large elements and Optional](#large-elements-and-optional) - [C++ interoperability](#c-interoperability) - [Iterating over C++ types in Carbon](#iterating-over-c-types-in-carbon) - [Iterating over Carbon types in C++](#iterating-over-carbon-types-in-c) - [Inversion of control](#inversion-of-control) - [Rationale](#rationale) - [Alternatives considered](#alternatives-considered) - [Atomic methods for `Iterate`](#atomic-methods-for-iterate) - [Using an iterator instead of a cursor](#using-an-iterator-instead-of-a-cursor) - [Support getter for both `T` and `T*` with `Iterate`](#support-getter-for-both-t-and-t-with-iterate) ## Abstract This proposes an interface that can be implemented by user types to support range-based iteration with `for`. ## Problem The current `for` proposal does not define a way for types to support range-based `for` loops. Examples of use cases for `for` loops include iterating over: - Mutable and immutable elements - Random access containers - Maps, hash maps, and complex containers - R-value containers - Synthetic data - Large containers, and large elements - Filtering, transforming, and reversing - Interoperability with C++ types and containers The goals for the solution includes: - Easy to support for user types, with minimal requirements - Simple and clear usage - Minimal performance overhead ## Background The original proposal of `for` loops were discussed in [p0353](/proposals/p0353.md), and the basic design documented in [control_flow/loops#for](/docs/design/control_flow/loops.md#for). The resulting syntax was: > `for (` _var declaration_ `in` _expression_ `) {` _statements_ `}` We now need a way to allow supporting this syntax for user and library container types. ### Other languages This section highlights some examples of how iterators and range-based for user-defined types is addressed in different languages. #### C++ C++ requires either `.begin()`/`.end()` methods, or `begin()`/`end()` to be supported for [range-based `for`](https://en.cppreference.com/w/cpp/language/range-for). #### Python Python requires 2 dunder methods, `__iter__` on the container and `__next__` on the iterator, to be usable with a `for in :` construct. [Generator functions](https://peps.python.org/pep-0255/) act like iterators. They are executed until the next `yield` statement, whose argument is returned as the next value to the `for` loop. Iteration ends when the function returns. ```python def generate_fibonacci(): yield 0; n1, n2 = 0, 1; while True: n1, n2 = n2, n1+n2; yield n1; for n in generate(): print(n); ``` #### Rust Rust supports iterating over a container with [for](https://doc.rust-lang.org/rust-by-example/flow_control/for.html) using - ranges `for n in 1..100` - iterators `for element in container.iter()` or `.iter_mut()` The `Iterator` type can be used with custom types using the [following syntax](https://doc.rust-lang.org/rust-by-example/trait/iter.html) (though specific): ```rust struct Fibonacci { curr: u32, next: u32, } impl Iterator for Fibonacci { type Item = u32; fn next(&mut self) -> Option { // calculate and yield value [...] } } for i in fibonacci().take(4) { ... } ``` #### Typescript Typescript uses the [`.iterator` property](https://www.typescriptlang.org/docs/handbook/iterators-and-generators.html#iterables), along with [two forms](https://www.typescriptlang.org/docs/handbook/iterators-and-generators.html#forof-statements) of range-based for loops: `for..in` and `for..of`. ```ts let pets = new Set(['Cat', 'Dog', 'Hamster']); pets['species'] = 'mammals'; for (let pet in pets) { console.log(pet); // "species" } for (let pet of pets) { console.log(pet); // "Cat", "Dog", "Hamster" } ``` Using the `.iterator` property, Typescript allow to lazily generate data easily using functions, classes, or objects: ```ts const reverse = (arr) => ({ [Symbol.iterator]() { let i = arr.length; return { next: () => ({ value: arr[--i], done: i < 0, }), }; }, }); ``` #### Go Go does not officially support customizing `for` for user types. ## Proposal Provide interfaces that can be implemented by user types to enable support for ranged-for loops. We propose: - A new `Iterate` interface to support for loops - Using a cursor-based approach - Making the `for` loop value `let` by default, that is, non-reassignable Below is a high-level example of this concept: ```carbon // Add support for `for` class MyRange { impl as Iterate where .ElementType = i32 and .CursorType = i32 { // Implement `Iterate` } } // Usage let my_range: MyRange = {...}; for (item: auto in my_range) { // Do something with `item` } ``` ## Details ### `for` with immutable values #### `Iterate` interface This proposal exposes a new `Iterate`. This interface: - Relies on a cursor-based approach: minimize implementation effort for developers, could support R-values. - Expects an `ElementType` and `CursorType`. - Combines "advance", "get", and "bounds check" into a single `Next(cursor)` method that returns an optional value. The `Iterate` interface is: ```carbon interface Iterate { let ElementType:! type; let CursorType:! type; fn NewCursor[self: Self]() -> CursorType; fn Next[self: Self](cursor: CursorType*) -> Optional(ElementType); } ``` A naive Carbon implementation of the for loop could be: ```carbon var cursor: range.(Iterate.CursorType) = range.(Iterate.NewCursor)(); var iter: Optional(range.(Iterate.ElementType)) = range.(Iterate.Next)(&cursor); // A. Possible implementation while (iter.HasValue()) { ExecuteForBlock(iter.Get()); iter = container.(Iterate.Next)(&cursor); } // B. Possible alternative depending on the API for Optional(T): while (true) { match (iter) { case .None => { break; } case .Some(let value: range.(Iterate.ElementType)) => { ExecuteForBlock(value); iter = container.(Iterate.Next)(&cursor); } } } ``` The cursor tracks progression, and the `Next` method advances the cursor and returns an `Optional` value. An empty `Optional` indicates that we have reached the end. Below is a sample implementation of that interface for a user type. ```carbon class MyIntContainer { var data: ... fn Size[self: Self]() -> i32 { return 4; } impl as Iterate where .ElementType = i32 and .CursorType = i32 { fn NewCursor[self: Self]() -> i32 { return 0; } fn Next[self: Self](cursor: i32*) -> Optional(i32) { // Advance and return value, or return empty if (*cursor < self.Size()) { let index: i32 = *cursor; *cursor = index + 1; return Optional(i32).Create(self.data[index]); } else { return Optional(i32).CreateEmpty(); } } } } ``` Which can be used as follow: ```carbon fn Main() -> i32 { var container: MyIntContainer = {.data = (1, 2, 3, 4)}; // Implicit `let` value declaration for (value: i32 in container) { Print("{0}", value); } return 0; } ``` #### `let` by default To keep usage simple and non-surprising, we propose defaulting variable declarations to `let` declarations. This is consistent with function parameters and scrutinees for [pattern matching](https://github.com/carbon-language/carbon-lang/pull/2188), which are immutable by default. ```carbon // Implicitly `let item: T` for (item: T in my_range) { // `item` cannot be reassigned } ``` This default can be overridden by adding the `var` keyword: ```carbon for (var item: T in my_range) { // `item` can be modified, but this will not modify `my_range` // because it is a copy of the element with its own storage } ``` #### Lifetime of rvalues on right-hand side of `:` It will be important to ensure that temporary entities on the right-hand side of `:` are alive during the execution of the `for` loop to prevent invalid memory access (Note: this was only recently addressed for C++ by [P2644](https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2022/p2644r0.pdf)). This applies to Carbon and C++ interoperability. ```carbon // Example using a temporary range, with `GetElements()` returning an object that implements `Iterate` for (item: T in bucket.GetElements()) { // We need to make sure the object is alive for the duration of the `for` loop } ``` ### Use cases This section covers the use cases highlighted in the introduction, and usage with the proposed design. #### Mutable and immutable elements This proposal covers strictly immutable elements, and proposes defaulting element declaration to `let`, to minimize surprises and clarify the intent. Mutable elements are mentioned in the [Future work](#future-work) section. #### Random access containers For random access containers, the cursor would represent a numerical index, incremented within `Next()`. #### Maps, hash maps, and other containers Because the cursor type is defined by the developer, and opaque to the user, it could be a hashmap index, a string, or pointer to a node, without changing the usage for users. The `ElementType` can be a tuple, such as a `(key, value)` for maps, or a single value. See [Future work][#future-work] for other examples. #### R-value containers R-values are likely to be limited in terms of addressing and mutation. The impact is undefined until the [corresponding design](https://github.com/carbon-language/carbon-lang/pull/2006) is finalized. The cursor approach used for `Iterate` has the slight advantage of not requiring pointers (and addressing) for some container types. This may be beneficial as a first iteration, until the dependent design is updated, at which point we will have to define how to handle this special case. #### Synthetic data `Next()` can be trivially implemented to return a single computed value on-demand based on the cursor, without the need for storage. If the sequence is infinite, `Next()` will never return an empty `Optional`, and the `for` loop will continue until interrupted by the user. ```carbon class SquaredSequence { impl as Iterate where .ElementType = i32 and .CursorType = i32 { fn NewCursor[self: Self]() -> i32 { return -1; } fn Next[self: Self](cursor: i32*) -> Optional(i32) { // Advance and return computed value *cursor = *cursor + 1; return Optional(i32).Create((*cursor) * (*cursor)); } } } ``` #### Large containers The proposal does not present any limitation regarding large collections. The cursor can be adapted to the collection size, and no copy of the container should occur both for immutable collections, and future mutable elements. #### Filtering, transforming, and reversing Provided the traversal order is the same, generic filters and transformers are supported by implementing a proxy `Iterate` interface, internally or externally. This example illustrates a simple proxy interface. ```carbon class BasicFilter(T:! Iterate) { var seq: T*; var value: T.ElementType; impl as Iterate where .ElementType = T.ElementType and .CursorType = T.CursorType { fn NewCursor[self: Self]() -> T.ElementType { return self.seq->NewCursor(); } fn Next[self: Self](cursor: T.CursorType*) -> Optional(T.ElementType) { var res: auto = self.seq->Next(cursor); while (res.has_value() && res.get() == value) { res = self.seq->Next(cursor); } return res; } } } fn Main() -> i32 { let container: MyIntContainer = MyIntContainer.Create(0,1,0,2,3,0); let no_zeros: BasicFilter(MyIntContainer) = { .seq = &container, .value = 0 }; // Prints "123" for (v: i32 in no_zeros) { Print("{0}", v); } return 0; } ``` On the other hand, changing how the collection is traversed (for example reverse), requires internal knowledge about the data layout, and a custom implementation. One approach for containers that support reverse iteration would be to implement a different interface with a `Prev()` method. A proxy could be exposed to make it usable with `for` directly. Below is an example of what such an interface could look like: ```carbon interface IterateBidirectional; class ReverseProxy(T:! IterateBidirectional); interface IterateBidirectional { extends Iterate; fn NewEndCursor[self: Self]() -> CursorType; fn Prev[self: Self](cursor: CursorType*) -> Optional(ElementType); final fn Reversed[self: Self]() -> ReverseProxy(Self); } class ReverseProxy(T:! IterateBidirectional) { var container: T; impl as Iterate where .ElementType = T.ElementType and .CursorType = T.CursorType { fn NewCursor[self: Self]() -> CursorType { return self.container.NewEndCursor(); } fn Next[self: Self](cursor: CursorType*) -> Optional(ElementType) { return self.container.Prev(cursor); } } } fn IterateBidirectional.Reversed[self: Self]() -> ReverseProxy(Self) { return {.container = self}; } /// Usage class MyContainer { fn Create(...) { ... } impl as Iterate ... { ... } impl as IterateBidirectional ... { ... } } var container: MyContainer = MyContainer.Create(...); for (v: container.(Iterate.ElementType) in container.Reversed()) {...} ``` #### Interoperability with C++ types and containers The cursor approach deviates from the C++ iterator approach, requiring some adaptation. Given the current state of the design, C++ interoperability is yet to be defined, and suggestions have been highlighted in the [Future work](#future-work) section. ## Future work ### `for` with mutable values Supporting mutable values could be achieved using a second interface that provides a proxy or view for the data, exposing an `Iterate` interface with `ElementType*` ```carbon // Object implementing `MutableIterate` returned by a function for (p: T* in my_range.AddrRange()) {} ``` The example `MutableIterate` interface below acts as a proxy for the collection, and implements our `Iterate` interface. ```carbon interface MutableIterate { impl as Iterate; alias ElementType = Iterate.ElementType; let ProxyType:! Iterate where .ElementType = ElementType* and .CursorType is ImplicitAs((Self as Iterate).CursorType); fn AddrRange[addr self: Self*]() -> ProxyType; } class MyIntContainer { //... impl as Iterate ... external impl as MutateIterate where .ProxyType = Self { ... } } ``` ### Speculative mixin usage Mixins are currently in early design stages. This section highlights possible uses speculating on the final design. Some may include: - Improved semantics for views compared to getter methods: no direct side effect from using the view - Facilitate code reuse, compared to reimplementing an interface - Direct access to `self`, limiting needs for pointers and address resolution #### Mutable values Named mixins, used as programmable members, could expose a view with mutable values: ```carbon // Using a named mixin to expose mutable values for (p: my_range.(Iterate.ElementType)* in my_range.mutable) {} ``` Used in this way, this clarifies that this a view of the data, like an attribute, with no direct side effects, when compared to a method. #### Alternative views Similarly, mixins could allow supporting other views into the container. For example, maps could expose a view with "key" and "value" only, in addition to the default (key, value) tuple. ```carbon // Using a named mixin to expose mutable values for (k: my_dict.(Iterate.CursorType) in my_dict.keys) {} ``` And "reversed" view could also be used for reverse iteration: ```carbon // Reversing for (v: my_dict.(Iterate.ElementType) in my_dict.reversed) {} ``` ### Large elements and Optional Large elements would have a negative impact if the `Optional` cannot leverage unused bit patterns within the type, nor present a view instead of a copy. If this proves difficult, we can let implementers of the container choose whether to expose values or pointers to values. When electing to "pointers to values", an adapter can be provided to support transforming the range of pointers into a range a values. This choice allows supporting either iteration over container r-values (`ElementType = T`), or efficient iteration (`ElementType = T*`), but not both. While acceptable as an intermediate solution, a better alternative will be needed in the future. ```carbon adapter IterateByValue(T:! Iterate where .CursorType impls Deref) for T { impl as Iterate where .CursorType = T.CursorType and .ElementType = T.CursorType.(Deref.Result) { ... } } // Used like: var frames: NetworkFramesContainerByPtr = ... for (i: NetworkFrame in frames as IterateByValue(NetworkFramesContainerByPtr)) { ... } ``` Alternatively, we can allow the binding in the `for` loop to be an `addr` pattern, implemented by calling into a separate interface on the container. This puts the choice of using values or pointers to values in the hands of the `for` loop author. While this deviates from the ergonomics of `let` (where the compiler can choose to copy or alias the value depending on what is more efficient), this option would provide parity with C++ `for` loops (where the author chooses between value and reference). ### C++ interoperability #### Iterating over C++ types in Carbon For reference, range-based `for` loops in C++ requires: - `begin()` and `end()` methods or free functions, and - the type returned supports pre-increment `++`, indirection `*`, and inequality `!=` operations See [range-based for statement](https://eel.is/c++draft/stmt.iter#stmt.ranged) for more details. A limitation of the approach proposed in this document is the need for adaptation to interoperate with the C++ iterator model. The adapter would be a Carbon type that satisfies the Cursor interface, implemented in terms of a C++ iterator. - `NewCursor` providing the "start iterator" returned by `begin()` - `Next` doing the "increment, bounds check, and returning value" Two different approaches are identified: - One is to have a templated `impl` of `Iterate`, for anything with the right method signatures (`begin`, `end`, and so on) - Another is a templated adapter that implements `Iterate` when requested ```carbon // Template impl approach template constraint CppIterator { ... } template constraint CppContainer { // Speculative for_unique IteratorType: CppIterator; fn begin() -> IteratorType; fn end() -> IteratorType; } external impl forall [template T:! CppContainer] T as Iterate where .CursorType = T.(CppContainer.IteratorType) and .ElementType = .CursorType.ElementType { fn NewCursor[self: Self]() -> CursorType { return self.(CppContainer.begin)(); } fn Next[self: Self](cursor: CursorType*) -> Optional(ElementType) { if (*cursor == self.(CppContainer.end)()) { return Optional(ElementType).MakeEmpty(); } let value: ElementType = **cursor; ++*cursor; return Optional(ElementType).MakeValue(value); } } // Used like: var v: Cpp.std.vector(i32) = ... for (i: i32 in v) { ... } ``` ```carbon // Adapter approach adapter CppIterate(template T:! type) for T { impl as Iterate // Speculative syntax. Alternatively, a cursor type parameter // would avoid the need for deduction. where .CursorType = typeof(T.begin()) and .ElementType = .CursorType.(Deref.Result) { fn NewCursor[self: Self]() -> CursorType { return self.begin(); } fn Next[self: Self](cursor: CursorType*) -> Optional(ElementType) { if (*cursor == self.end()) { return Optional(ElementType).MakeEmpty(); } let value: ElementType = **cursor; ++*cursor; return Optional(ElementType).MakeValue(value); } } } // Used like: var v: Cpp.std.vector(i32) = ... for (i: i32 in v as CppIterate(Cpp.std.vector(i32))) { ... } ``` These two options could allow interoperability with a compatible C++ type, either implicitly (template approach) or explicitly (adapter approach). #### Iterating over Carbon types in C++ There need to be `begin()` and `end()` methods or free functions accessible to C++ for any Carbon type that implements the `Iterate` interface, in addition to overloads of the `*`, `++`, and `!=` operators for the types returned. The `Iterate` interface does not expose a way to have a cursor nor an iterator to the past-last element. An option would be to have `end()` return a predefined invalid value, to satisfy iterator comparison when `Next()` returns an empty `Optional`. Below is sample implementation to iterate on a Carbon `Iterate`. Note that we need to copy and cache the last value, to allow for `*it` to be called. This puts a requirement on `ElementType` to be copyable for this interoperability to be usable. ```cpp namespace Carbon { // A C++ input iterator for any type that implements the Carbon `Iterate` interface. template class IterateIterator { public: // Returns an "Invalid" iterator representing `end()` static auto Invalid(const Iterate& container) -> IterateIterator { return IterateIterator(container, std::nullopt); } IterateIterator(const Iterate& container, std::optional cursor) : container_(&container), cursor_(cursor) { if (cursor_) { next(); } } IterateIterator(const IterateIterator& other) : container_(other.container_), cursor_(other.cursor_), value_(other) {} auto operator*() const -> const typename Iterate::ElementType& { assert(cursor_); return value_; } auto operator++() -> IterateIterator& { next(); return *this; } auto operator==(const IterateIterator& rhs) -> bool { return container_ == rhs.container_ && cursor_ == rhs.cursor_; } auto operator!=(const IterateIterator& rhs) -> bool { return !(*this == rhs); } private: void next() { assert(cursor_); if (const auto v = container_->Next(*cursor_); v) { value_ = *std::move(v); } else { cursor_ = std::nullopt; } } const Iterate* container_; std::optional cursor_; typename Iterate::ElementType value_; }; // The C++ side of a custom Carbon type implementing // `Iterate` with CursorType `C`, and ElementType `T` // could have `begin()` and `end()` methods defined as: class MyIterateType { // [...] auto begin() -> IterateIterator> { return IterateIterator>(*this, NewCursor()); } auto end() -> IterateIterator> { return IterateIterator>::Invalid(*this); } }; } // namespace Carbon // Usage: Carbon::MyIterateType container; for (auto s : container) { /*...*/ } ``` ### Inversion of control Using an `Optional` has downsides, discussed in the ["alternatives considered"](#alternatives-considered) section. A possible approach would be to invert the control of the `for` loop, by having the container call in the `for` block for each value, passing the value as argument. This would work similarly to [Python generator functions](#python). This has the following advantages: - Removes the need for an `Optional`, and the associated overheads, copies, and unwrapping. - No boundary checks needed at the `for` level - Compatible with R-value containers Limitations: - The context needed by the `for` body will needs to be available or provided - No optimizer known to the team can reliably produce efficient code for this construct. ## Rationale This proposal offers a first step to support a needed way to iterate using a `for` loop. More specifically: - Requires a single interface to be implemented with a combined `Next` method, uses cursors instead of their more complex iterator counterpart, and defines loop values as `let` by default to be less error prone ([Code that is easy to read, understand, and write](https://github.com/carbon-language/carbon-lang/blob/trunk/docs/project/goals.md#code-that-is-easy-to-read-understand-and-write)). - Makes use of [Library APIs only](https://github.com/carbon-language/carbon-lang/blob/trunk/docs/project/principles/library_apis_only.md). ## Alternatives considered ### Atomic methods for `Iterate` An option was to define methods such as `Get`, `Advance`, `IsAtEnd` on `Iterate`, instead of a unique `Next` methods. While being more atomic, the concern is that these individual methods will all need to be called separately while iterating, while a single call to `Next` performs these three operations together, providing more opportunities for optimization. Using a single operation to return an optional is also consistent with Rust’s approach. The combined `Next` method forces us to return an `Optional`, which has downsides: - potential performance overhead to wrapping and unwrapping, - storage overhead for `Optional` itself, and - may require additional copying of the value wrapped. Those may be mitigated by leveraging unused bit patterns, passing a view of the value, or [inverting control](#inversion-of-control). ### Using an iterator instead of a cursor The iterator approach was considered but proved to have key limitations that a cursor approach does not have: - It requires implementing 2 interfaces instead of 1, and is more complex due to having the iteration logic separated from the container itself - More difficult to harden or troubleshoot, compared to a cursor that allows bounds checking in debug & hardened build modes - Can pose problems with ranges that consumes their input (See Barry Revzin’s "take(5)" presentation from C++Now 2023), when compared to a combined `Next()` approach. - Higher overhead - Proves to be difficult to support for R-values On the other hand, the cursor approach deviates more from C++ than iterator, and may require some more effort for C++ interoperability. ### Support getter for both `T` and `T*` with `Iterate` Because some collections will be read-only, we want to separate mutable and immutable interfaces. A single interface would force developers to implement or stub the mutable portion of the interface, even if it is not applicable.