# Lower

## Table of contents

- [Overview](#overview)
- [Generic lowering](#generic-lowering)
- [Cross-file lowering](#cross-file-lowering)
- [Specific deduplication and fingerprinting](#specific-deduplication-and-fingerprinting)
- [Mangling](#mangling)
    - [Examples](#examples)

## Overview

Lowering takes the SemIR and produces LLVM IR. At present, this is done in a
single pass, although it's possible we may need to do a second pass so that we
can first generate type information for function arguments.

The lowering context is split into three layers:

- The `Context` object holds state for an overall lowering process that
  produces a single LLVM module.
- The `FileContext` object holds state for lowering from a particular
  `SemIR::File`, and holds a pointer to its enclosing `Context`. Multiple
  files may be involved in a single lowering process when lowering a generic,
  where the definition of the generic and the specific may be owned by
  distinct files. This setup would also allow us to lower an entire library
  into a single LLVM module if we chose to do so.
- The `FunctionContext` object holds state for lowering a particular function,
  including an `IRBuilder` and mappings from the local `InstId`s to their
  lowered `llvm::Value*`s and from the local `InstBlockId`s to their lowered
  `llvm::BasicBlock*`s.

Lowering is done per `SemIR::InstBlock`. This minimizes changes to the
`IRBuilder` insertion point, something that is both expensive and potentially
fragile.

## Generic lowering

In order to support lowering generic functions, the `FunctionContext` tracks
both the `FunctionId` of the function being lowered and a corresponding
`SpecificId`. Whenever `FunctionContext` or a `HandleInst` function inspects a
property of an instruction that can vary between specifics -- in particular,
the type or constant value of an instruction -- that value is looked up in the
current specific, and the corresponding type or value is used instead.

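
The per-specific lookup described above can be pictured with a small toy model.
This is only a sketch under stated assumptions: `ToyFunction`,
`LookupTypeInSpecific`, `kNoSpecific`, and the use of strings for types are all
hypothetical and are not part of the toolchain.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-ins for the SemIR ID types; plain integers here.
using InstId = int;
using SpecificId = int;
const SpecificId kNoSpecific = -1;

// Toy model of a generic function (an assumption, not the real SemIR layout).
struct ToyFunction {
  // Type of each instruction as written in the generic. Symbolic types are
  // spelled by name, for example "T".
  std::map<InstId, std::string> generic_type_of_inst;
  // For each specific, the concrete type substituted for each symbolic type.
  std::map<SpecificId, std::map<std::string, std::string>> specific_types;
};

// Returns the type of `inst` as seen from `specific`: a symbolic type is
// replaced by the specific's concrete type, anything else is returned as-is.
std::string LookupTypeInSpecific(const ToyFunction& fn, SpecificId specific,
                                 InstId inst) {
  auto type_it = fn.generic_type_of_inst.find(inst);
  assert(type_it != fn.generic_type_of_inst.end() && "unknown instruction");
  const std::string& generic_type = type_it->second;
  if (specific == kNoSpecific) {
    return generic_type;
  }
  auto specific_it = fn.specific_types.find(specific);
  assert(specific_it != fn.specific_types.end() && "unknown specific");
  auto substitution = specific_it->second.find(generic_type);
  return substitution == specific_it->second.end() ? generic_type
                                                   : substitution->second;
}
```

In this model, an instruction whose generic type is `T` reports the specific's
concrete type, while an instruction with a non-symbolic type reports the same
type in every specific.
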
`FunctionContext::GetTypeOfInst` and `FunctionContext::GetTypeIdOfInst` do this
mapping for the type of an instruction, and should be used instead of directly
looking at the `type_id` field of a typed instruction throughout function
lowering. Similarly, `FunctionContext::GetValue` does this mapping when looking
up the constant value of an instruction.

## Cross-file lowering

`FunctionContext` lowering may draw information used to lower the function from
two different files:

- The file in which the function was defined.
- For a generic function, the file in which the specific was formed.

Each of these files has its own `FileContext`, which tracks its corresponding
`SemIR::File`, as well as mappings from its constant values to
`llvm::Constant*`s and mappings from its functions to `llvm::Function*`s, and so
on.

When querying the type of an instruction using
`FunctionContext::GetTypeIdOfInst`, the resulting type may be owned by either of
these files. The type is represented as a `TypeInFile`, which is a pair of the
owning `SemIR::File*` and the `SemIR::TypeId` within that file. Care must be
taken to only pass the `TypeId` in a `TypeInFile` to code that expects a
`TypeId` within the corresponding `SemIR::File*`. To reduce the risk of errors,
code within `FunctionContext` and `HandleInst` functions should not directly
interact with `TypeId`s, and should instead always use `TypeInFile`.

Similarly, `SemIR::ValueRepr` has a `FunctionContext::ValueReprInFile` wrapper
that tracks the file that owns its `TypeId`, and `SemIR::InstId` has a
`FunctionContext::InstInFile` wrapper that tracks the file that owns the
`InstId`. These wrappers are kept intact wherever possible, in order to minimize
the chance of an ID being used with the wrong file.

## Specific deduplication and fingerprinting

Specifics for the same generic are deduplicated by detecting whether we
generated the same LLVM IR for all the portions of the specific that depend on
generic arguments.
This is accomplished in part by computing a fingerprint for each specific. The
fingerprint contains:

- For each symbolic constant value used while lowering, the lowered LLVM value
  in the specific.
- For each symbolic type used while lowering, the lowered LLVM type in the
  specific.
- For each called function, information about the specific callee. TODO:
  Describe how we handle deduplicating strongly-connected components of the
  call graph.
- For each other property of the specific that lowering depends on, the value
  of that property.

These fingerprinted values are tracked by the `FunctionContext` accessors that
obtain the information from SemIR:

- `FunctionContext::GetType` adds the `llvm::Type*` produced for a symbolic
  type to the fingerprint.
- `FunctionContext::GetValue` adds the `llvm::Value*` produced for a symbolic
  constant to the fingerprint.
- `FunctionContext::GetValueRepr` adds the kind of the value representation,
  but not the value representation type, to the fingerprint.
- `FunctionContext::GetInitRepr` adds the kind of the initializing
  representation to the fingerprint.
- `FunctionContext::GetReturnTypeInfo` adds the kind of the return
  representation, but not the type, to the fingerprint.

For `GetValueRepr` and `GetReturnTypeInfo`, the corresponding type is
represented as a `TypeInFile`. The convention in use is that `TypeInFile` values
represent types that have not yet been added to the fingerprint for the
specific, and the mapping from `TypeInFile` to `llvm::Type*` is the point where
the type is added to the fingerprint, but other data such as the enumeration
values stored on `ReturnTypeInfoInFile` have already been added to the
fingerprint.

Additional information queried from SemIR by `FunctionContext` or a `HandleInst`
function should follow the same pattern, adding a getter on `FunctionContext`
that adds the information to the fingerprint, and returns a `*InFile` wrapper
struct if the result contains any `TypeId`s.

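
The accessor-based fingerprinting pattern can be illustrated with a toy model.
This is a minimal sketch, not the toolchain's implementation: `Lowered`,
`SpecificLowering`, and `FindCanonicalSpecifics` are hypothetical names, lowered
LLVM types are represented as strings, and the fingerprint is kept as a plain
ordered list because this document does not describe how the real fingerprint
is encoded.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a lowered LLVM entity. Real lowering would record an
// `llvm::Type*` or `llvm::Value*`; a string keeps the sketch self-contained.
using Lowered = std::string;

// Hypothetical fingerprint: the ordered list of lowered results observed
// while lowering one specific.
using Fingerprint = std::vector<Lowered>;

// Hypothetical per-specific state, loosely modeled on `FunctionContext`.
class SpecificLowering {
 public:
  explicit SpecificLowering(std::map<std::string, Lowered> lowered_types)
      : lowered_types_(std::move(lowered_types)) {}

  // Maps a symbolic type name to its lowered form and records the result in
  // the fingerprint, so that every query made through this getter is tracked.
  const Lowered& GetType(const std::string& symbolic_type) {
    auto it = lowered_types_.find(symbolic_type);
    assert(it != lowered_types_.end() && "unknown symbolic type");
    fingerprint_.push_back("type:" + it->second);
    return it->second;
  }

  const Fingerprint& fingerprint() const { return fingerprint_; }

 private:
  std::map<std::string, Lowered> lowered_types_;
  Fingerprint fingerprint_;
};

// For each specific of one generic, returns the index of the first specific
// with an identical fingerprint. Specifics that map to the same index could
// share a single lowered function.
std::vector<std::size_t> FindCanonicalSpecifics(
    const std::vector<Fingerprint>& fingerprints) {
  std::map<Fingerprint, std::size_t> first_seen;
  std::vector<std::size_t> canonical;
  canonical.reserve(fingerprints.size());
  for (std::size_t i = 0; i < fingerprints.size(); ++i) {
    // `insert` leaves an existing entry untouched, so the first index wins.
    auto result = first_seen.insert(std::make_pair(fingerprints[i], i));
    canonical.push_back(result.first->second);
  }
  return canonical;
}
```

In this model, two specifics whose symbolic type `T` lowers to `ptr` produce
equal fingerprints and share a canonical index, while a specific whose `T`
lowers to `i32` keeps its own. Because the fingerprint is only updated inside
the getter, any query that bypasses the getter would go unrecorded, which is
why new queries should be added as getters.
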
Additional details can be found in:
[Coalescing generic functions emitted when lowering to LLVM IR](coalesce_generic_lowering.md).

## Mangling

Part of lowering is choosing deterministically unique identifiers for each
lowered entity to use in platform object files. Any feature of an entity (such
as parent namespaces or overloaded function parameters) that would create a
distinct entity must be included in some way in the generated identifier.

The current rudimentary name mangling scheme is as follows:

- As a special case, `Main.Run` is emitted as `main`. Otherwise the resulting
  name consists of:
    1. `_C`
    2. The unqualified function name (function name mangling is the only thing
       implemented at the moment).
    3. If the function is a thunk, `:thunk` to distinguish it from the function
       it invokes.
    4. `.`
    5. If the function being mangled is a member of:
        - an `impl`, then add:
            1. The implementing type, per the scope mangling.
            2. `:`
            3. The interface type, per the scope mangling.
        - a type or namespace, then add:
            1. The scope, per the scope mangling.

The scope mangling scheme is as follows:

1. The unqualified name of the type or namespace.
2. If the type or namespace is within another type or namespace:
    1. `.`
    2. The enclosing scope, per the scope mangling.
3. `.`
4. The package name.

### Examples

```carbon
package P1;

interface Interface {
  fn Op[self: Self]();
}
```

```carbon
namespace NameSpace;

class NameSpace.Implementation {
  // Mangled as:
  // `_COp.Implementation.NameSpace.Main:Interface.P1`
  impl as P1.Interface {
    fn Op[self: Self]() {
    }
  }
}

// Mangled as `main`.
fn Run() {
  var v: NameSpace.Implementation;
  v.(P1.Interface.Op)();
}
```
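
The mangling rules can also be sketched as a small standalone C++ function. The
`Scope` and `FunctionInfo` structs and the `MangleScope` and `Mangle` functions
are hypothetical and do not reflect the toolchain's actual mangler. The sketch
assumes the enclosing-scope step does not append the package name again, which
matches the example, and it emits only the package name for a function declared
directly in a package, a case the rules do not spell out.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical description of a scope: a chain of types and namespaces.
struct Scope {
  // Unqualified names from innermost to outermost, for example
  // {"Implementation", "NameSpace"}. Empty for a package-level entity.
  std::vector<std::string> names;
  std::string package;
};

// Hypothetical description of the function being mangled.
struct FunctionInfo {
  std::string name;
  bool is_thunk = false;
  // The implementing type for an `impl` member, otherwise the enclosing scope.
  Scope scope;
  // Whether the function is a member of an `impl`, and if so, the interface.
  bool in_impl = false;
  Scope interface_scope;
};

// Scope mangling: each name followed by `.`, innermost first, then the
// package name exactly once (an assumption based on the example).
std::string MangleScope(const Scope& scope) {
  std::string out;
  for (const std::string& name : scope.names) {
    out += name;
    out += '.';
  }
  out += scope.package;
  return out;
}

std::string Mangle(const FunctionInfo& fn) {
  assert(!fn.name.empty() && "functions must have a name");
  // Special case: `Main.Run` is emitted as `main`.
  if (!fn.in_impl && fn.scope.names.empty() && fn.scope.package == "Main" &&
      fn.name == "Run") {
    return "main";
  }
  std::string out = "_C" + fn.name;
  if (fn.is_thunk) {
    out += ":thunk";
  }
  out += '.';
  out += MangleScope(fn.scope);
  if (fn.in_impl) {
    out += ':';
    out += MangleScope(fn.interface_scope);
  }
  return out;
}
```

With `name = "Op"`, `scope = {{"Implementation", "NameSpace"}, "Main"}`, and
`interface_scope = {{"Interface"}, "P1"}`, this sketch returns
`_COp.Implementation.NameSpace.Main:Interface.P1`, the same name as in the
Carbon example.
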