# Coalescing generic functions emitted when lowering to LLVM IR

## Table of contents

-   [Overview](#overview)
-   [Design details](#design-details)
    -   [SemIR representation and why to coalesce during lowering](#semir-representation-and-why-to-coalesce-during-lowering)
    -   [Recursion and strongly connected components (SCCs)](#recursion-and-strongly-connected-components-sccs)
    -   [Function fingerprints](#function-fingerprints)
    -   [Canonical specific to use](#canonical-specific-to-use)
-   [Algorithm details](#algorithm-details)
-   [Opportunities for further improvement](#opportunities-for-further-improvement)
-   [Alternatives considered](#alternatives-considered)

## Overview

When lowering Carbon generics to LLVM, it is possible to emit duplicate LLVM IR
functions. This document describes the algorithm implemented in
[lowering](lower.md) for determining when generated specifics, while distinct at
the Carbon language level, can be coalesced into a single function when lowering
Carbon's intermediate representation (_SemIR_) to
[LLVM IR](https://llvm.org/docs/LangRef.html).

The overall goal of this optimization is to avoid generating duplicate LLVM IR
code where the duplication is easy to determine from the front-end. Such an
optimization needs to be done after specialization, but there is some
flexibility in when to do it afterwards: before lowering, through analysis of
SemIR, or during/after lowering.

The goal of this doc is to describe the algorithm implemented in
[specifics_coalescer](/toolchain/lower/specific_coalescer.h): its context, its
overall goal, the challenges involved, and where there is still room for
improvement in subsequent iterations. Determining the impact on compile time is
beyond the scope of this document, but is an important problem to follow up on.
## Design details

To determine whether two specific functions are equivalent, so that a single one
of them can be used in place of the other, the algorithm and its implementation
need to take the following into account.

### SemIR representation and why to coalesce during lowering

In SemIR, a specific function is identified by a unique tuple:
`(function_id, specific_id)`. There is a single in-memory representation of a
generic function's body (not one for each specific); the instructions that
differ between specifics can be determined, on demand, from a given
`specific_id`. Hence, determining whether two specifics are equivalent requires
analyzing whether these specific-dependent instructions are equivalent at the
LLVM IR level. This can only be determined after the eval phase is complete,
using information on how Carbon types map to `llvm::Type`s. The algorithm
described below coalesces specifics during lowering. Also see
[alternatives considered](#alternatives-considered).

### Recursion and strongly connected components (SCCs)

Comparing whether two different specific functions contain (access, invoke,
etc.) the same specific-dependent instruction is not straightforward when
recursion is involved. The simplest example is two equivalent functions A and B
that are each self-recursive. The check "are A and B equivalent" needs to start
by assuming they are equivalent, so that when a self-recursive call is found in
each, that call is still considered equivalent. In practice this requires
comparing `specific_id`s, which in SemIR are distinct.

In the general case, this analysis needs to examine the call graph for all
functions and build strongly connected components (SCCs). The call graph could
either be created before lowering or built while lowering. The current
implementation does the latter; in a post-processing phase we can conclude
equivalence and simplify the emitted LLVM IR by deleting unnecessary parts.
A non-viable option is building the call graph based on the information "what
are all call sites of myself, where I am a specific function": this information
is not available until the function bodies of all specific functions have been
processed. This is an optimization, so that the definition of a specific isn't
emitted until a use of it is found. Building that information up front would
duplicate all the lowering logic, minus the LLVM IR creation.

### Function fingerprints

Even when limiting the comparison of specific functions to those defined from
the same generic, a comparison algorithm would still have quadratic complexity
in the number of specifics for that generic. We define two fingerprints for each
specific:

1.  `specific_fingerprint`: Includes all specific-dependent information.
2.  `common_fingerprint`: Includes the same, except for `specific_id`
    information, as `specific_id`s can only be determined to be equivalent after
    building an equivalence SCC.

Two specific functions are equivalent if their `specific_fingerprint`s are
equal, and are not equivalent if their `common_fingerprint`s differ. If the
`common_fingerprint`s are equal but the `specific_fingerprint`s are not, the two
functions may still be equivalent.

Ideally, the `specific_fingerprint` can be used as a unique hash to first
coalesce all specific functions with the same fingerprint, with no additional
checks. Then, the remaining functions may use the `common_fingerprint` as
another unique hash to group remaining potential candidates for coalescing, and
only those with the same `common_fingerprint` are processed in a quadratic pass
that walks all call instructions and compares whether the `specific_id`
information is equivalent. These optimizations are not currently implemented.

Note that this does not
[coalesce non-specifics](/proposals/p6716.md#coalescing-duplicate-non-specific-functions).
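The two-fingerprint idea can be sketched as follows. This is an illustrative
model only: the instruction encoding and the use of SHA-256 are assumptions for
the sketch, not the toolchain's actual representation or hash function.

```python
import hashlib

def fingerprints(dependent_insts):
    """Computes (common_fingerprint, specific_fingerprint) for one specific.

    `dependent_insts` is a toy encoding of the specific-dependent
    instructions: (kind, payload, callee_specific_id or None).
    """
    common = hashlib.sha256()
    specific = hashlib.sha256()
    for kind, payload, callee in dependent_insts:
        # Both fingerprints include all specific-dependent information...
        piece = f"{kind}:{payload};".encode()
        common.update(piece)
        specific.update(piece)
        # ...but only the specific_fingerprint hashes callee specific_ids,
        # since those can only be proven equivalent after SCC analysis.
        if callee is not None:
            specific.update(f"call:{callee};".encode())
    return common.hexdigest(), specific.hexdigest()

# Two specifics identical except for the specific they call: equal
# common_fingerprints (maybe equivalent), unequal specific_fingerprints.
c1, s1 = fingerprints([("load", "i32", None), ("call", "f", 7)])
c2, s2 = fingerprints([("load", "i32", None), ("call", "f", 8)])
print(c1 == c2, s1 == s2)  # True False
```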
### Canonical specific to use

To determine the canonical specific to use, we use a
[disjoint set](https://en.wikipedia.org/wiki/Disjoint-set_data_structure).

## Algorithm details

Below is pseudocode for the existing algorithm in
`toolchain/lower/specific_coalescer.*`. The implementation can be found in
[specifics_coalescer.h](/toolchain/lower/specific_coalescer.h) and
[specifics_coalescer.cpp](/toolchain/lower/specific_coalescer.cpp).

At the top level, the current algorithm first generates all function
definitions, and once this is complete, it performs the logic to coalesce
specifics and delete the redundant LLVM function definitions.

```none
LowerToLLVM() {
  for all non_generic_functions:
    CreateLLVMFunctionDefinition(function, no_specific_id);
  PerformCoalescingPostProcessing();
}
```

The lowering starts with all non-generic functions. While lowering these, when
calls to specifics are encountered, it also generates definitions for those
specific functions. For each lowered specific function definition, we create the
`SpecificFunctionFingerprint`, which includes the
[two fingerprints](#function-fingerprints) and a list of calls to other specific
functions.

```none
CreateLLVMFunctionDefinition(function, specific_id) {
  for each SemIR instruction in the function:
    Step 1: Emit LLVM IR for the instruction.
    Step 2: If the instruction is specific-dependent, hash it and add it to
            the function's `common_fingerprint`.
    Step 3: If the SemIR instruction is a call to a specific:
      a) Create a definition for this specific_id if it doesn't exist:
         CreateLLVMFunctionDefinition(function, specific_id);
      b) Hash the specific_id into the current function's
         `specific_fingerprint`.
      c) Add the non-hashed specific_id to the list of calls performed.
}
```

The logic that performs the actual coalescing analyzes all specifics.
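The disjoint-set bookkeeping used to pick the canonical specific can be sketched
with a generic union-find (this is a textbook structure with path compression,
not the toolchain's implementation; the specific ids are made up):

```python
class DisjointSet:
    """Union-find tracking which specifics share a canonical one."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        # Returns the canonical representative of x, compressing paths.
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a, b):
        # Merges the sets of a and b; a's root becomes the canonical one.
        self.parent[self.find(b)] = self.find(a)

# Specifics 1, 2, 3 found equivalent: 2 and 3 coalesce onto canonical 1,
# whose definition is kept while the others are replaced and deleted.
ds = DisjointSet()
ds.union(1, 2)
ds.union(1, 3)
print(ds.find(2), ds.find(3))  # 1 1
```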
For each pair of specifics, it first checks whether the LLVM function types
match (using a third hash-like fingerprint, `function_type_fingerprint`, for
storage optimization), then whether the pair is equivalent based on the
`SpecificFunctionFingerprint`. For each pair of equivalent functions found (in a
call graph SCC), one function is marked non-canonical: its uses are replaced
with the canonical one and its definition is ultimately deleted.

```none
PerformCoalescingPostProcessing() {
  for each two specifics of the same generic {
    if function_type_fingerprints differ {
      track as non-equivalent
      continue
    }
    add the two specifics to the assumed equivalent specifics list
    if (CheckIfEquivalent(two specifics, assumed equivalent specifics list)) {
      for each two equivalent specifics found {
        find the canonical specific & mark the duplicates for
        replacement/deletion
      }
    }
  }
  replace all duplicate specifics with the respective canonical specifics and
  delete all replaced LLVM function definitions.
}
```

The equivalence check for specifics based on the constructed
`SpecificFunctionFingerprint` can make an early non-equivalence determination
based on the `common_fingerprint`s, and an early equivalence determination based
on the `specific_fingerprint`s. Otherwise, it uses the call list and recurses to
make the determination for all functions in the SCC call graph (in practice the
implementation uses a worklist to avoid the recursion).
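As a rough illustration of the worklist formulation, the check can be sketched
as follows. The per-specific encoding (two fingerprints plus a call list) and
all names are assumptions for the sketch, not the toolchain's code:

```python
def check_equivalent(a, b, specifics, equiv, non_equiv):
    """Worklist version of the pairwise equivalence check.

    `specifics` maps id -> (common_fp, specific_fp, [called ids]).
    `equiv` / `non_equiv` are sets of already-decided pairs.
    """
    assumed = set()
    worklist = [(a, b)]
    while worklist:
        x, y = worklist.pop()
        if x == y or (x, y) in equiv or (x, y) in assumed:
            continue
        if (x, y) in non_equiv:
            return False
        cx, sx, calls_x = specifics[x]
        cy, sy, calls_y = specifics[y]
        if cx != cy:
            # Early non-equivalence: common_fingerprints differ.
            non_equiv.add((x, y))
            return False
        if sx == sy:
            # Early equivalence: specific_fingerprints match.
            equiv.add((x, y))
            continue
        # Same common fingerprint, different specific fingerprint: assume
        # the pair equivalent and defer to the specifics they call.
        assumed.add((x, y))
        worklist.extend(zip(calls_x, calls_y))
    equiv.update(assumed)
    return True

# Two mutually recursive specifics that differ only in which of the pair
# each one calls: equivalent under the assumption-based check.
recursive_specs = {
    "A1": ("c", "s1", ["A2"]),
    "A2": ("c", "s2", ["A1"]),
}
print(check_equivalent("A1", "A2", recursive_specs, set(), set()))  # True
```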
```none
CheckIfEquivalent(two specifics, &assumed equivalent specifics) -> bool {
  if common_fingerprints are non-equal {
    track as non-equivalent specifics
    return false
  }
  if specific_fingerprints are equal {
    track as equivalent specifics
    return true
  }
  if already tracked as equivalent or in assumed equivalent specifics {
    return true
  }
  for each pair of calls in the two specifics {
    if the called specifics are the same, already equivalent, or in assumed
        equivalent specifics {
      continue
    }
    if the called specifics are already tracked as non-equivalent {
      return false
    }
    add the called specifics to assumed equivalent specifics
    if !CheckIfEquivalent(called specifics, assumed equivalent specifics) {
      return false
    }
  }
  return true
}
```

## Opportunities for further improvement

The current implemented algorithm can be improved with at least the following:

-   The `specific_fingerprint` can be used to bucket specifics that can be
    coalesced right away.
-   The remaining ones can be pre-bucketed such that only the specifics with the
    same `common_fingerprint` have their lists of calls further compared (linear
    in the number of specific calls inside the functions) to determine SCCs that
    may be equivalent.

This should reduce the complexity from the current O(N^2), with N the number of
specifics for a generic, to O(M^2), with M the number of specifics for a generic
that have different `specific_fingerprint`s but equal `common_fingerprint`s (the
expectation is that M << N).

An additional potential improvement is defining the function fingerprints in a
manner that is translation-unit independent, so they can be used in the mangled
name and the same function name emitted. This does not currently occur, as the
two fingerprints use internal SemIR identifiers (`function_id` and
`specific_id`, respectively).
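The pre-bucketing improvement described above can be sketched as follows (an
illustrative model with made-up fingerprints, not the toolchain's code): group
by `specific_fingerprint` to coalesce immediately, then group the remaining
representatives by `common_fingerprint` to find the candidates that still need
the quadratic call-list comparison.

```python
from collections import defaultdict

def bucket_specifics(specifics):
    """Buckets the specifics of one generic by fingerprint.

    `specifics` maps id -> (common_fp, specific_fp). Returns the groups
    that can be coalesced immediately (same specific_fingerprint) and the
    candidate groups needing further comparison (same common_fingerprint,
    different specific_fingerprints).
    """
    by_specific = defaultdict(list)
    for sid, (_, specific_fp) in specifics.items():
        by_specific[specific_fp].append(sid)
    coalesce_now = [g for g in by_specific.values() if len(g) > 1]

    # One representative per specific_fingerprint suffices for the next pass.
    reps = [g[0] for g in by_specific.values()]
    by_common = defaultdict(list)
    for sid in reps:
        by_common[specifics[sid][0]].append(sid)
    candidates = [g for g in by_common.values() if len(g) > 1]
    return coalesce_now, candidates

specs = {
    1: ("c1", "s1"), 2: ("c1", "s1"),  # identical: coalesce immediately
    3: ("c1", "s3"),                   # same common fp: still a candidate
    4: ("c2", "s4"),                   # different common fp: not equivalent
}
now, cand = bucket_specifics(specs)
print(now, cand)  # [[1, 2]] [[1, 3]]
```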
## Alternatives considered

-   [Coalescing in the front-end vs back-end?](/proposals/p6716.md#coalescing-in-the-front-end-vs-back-end)
-   [When to do coalescing in the front-end?](/proposals/p6716.md#when-to-do-coalescing-in-the-front-end)
-   [Compile-time trade-offs](/proposals/p6716.md#compile-time-trade-offs)
-   [Coalescing duplicate non-specific functions](/proposals/p6716.md#coalescing-duplicate-non-specific-functions)