Tree: 6b2d55d289

name: Toolchain development description:

Instructions for checking, building, debugging, and understanding the Carbon
toolchain.

Toolchain development

Toolchain structure

Under toolchain/:
- base/: Base infrastructure and common utilities.
- check/: Semantic analysis (SemIR generation).
- lex/: Lexing (Source -> Tokens).
- lower/: Lowering to LLVM IR.
- parse/: Parsing (Token -> Parse Tree).
- sem_ir/: Semantic Intermediate Representation (SemIR) definitions.

Toolchain architecture

Documentation: Refer to toolchain/docs for detailed architecture design and patterns.
- Refer to Toolchain Idioms for a comprehensive list of patterns (for example, ValueStore, formatting .def files, struct reflection) used throughout the implementation.
Phases: Lex -> Parse -> Check -> Lower.
Definitions: Many kinds (tokens, parse nodes, SemIR instructions) are defined in .def files and expanded by way of macros.
Handlers:
- Parser: Handle<StateName> in parse/handle_*.cpp.
- Checker: HandleParseNode in check/handle_*.cpp.
- Lowering: HandleInst in lower/handle_*.cpp.
Iteration: Prefer iterative algorithms over recursive ones to prevent stack exhaustion on complex codebases.

Essential commands

Test everything: bazelisk test //...
Test specific target: bazelisk test //toolchain/testing:file_test
Test specific file: bazelisk test //toolchain/testing:file_test --test_arg=--file_tests=<path_to_carbon_file>
Build toolchain: bazelisk build //toolchain/...

Updating test data

Carbon tests often use file_test (for example, //toolchain/testing/file_test). If you change compiler behavior, you likely need to update expected test outputs. Do not manually edit thousands of lines of expected output. Use the script:

./toolchain/autoupdate_testdata.py
# Or for a specific file:
./toolchain/autoupdate_testdata.py toolchain/check/testdata/my_test.carbon

Debugging and diagnostics

Printing to stderr: Use llvm::errs() << "debug info\n";.
- Avoid std::cout (it may interfere with tool output).
SemIR Stringification:
- SemIR objects often have a Print method or operator<<.
- inst.Print(llvm::errs())
Debugging Crashes:
- Bazel sandboxing can hide artifacts. Use --sandbox_debug if needed, but often running the binary directly from bazel-bin/ is easier for debugging.

Error handling

No exceptions: Do not use C++ exceptions.
ErrorOr<T>: Return ErrorOr<T> for fallible operations.
- Check with if (auto result = Function(); result) { Use(*result); }
llvm::Expected<T>: Similar to ErrorOr, used when interfacing with LLVM.

Casting (LLVM style)

Use llvm::cast<T>(obj) (checked, asserts on failure).
Use llvm::dyn_cast<T>(obj) (returns null on failure).
Use llvm::isa<T>(obj) (boolean check).
Avoid dynamic_cast and standard RTTI.

Data structures

Prefer APIs in common/ and toolchain/base/ over LLVM ADTs. For example, use Map instead of llvm::DenseMap.
If no Carbon API exists, prefer LLVM ADTs over standard library ones (for example llvm::SmallVector, llvm::StringRef).
StringRef is a view; be careful with lifetimes.

Common pitfalls

Legacy explorer references: The explorer prototype has been moved. Ignore references to it in proposals or old docs; focus on toolchain.
Manually updating test files: Always check if autoupdate_testdata.py can do it for you.
Using std::string unnecessarily: Prefer llvm::StringRef for arguments.
Header includes: Use specific include orders (often enforced by clang-format).
Parse node order: Semantics processes parse nodes in post-order; ensure your parser transitions support this.

SKILL.md 4.3 KB History Raw