# Basic Syntax [Pull request](https://github.com/carbon-language/carbon-lang/pull/0162) ## Table of contents ## Table of contents - [Problem](#problem) - [Background](#background) - [Proposal](#proposal) - [Details](#details) - [Expressions and Patterns](#expressions-and-patterns) - [Statements](#statements) - [Declarations](#declarations) - [Precedence and Associativity](#precedence-and-associativity) - [Abstract Syntax](#abstract-syntax) - [Abstract Syntax for Expressions](#abstract-syntax-for-expressions) - [Abstract Syntax for Statements](#abstract-syntax-for-statements) - [Abstract Syntax for Declarations](#abstract-syntax-for-declarations) - [Alternatives considered](#alternatives-considered) - [Rationale](#rationale) ## Problem The purpose of this proposal is to establish some basic syntactic elements of the Carbon language and make sure that the grammar is unambiguous and can be parsed by an LALR parser such as `yacc` or `bison`. The grammar presented here has indeed been checked by `bison`. The language features in this basic grammar include control flow by way of `if` and `while`, functions, simple structures, choice, pattern matching, and a sample of operators on Booleans and integers. The main syntactic categories are `declaration`, `statement`, and `expression`. Establishing these syntactic categories should help the other proposals choose syntax that is compatible with the rest of the language. ## Background The grammar proposed here is based on the following proposals: - [Carbon language overview](https://github.com/carbon-language/carbon-lang/tree/trunk/docs/design) - pattern matching [#87](https://github.com/carbon-language/carbon-lang/pull/87), - structs [#98](https://github.com/carbon-language/carbon-lang/pull/98), - tuples [#111](https://github.com/carbon-language/carbon-lang/pull/111), - sum types [#157](https://github.com/carbon-language/carbon-lang/pull/157), and - metaprogramming [#89](https://github.com/carbon-language/carbon-lang/pull/89). ## Proposal We summarize the four main syntactic categories here and define the grammar in the next section. - `declaration` includes function, structure, and choice definitions. - `statement` includes local variable definitions, assignment, blocks, `if`, `while`, `match`, `break`, `continue`, and `return`. - `expression` includes function calls, arithmetic, literals, and other syntaxes that evaluate to a value. In Carbon, a type is a kind of value, and Carbon makes no syntactic distinction between type-valued expressions and other kinds of expressions, so the `expression` nonterminal is used in positions where a type is expected. - `pattern` for the patterns in a `match` statement, for the left-hand side of a variable definition, and for describing the parameters of a function. The grammar treats patterns and expressions identically, but the type system will only allow pattern variables (`expression ':' identifier`) in patterns and not in value or type-producing expressions. Having the separate non-terminals for `pattern` and `expression` helps to document this distinction. The proposal also specifies the abstract syntax. When this proposal is accepted, the accompanying `flex`, `bison`, and C++ files that define the grammar and implement the parser actions will be placed in the `executable_semantics` directory of the `carbon-lang` repository. In the near future there will be proposals regarding the semantic analysis (aka. type checking) and specification of runtime behavior for this subset of Carbon with the plan to add the accompanying executable forms of those specifications to the `executable_semantics` directory. Looking towards growing the Carbon language, the intent is for other proposals to add to this grammar by modifying the `flex` and `bison` files and the abstract syntax definitions in the `executable_semantics` directory. ## Details ### Expressions and Patterns The following grammar defines the concrete syntax for expressions. Below we comment on a few aspects of the grammar. ```Bison pattern: expression ; expression: identifier | expression designator | expression '[' expression ']' | expression ':' identifier | integer_literal | "true" | "false" | tuple | expression "==" expression | expression '+' expression | expression '-' expression | expression "and" expression | expression "or" expression | "not" expression | '-' expression | expression tuple | "auto" | "fnty" tuple return_type ; designator: '.' identifier ; tuple: '(' field_list ')' ; field_list: /* empty */ | field | field ',' field_list ; field: pattern | designator '=' pattern ; return_type: /* empty */ | "->" expression ; ``` The grammar rule ```Bison expression: expression ':' identifier ``` is for pattern variables. For example, in a variable definition such as var Int: x = 0; the `Int: x` is parsed with the grammar rule for pattern variables. In the right-hand side of the above grammar rule, the `expression` to the left of the `:` must evaluate to a type at compile time. The grammar rule ```Bison expression: "fnty" tuple return_type ``` is for function types. They are meant to play the role that function pointers play in C and that `std::function` plays in C++. Regarding the grammar rule for the return type of a function: ```Bison return_type: "->" expression ``` the `expression` is expected to evaluate to a type at compile-time. The grammar rule ```Bison tuple: '(' field_list ')' ``` is primarily for constructing a tuple, but it is also used for creating tuple types and tuple patterns, depending on the context in which the expression occurs. Regarding the grammar rules for `designator` and for a named argument ```Bison designator: '.' identifier field: designator '=' pattern ``` The period enables the addition of new keywords to Carbon without colliding with field names. The period also aids integrated development environments with auto-completion. The issue is that without the period, when the programmer is typing the identifier (and not yet typed the `=`), the IDE doesn't know whether the identifier is for a positional argument, in which case it is a variable occurrence, or whether the identifier is a field name. ### Statements The following grammar defines the concrete syntax for statements. ```Bison statement: "var" pattern '=' expression ';' | expression '=' expression ';' | expression ';' | "if" '(' expression ')' statement "else" statement | "while" '(' expression ')' statement | "break" ';' | "continue" ';' | "return" expression ';' | '{' statement_list '}' | "match" '(' expression ')' '{' clause_list '}' ; statement_list: /* empty */ | statement statement_list ; clause_list: /* empty */ | clause clause_list ; clause: "case" pattern "=>" statement | "default" "=>" statement ; ``` ### Declarations The following grammar defines the concrete syntax for declarations. ```Bison declaration: "fn" identifier tuple return_type '{' statement_list '}' | "fn" identifier tuple "=>" expression ';' | "fn" identifier tuple return_type ';' | "struct" identifier '{' member_list '}' | "choice" identifier '{' alternative_list '}' ; member: "var" expression ':' identifier ';' ; member_list: /* empty */ | member member_list ; alternative: identifier tuple ';' ; alternative_list: /* empty */ | alternative alternative_list ; declaration_list: /* empty */ | declaration declaration_list ; ``` In the grammar rule for function definitions ```Bison declaration: "fn" identifier tuple return_type '{' statement_list '}' ``` the `tuple` is used as a pattern to describe the parameters of the function. The grammar for function definitions does not currently include [implicit parameters](/docs/design/generics/terminology.md#deduced-parameter), but the intent is to add them to the grammar in the future. In the rule for field declarations ```Bison member: "var" expression ':' identifier ';' ``` the `expression` must evaluate to a type at compile time. The same is true for the `tuple` in the grammar rule for an alternative: alternative: identifier tuple ';' ### Precedence and Associativity The following precedence and associativity specification is meant to approximate the definitions in the operators and precedence proposal [#168](https://github.com/carbon-language/carbon-lang/pull/168) to the extent that is possible in a `yacc`/`bison` generated parser. The ordering is from lowest to highest precedence, with operators on the same line having equal precedence. Proposal 168 differs in that the operator groups are partially ordered instead of being totally ordered. nonassoc '{' '}' nonassoc ':' ',' left "or" "and" nonassoc "==" "not" left '+' '-' left '.' "->" nonassoc '(' ')' '[' ']' ### Abstract Syntax The output of parsing is an abstract syntax tree. There are many ways to define abstract syntax. Here we simply use C-style `struct` definitions. The definition of the abstract syntax is important because it is used in the specification of the semantic analysis and the specification of runtime behavior. #### Abstract Syntax for Expressions ```C++ enum ExpressionKind { Variable, PatternVariable, Int, Bool, PrimitiveOp, Call, Tuple, Index, GetField, IntT, BoolT, TypeT, FunctionT, AutoT }; enum Operator { Neg, Add, Sub, Not, And, Or, Eq }; struct Expression { ExpressionKind tag; union { struct { string* name; } variable; struct { Expression* aggregate; string* field; } get_field; struct { Expression* aggregate; Expression* offset; } index; struct { string* name; Expression* type; } pattern_variable; int integer; bool boolean; struct { vector >* fields; } tuple; struct { Operator operator_; vector* arguments; } primitive_op; struct { Expression* function; Expression* argument; } call; struct { Expression* parameter; Expression* return_type;} function_type; } u; }; ``` The correspondence between most of the grammar rules and the abstract syntax is straightforward. However, the parsing of the `field_list` deserves some explanation. The fields can be labeled with the grammar rule: ```Bison field: designator '=' pattern ``` or unlabeled, with the rule ```Bison field: pattern ``` The unlabeled fields are given numbers (represented as strings) for field labels, starting with 0 and going up from left to right. Regarding the rule for tuples: ```Bison tuple: '(' field_list ')' ``` if the field list only has a single unlabeled item without a trailing comma, then the parse result for that item is returned. Otherwise a `tuple` AST node is created containing the parse results for the fields. In the following grammar rules, if the parse result for `tuple` is not a tuple (because the field list was a single unlabeled item without a trailing comma), then a tuple is wrapped around the expression to ensure that it is a tuple. ```Bison expression: expression tuple expression: "fnty" tuple return_type function_definition: "fn" identifier tuple return_type '{' statement_list '}' | "fn" identifier tuple DBLARROW expression ';' | "fn" identifier tuple return_type ';' alternative: identifier tuple ';' ``` #### Abstract Syntax for Statements ```C++ enum StatementKind { ExpressionStatement, Assign, VariableDefinition, If, Return, Sequence, Block, While, Break, Continue, Match }; struct Statement { StatementKind tag; union { Expression* exp; struct { Expression* lhs; Expression* rhs; } assign; struct { Expression* pat; Expression* init; } variable_definition; struct { Expression* cond; Statement* then_; Statement* else_; } if_stmt; Expression* return_stmt; struct { Statement* stmt; Statement* next; } sequence; struct { Statement* stmt; } block; struct { Expression* cond; Statement* body; } while_stmt; struct { Expression* exp; list< pair >* clauses; } match_stmt; } u; }; ``` #### Abstract Syntax for Declarations ```C++ struct FunctionDefinition { string name; Expression* param_pattern; Expression* return_type; Statement* body; }; enum MemberKind { FieldMember }; struct Member { MemberKind tag; union { struct { string* name; Expression* type; } field; } u; }; struct StructDefinition { string* name; list* members; }; enum DeclarationKind { FunctionDeclaration, StructDeclaration, ChoiceDeclaration }; struct Declaration { DeclarationKind tag; union { struct FunctionDefinition* fun_def; struct StructDefinition* struct_def; struct { string* name; list >* alts; } choice_def; } u; }; ``` ## Alternatives considered Expressions, type expressions, and patterns could instead be defined using completely disjoint grammar productions, but that would require adding more keywords or symbols to avoid ambiguity and there would be a fair amount of repetition in grammar productions. The proposal does not include declarations for uninitialized variables, leaving that to a later proposal. In this proposal, assignment is a statement. It could instead be an expression as it is in C and C++. The arguments against assignment-as-an-expression include 1) it complicates reasoning about the ordering of side-effects and 2) it can cause confusion between `=` and `==` [(SEI CERT C Coding Standard)](https://wiki.sei.cmu.edu/confluence/display/c/EXP45-C.+Do+not+perform+assignments+in+selection+statements) [Visual Studio Warning](https://docs.microsoft.com/en-us/cpp/error-messages/compiler-warnings/compiler-warning-level-4-c4706?view=vs-2019). The syntax for variable initialization uses `=`, which is the same token used in assignment. An alternative would be to use a different syntax such as `:=` to better signal the semantic differences between initialization and assignment. The author does not have a preference on this choice. The `:=` alternative does not introduce any complications regarding ambiguity despite the other uses of `:` in the grammar. Inside a `choice`, an `alternative` could instead begin with a keyword such as `alt`, `fn`, or `var`, as discussed in the proposal for sum types [#157](https://github.com/carbon-language/carbon-lang/pull/157). The author does not have a preference in this regard. The precedence for `and` and `or` are equal, following the suggestion of proposal [#168](https://github.com/carbon-language/carbon-lang/pull/168), whereas in C++ `&&` has higher precedence than `||`. The parameters of a function are described using the syntax of a pattern. There could instead be separate syntax that is special to function parameters, as there is in C++. There are open questions regarding the design of function types, so we could alternatively postpone the specification of the syntax for function types. The choice to include them is based on the idea that Carbon will need something that fills the roles of `std::function` and C-style function pointers. Also, the features were selected for the basic syntax proposal in a way that aimed to make the proposal complete in the sense that each value-oriented feature (functions, tuples, structures) comes with syntax for 1) creating values, 2) using values, and 3) a type for classifying values. The parameters of a function type are described using the syntax for type expression. There could instead be separate syntax that is special to the parameters of a function type, as there is in C++. The keyword for introducing a function type is `fnty`, which is an arbitrary choice. Other alternatives include `fn_type` and `fn`. The `fn` alternative would conflict with future plans to use `fn` for lambda expressions. Some dislike of `fn_type` was voiced, which motivated the choice of `fnty`. It would be nice to do without a keyword, but as the grammar currently stands, that introduce ambiguity. The `pattern` non-terminal is defined in terms of `expression`. It could instead be the other way around, defining `expression` in terms of `pattern`. Regarding tuples and tuple types, this proposal follows [#111](https://github.com/carbon-language/carbon-lang/pull/111) in that there is not a distinct syntax for tuple types. Instead, the intent is that when a tuple of types results from an expression in a context that expects a type, the type system will convert the tuple to a tuple type. An alternative approach has been discussed in which tuple types have a distinct syntax, such as `Tuple(Int, Bool)`. ## Rationale Using code to validate our specification is a really promising direction, and this proposal seems like a good starting point. A reference implementation that's simple enough to be part of the design iteration process should help us move faster, by quickly uncovering the places where our specifications are ambiguous, syntactically or semantically unsound, or don't give the behavior we expect. In other words, it will help us keep ourselves honest, even at the proposal stage, which will help us avoid wasting time and effort implementing designs that turn out to be unworkable. This can be considered as sort of a counterpart to [In-progress design overview #83](p0083.md), in that the design specifics are being approved in order to bootstrap the specification process. We aren't necessarily adopting the specific syntax and semantics expressed by this proposal, and those choices will need to be presented and justified from scratch by future proposals. This decision is deferring the implementation to code review. The specific tooling used to implement the syntax checker, such as Bison, is a detail which may be changed, now or later, without requiring a proposal for core team review.