Added info on symbolic tokens in design docs (#2657)
The [Operators proposal #601](#601) was accepted, but the details were not updated in the design docs. Added a `symbolic_tokens.md` file with the details of the proposal and its discussion.

Closes #1992 

Co-authored-by: Avi Aaron <[email protected]>
aswin2108 and aviRon012 authored Jun 2, 2023
1 parent b59b3fb commit 6d399c8
Showing 2 changed files with 104 additions and 1 deletion.
4 changes: 3 additions & 1 deletion docs/design/lexical_conventions/README.md
@@ -25,10 +25,12 @@ A _lexical element_ is one of the following:
- a maximal sequence of [whitespace](whitespace.md) characters
- a [word](words.md)
- a literal:

- a [numeric literal](numeric_literals.md)
- a [string literal](string_literals.md)

- a [comment](comments.md)
- TODO: operators ...
- a [symbolic token](symbolic_tokens.md)

The sequence of lexical elements is formed by repeatedly removing the longest
initial sequence of characters that forms a valid lexical element.
101 changes: 101 additions & 0 deletions docs/design/lexical_conventions/symbolic_tokens.md
@@ -0,0 +1,101 @@
# Symbolic Tokens

<!--
Part of the Carbon Language project, under the Apache License v2.0 with LLVM
Exceptions. See /LICENSE for license information.
SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
-->

<!-- toc -->

## Table of contents

- [Overview](#overview)
- [Details](#details)
    - [Symbolic token list](#symbolic-token-list)
- [Alternatives considered](#alternatives-considered)
- [References](#references)

<!-- tocstop -->

## Overview

A _symbolic token_ is one of a fixed set of
[tokens](https://en.wikipedia.org/wiki/Lexical_analysis#Token) that consist of
characters that are not valid in identifiers. That is, they are tokens
consisting of symbols, not letters or numbers. Operators are one use of symbolic
tokens, but they are also used in patterns (`:`), declarations (`->` to indicate
return type, `,` to separate parameters), statements (`;`, `=`, and so on), and
other places (`,` to separate function call arguments).

Carbon has a fixed set of symbolic tokens, defined by the language
specification. Developers cannot define new symbolic tokens in their own code.

Symbolic tokens are lexed using a "max munch" rule: at each lexing step, the
longest symbolic token defined by the language specification that appears
starting at the current input position is lexed, if any.
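
The "max munch" rule can be illustrated with a minimal Python sketch. This is
not the Carbon toolchain's lexer: the function names `lex_symbolic_token` and
`lex_symbols` are hypothetical, and the token set is simply copied from the
table in this document, so the assignment operators from #2511 are absent.

```python
from typing import List, Optional

# Assumption: token spellings copied from the table in this document.
SYMBOLIC_TOKENS = [
    "+", "-", "*", "/", "%", "=", "^", "&", "|", "<<", ">>", "==", "!=",
    ">", ">=", "<", "<=", "->", "=>", "[", "]", "(", ")", "{", "}",
    ",", ".", ":", ":!", ";",
]

# Try longer spellings first so that the longest match wins.
_BY_LENGTH = sorted(SYMBOLIC_TOKENS, key=len, reverse=True)


def lex_symbolic_token(source: str, pos: int) -> Optional[str]:
    """Return the longest symbolic token starting at `pos`, or None."""
    for token in _BY_LENGTH:
        if source.startswith(token, pos):
            return token
    return None


def lex_symbols(source: str) -> List[str]:
    """Lex a string that contains only symbolic tokens and whitespace."""
    tokens = []
    pos = 0
    while pos < len(source):
        if source[pos].isspace():
            pos += 1
            continue
        token = lex_symbolic_token(source, pos)
        if token is None:
            raise ValueError(f"no symbolic token at offset {pos}")
        tokens.append(token)
        pos += len(token)
    return tokens
```

With this token set, `lex_symbols("->")` yields the single token `->` rather
than `-` followed by `>`, while `lex_symbols("- >")` yields two tokens because
whitespace separates them.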

When a symbolic token is used as an operator, the surrounding whitespace must
follow certain rules:

- There can be no whitespace between a unary operator and its operand.
- The whitespace around a binary operator must be consistent: either there is
whitespace on both sides or on neither side.
- If there is whitespace on neither side of a binary operator, the token
before the operator must be an identifier, a literal, or any kind of closing
bracket (for example, `)`, `]`, or `}`), and the token after the operator
must be an identifier, a literal, or any kind of opening bracket (for
example, `(`, `[`, or `{`).

These rules enable us to use a token like `*` as a prefix, infix, and postfix
operator, without creating ambiguity.
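
The whitespace rules above can be sketched as a small Python checker. This is
an illustration under stated assumptions rather than the toolchain's actual
logic: the `Neighbor` type, the token kinds `"identifier"`, `"literal"`, and
`"symbol"`, and the function names are all hypothetical.

```python
from dataclasses import dataclass

CLOSING_BRACKETS = {")", "]", "}"}
OPENING_BRACKETS = {"(", "[", "{"}


@dataclass
class Neighbor:
    """A token adjacent to an operator (hypothetical representation)."""
    kind: str         # "identifier", "literal", or "symbol"
    text: str         # the token's spelling
    whitespace: bool  # True if whitespace separates it from the operator


def can_end_operand(token: Neighbor) -> bool:
    """Identifier, literal, or closing bracket."""
    return token.kind in ("identifier", "literal") or token.text in CLOSING_BRACKETS


def can_start_operand(token: Neighbor) -> bool:
    """Identifier, literal, or opening bracket."""
    return token.kind in ("identifier", "literal") or token.text in OPENING_BRACKETS


def is_valid_binary_use(before: Neighbor, after: Neighbor) -> bool:
    """Check the binary-operator whitespace rules."""
    # Whitespace must be consistent: both sides or neither side.
    if before.whitespace != after.whitespace:
        return False
    if before.whitespace:
        return True
    # No whitespace on either side: restrict the adjacent tokens.
    return can_end_operand(before) and can_start_operand(after)


def is_valid_unary_use(operand: Neighbor) -> bool:
    """A unary operator must touch its operand."""
    return not operand.whitespace
```

Under this sketch, `a * b` and `a*b` both pass `is_valid_binary_use`, while
`a *b` and `a* b` fail it because the whitespace is inconsistent. Those two
spellings can then only be read as a prefix `*` applied to `b` or a postfix `*`
applied to `a`, respectively.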

## Details

### Symbolic token list

The following is the initial list of symbolic tokens recognized in a Carbon
source file:

| Symbolic Tokens | Explanation |
| --------------- | ------------------------------------------------------------------------------------------------------------ |
| `+` | Addition |
| `-` | Subtraction and negation |
| `*` | Indirection, multiplication, and forming pointer types |
| `/` | Division |
| `%` | Modulus |
| `=` | Assignment |
| `^` | Complementing and Bitwise XOR |
| `&` | Address-of and Bitwise AND |
| `\|` | Bitwise OR |
| `<<` | Arithmetic and Logical Left-shift |
| `>>` | Arithmetic and Logical Right-shift |
| `==` | Equality or equal to |
| `!=` | Inequality or not equal to |
| `>` | Greater than |
| `>=` | Greater than or equal to |
| `<` | Less than |
| `<=` | Less than or equal to |
| `->` | Return type and indirect member access |
| `=>` | Match syntax |
| `[` and `]` | Subscript and deduced parameter lists |
| `(` and `)` | Function call, function declaration, and tuple literals |
| `{` and `}` | Struct literals, blocks of control flow statements, and the bodies of definitions (classes, functions, etc.) |
| `,` | Separate tuple and struct elements |
| `.` | Member access |
| `:` | Name bindings |
| `:!` | Generic binding |
| `;` | Statement separator |

TODO: The assignment operators in
[#2511](https://github.com/carbon-language/carbon-lang/pull/2511) are still to
be added.

## Alternatives considered

- [Proposal: p0601](/proposals/p0601.md#alternatives-considered)

## References

- Proposal
[#601: Symbolic tokens](https://github.com/carbon-language/carbon-lang/pull/601)
