Specify UTF-8 encoding and tidy up string definition (#247)
EvanTheB authored and geoffjentry committed Dec 14, 2018
1 parent df117eb commit fb920ed
Showing 1 changed file with 15 additions and 8 deletions.
23 changes: 15 additions & 8 deletions versions/development/SPEC.md
@@ -234,27 +234,34 @@ The inputs to this workflow would be `example.files` and `example.hello.pattern`

## Global Grammar Rules

WDL files are encoded in UTF-8, with no BOM.
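
As a minimal sketch of what this encoding rule could mean for a tool that reads WDL source, the hypothetical helper below decodes bytes as strict UTF-8 and refuses input that begins with a byte order mark. The function name `decode_wdl_source` and the choice to reject (rather than silently strip) a BOM are assumptions for illustration, not something the spec prescribes.

```python
import codecs


def decode_wdl_source(raw: bytes) -> str:
    """Decode WDL source bytes as strict UTF-8 without a byte order mark.

    Hypothetical helper: rejecting a BOM outright is an assumption here.
    """
    if raw.startswith(codecs.BOM_UTF8):
        raise ValueError("WDL source must not start with a UTF-8 BOM")
    # Strict decoding raises UnicodeDecodeError on invalid UTF-8 input.
    return raw.decode("utf-8")
```

A caller would pass the bytes read from a file opened in binary mode, so that the BOM check happens before any decoding.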

### Whitespace, Strings, Identifiers, Constants

These are common among many of the following sections.

```
$ws = (0x20 | 0x9 | 0xD | 0xA)+
$ws = (0x20 | 0x09 | 0x0D | 0x0A)+
$identifier = [a-zA-Z][a-zA-Z0-9_]+
$string = "([^\\\"\n]|\\[\\"\'nrbtfav\?]|\\[0-7]{1,3}|\\x[0-9a-fA-F]+|\\[uU]([0-9a-fA-F]{4})([0-9a-fA-F]{4})?)*"
$string = '([^\\\'\n]|\\[\\"\'nrbtfav\?]|\\[0-7]{1,3}|\\x[0-9a-fA-F]+|\\[uU]([0-9a-fA-F]{4})([0-9a-fA-F]{4})?)*'
$boolean = 'true' | 'false'
$integer = [1-9][0-9]*|0[xX][0-9a-fA-F]+|0[0-7]*
$float = (([0-9]+)?\.([0-9]+)|[0-9]+\.|[0-9]+)([eE][-+]?[0-9]+)?
```
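
The token rules above translate almost directly into Python regular expressions. The sketch below is illustrative only: the constant names and the `is_token` helper are hypothetical, and `$string` is left out on purpose.

```python
import re

# Direct transcriptions of the grammar rules; the names are illustrative.
WS = re.compile(r"[\x20\x09\x0D\x0A]+")
IDENTIFIER = re.compile(r"[a-zA-Z][a-zA-Z0-9_]+")
BOOLEAN = re.compile(r"true|false")
INTEGER = re.compile(r"[1-9][0-9]*|0[xX][0-9a-fA-F]+|0[0-7]*")
FLOAT = re.compile(r"(([0-9]+)?\.([0-9]+)|[0-9]+\.|[0-9]+)([eE][-+]?[0-9]+)?")


def is_token(pattern: re.Pattern, text: str) -> bool:
    """Return True when the whole of `text` matches `pattern`."""
    return pattern.fullmatch(text) is not None
```

Two consequences of the rules as written: `$identifier` ends in `+`, so a one-character name such as `x` does not match, and `$float` also accepts plain digit strings such as `10`, so a tokenizer would need to try `$integer` first.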

`$string` can accept the following between single or double quotes:

* Any character not in set: `\\`, `"` (or `'` for single-quoted string), `\n`
* An escape sequence starting with `\\`, followed by one of the following characters: `\\`, `"`, `'`, `[nrbtfav]`, `?`
* An escape sequence starting with `\\`, followed by 1 to 3 digits of value 0 through 7 inclusive. This specifies an octal escape code.
* An escape sequence starting with `\\x`, followed by hexadecimal characters `0-9a-fA-F`. This specifies a hexadecimal escape code.
* An escape sequence starting with `\\u` or `\\U` followed by either 4 or 8 hexadecimal characters `0-9a-fA-F`. This specifies a unicode code point
* Any character not in set: `\`, `"` (or `'` for single-quoted string), `\n`
* An escape sequence starting with `\`, followed by one of the following characters: `\nt"'`
* An escape sequence starting with `\`, followed by 3 digits of value 0 through 7 inclusive. This specifies an octal escape code.
* An escape sequence starting with `\x`, followed by 2 hexadecimal digits `0-9a-fA-F`. This specifies a hexadecimal escape code.
* An escape sequence starting with `\u` followed by 4 hexadecimal digits, or `\U` followed by 8 hexadecimal digits (`0-9a-fA-F`). This specifies a Unicode code point.

|Escape Sequence|Meaning|\x Equivalent|
|---|---|---|
|`\\`|`\`|`\x5C`|
|`\n`|newline|`\x0A`|
|`\t`|tab|`\x09`|
|`\'`|single quote|`\x27`|
|`\"`|double quote|`\x22`|
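
To make the escape rules concrete, here is a minimal sketch of a decoder for the text between the quotes. It follows the stricter forms: one of `\`, `n`, `t`, `"`, `'` after the backslash; exactly three octal digits; `\x` with two hex digits; `\u` with four and `\U` with eight hex digits. The names `ESCAPE`, `SIMPLE` and `unescape` are hypothetical. Treating octal and `\x` values as code points rather than raw bytes is an assumption, since the text above does not say which is meant. The sketch does not check for unescaped quotes or newlines in the body.

```python
import re

# Single-character escapes and the characters they stand for.
SIMPLE = {"\\": "\\", "n": "\n", "t": "\t", "'": "'", '"': '"'}

# One alternative per escape form, tried at a backslash.
ESCAPE = re.compile(
    r"\\(?:"
    r"(?P<simple>[\\nt\"'])"
    r"|(?P<octal>[0-7]{3})"
    r"|x(?P<hex>[0-9a-fA-F]{2})"
    r"|u(?P<u4>[0-9a-fA-F]{4})"
    r"|U(?P<u8>[0-9a-fA-F]{8})"
    r")"
)


def unescape(body: str) -> str:
    """Decode escape sequences in the text between a string's quotes."""
    out = []
    pos = 0
    while pos < len(body):
        if body[pos] != "\\":
            out.append(body[pos])
            pos += 1
            continue
        match = ESCAPE.match(body, pos)
        if match is None:
            raise ValueError(f"invalid escape sequence at offset {pos}")
        if match.group("simple") is not None:
            out.append(SIMPLE[match.group("simple")])
        elif match.group("octal") is not None:
            # Assumption: the octal value is used as a code point.
            out.append(chr(int(match.group("octal"), 8)))
        else:
            digits = match.group("hex") or match.group("u4") or match.group("u8")
            # chr() raises ValueError for values above U+10FFFF.
            out.append(chr(int(digits, 16)))
        pos = match.end()
    return "".join(out)
```

For example, `unescape(r"\x41\102\u0043")` yields `ABC`, while a lone `\q` or a truncated `\x4` raises `ValueError`.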

### Types
