From source code:
- Lexer
- Syntactic analyzer
- Semantic analyzer
- Optimizer*
- Code generator
The result is an executable/image.
Lexer
- Performs lexical analysis
- "Knows" the valid keywords, literal formats, and identifier formats
- Does not perform any structural analysis
- Outputs a list of tokens from your source code
- Tools
  - C: lex, flex
  - Java: JavaCC
Code
int main() {
    long x = 5 + 6;
    printf("%d", x);
    return 0;
}
Token list
int, main, (, ), {, long, x, =, 5, +, 6, ;, printf, ...
Token types
- Grouping symbols and punctuation (;, {}, (), :, ...)
- Operators (+, =, ...)
- Identifiers
- Keywords
- Literals
- Comments
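To make the lexer's job concrete, here is a minimal hand-written sketch in C. It is not how lex or flex work internally; the names `Token`, `TokenKind`, and `tokenize` are made up for this example, the keyword set is deliberately tiny, and string literals and comments are left out.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Token categories (hypothetical names chosen for this sketch). */
typedef enum { TOK_KEYWORD, TOK_IDENTIFIER, TOK_LITERAL, TOK_OPERATOR, TOK_GROUPING } TokenKind;

typedef struct {
    TokenKind kind;
    char text[32];
} Token;

/* A deliberately tiny keyword set; a real C lexer knows many more. */
static int is_keyword(const char *s) {
    static const char *const keywords[] = { "int", "long", "return" };
    for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++)
        if (strcmp(s, keywords[i]) == 0) return 1;
    return 0;
}

static int is_ident_char(int c) { return isalnum(c) || c == '_'; }
static int is_digit_char(int c) { return isdigit(c); }

/* Copies the longest run of accepted characters into t->text.
   Returns -1 if the run does not fit in the token buffer. */
static int scan_run(const char *src, size_t *i, Token *t, int (*accept)(int)) {
    size_t len = 0;
    while (accept((unsigned char)src[*i])) {
        if (len == sizeof t->text - 1) return -1;
        t->text[len++] = src[(*i)++];
    }
    t->text[len] = '\0';
    return 0;
}

/* Splits src into tokens. Returns the number of tokens written, or -1 on
   an unrecognized character, an overlong token, or a full output array.
   Note that no structural checking happens here. */
int tokenize(const char *src, Token *out, int max_tokens) {
    int n = 0;
    size_t i = 0;
    assert(src != NULL && out != NULL);
    while (src[i] != '\0') {
        unsigned char c = (unsigned char)src[i];
        if (isspace(c)) { i++; continue; }
        if (n == max_tokens) return -1;
        Token *t = &out[n];
        if (isalpha(c) || c == '_') {
            if (scan_run(src, &i, t, is_ident_char) != 0) return -1;
            t->kind = is_keyword(t->text) ? TOK_KEYWORD : TOK_IDENTIFIER;
        } else if (isdigit(c)) {
            if (scan_run(src, &i, t, is_digit_char) != 0) return -1;
            t->kind = TOK_LITERAL;
        } else if (strchr("+-*/=", c) != NULL || strchr(";(){},:", c) != NULL) {
            t->kind = strchr("+-*/=", c) != NULL ? TOK_OPERATOR : TOK_GROUPING;
            t->text[0] = (char)c;
            t->text[1] = '\0';
            i++;
        } else {
            return -1;
        }
        n++;
    }
    return n;
}
```

Calling `tokenize("long x = 5 + 6;", tokens, 16)` fills `tokens` with seven entries: long, x, =, 5, +, 6, and ;. The lexer would accept `long long = = 5` just as happily, because checking the order of tokens is the next phase's job.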
Syntactic analyzer (parser)
- Performs syntax analysis
- Checks the token list for valid structure
- Knows the grammar of the language
- Takes a token list and outputs a syntax tree
- Tools
  - C: yacc, bison
  - Java: JavaCC
Syntax tree
              int main
           /     |      \
         =     printf    return
        / \     /  \        |
  long x   +  "%d"  x       0
          / \
         5   6
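As a rough illustration of how a token list becomes a tree, the sketch below is a hand-written recursive-descent parser in C for one tiny grammar rule, `expr := number ('+' number)*`. This is an assumption-heavy simplification: yacc and bison generate table-driven parsers instead, the token list is reduced to an array of strings, and the names `Node` and `parse_expr` are invented for the example.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* One node of the syntax tree (hypothetical layout for this sketch). */
typedef struct Node {
    const char *text;          /* token text such as "+" or "5"; not owned */
    struct Node *left, *right; /* both NULL for a leaf */
} Node;

static Node *new_node(const char *text, Node *left, Node *right) {
    Node *n = malloc(sizeof *n);
    assert(n != NULL);
    n->text = text;
    n->left = left;
    n->right = right;
    return n;
}

void free_tree(Node *n) {
    if (n == NULL) return;
    free_tree(n->left);
    free_tree(n->right);
    free(n);
}

/* number := a token that starts with a digit */
static Node *parse_number(const char **tokens, int *pos) {
    const char *t = tokens[*pos];
    if (t == NULL || !isdigit((unsigned char)t[0])) return NULL;
    (*pos)++;
    return new_node(t, NULL, NULL);
}

/* expr := number ('+' number)*, built left-associatively */
static Node *parse_sum(const char **tokens, int *pos) {
    Node *left = parse_number(tokens, pos);
    while (left != NULL && tokens[*pos] != NULL && strcmp(tokens[*pos], "+") == 0) {
        const char *op = tokens[(*pos)++];
        Node *right = parse_number(tokens, pos);
        if (right == NULL) { /* "+" with nothing valid after it */
            free_tree(left);
            return NULL;
        }
        left = new_node(op, left, right);
    }
    return left;
}

/* Parses a NULL-terminated token list. Returns the tree, or NULL when the
   tokens do not match the grammar (a syntax error). */
Node *parse_expr(const char **tokens) {
    int pos = 0;
    Node *tree = parse_sum(tokens, &pos);
    if (tree != NULL && tokens[pos] != NULL) { /* leftover tokens */
        free_tree(tree);
        return NULL;
    }
    return tree;
}
```

Given the tokens 5, +, 6 this returns a "+" node with leaves 5 and 6, which is the bottom-left corner of the tree drawn above. Given 5, + with nothing after it, it returns NULL: every token is valid on its own, but the structure is not.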
Semantic analyzer
- Takes a syntax tree and outputs an operation list and a symbol table
- Walks the syntax tree to create a list of operations that represents the program
- Creates a list of identifiers and their associated information (the symbol table)
- Type mismatches are caught as the symbol table is created
Operation list
+: 5, 6
=: x
...
return: 0
Symbol table
main: function (args, return type...)
x: variable, long
printf: function
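A symbol table like the one above could be sketched in C as follows. Everything here is an assumption made for illustration: the fixed-size array, the names `SymbolTable`, `declare`, `lookup`, and `check_assign`, and especially the type rule, which is stricter than real C (C silently converts a long to an int, while this sketch reports it as a mismatch).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { TYPE_INT, TYPE_LONG, TYPE_FUNCTION } Type;

typedef struct {
    const char *name; /* not copied; must outlive the table */
    Type type;
} Symbol;

#define MAX_SYMBOLS 64

typedef struct {
    Symbol entries[MAX_SYMBOLS];
    int count;
} SymbolTable;

/* Returns the entry for name, or NULL if it was never declared. */
const Symbol *lookup(const SymbolTable *t, const char *name) {
    assert(t != NULL && name != NULL);
    for (int i = 0; i < t->count; i++)
        if (strcmp(t->entries[i].name, name) == 0) return &t->entries[i];
    return NULL;
}

/* Adds an identifier. Returns 0 on success, -1 on a redeclaration or
   when the table is full. */
int declare(SymbolTable *t, const char *name, Type type) {
    if (lookup(t, name) != NULL || t->count == MAX_SYMBOLS) return -1;
    t->entries[t->count].name = name;
    t->entries[t->count].type = type;
    t->count++;
    return 0;
}

/* Checks "name = <value of value_type>".
   Returns 0 if allowed, -1 if name is undeclared, -2 on a type mismatch.
   Simplified rule for this sketch: int may widen to long, nothing else
   converts, and functions can be neither assigned to nor assigned from. */
int check_assign(const SymbolTable *t, const char *name, Type value_type) {
    const Symbol *s = lookup(t, name);
    if (s == NULL) return -1;
    if (s->type == TYPE_FUNCTION || value_type == TYPE_FUNCTION) return -2;
    if (s->type == TYPE_INT && value_type == TYPE_LONG) return -2;
    return 0;
}
```

For the example program, the analyzer would call `declare` for main and x while walking the tree, then call `check_assign` for `x = 5 + 6`. The check passes because the int result widens to long.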
Optimizer
- Takes an operation list and symbol table and outputs an optimized operation list and symbol table
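One common optimization is constant folding: computing at compile time any operation whose operands are already known. The C sketch below shows the idea on a made-up operation-list layout. The `Op` and `Operand` structs and the `fold_constants` function are assumptions for this example, not the format any particular compiler uses, and integer overflow is ignored.

```c
#include <assert.h>
#include <stddef.h>

/* An operand is either a constant or a named variable (hypothetical layout). */
typedef struct {
    int is_const;
    long value;       /* used when is_const is nonzero */
    const char *name; /* used when is_const is zero */
} Operand;

/* One entry of the operation list: dest = lhs op rhs.
   The op '=' means a plain copy of lhs into dest. */
typedef struct {
    char op;
    const char *dest;
    Operand lhs, rhs;
} Op;

/* Rewrites every '+', '-', or '*' whose operands are both constants into
   a plain assignment of the computed value. Returns how many operations
   were folded. Overflow is not checked in this sketch. */
int fold_constants(Op *ops, int n) {
    int folded = 0;
    assert(ops != NULL || n == 0);
    for (int i = 0; i < n; i++) {
        Op *o = &ops[i];
        long result;
        if (!o->lhs.is_const || !o->rhs.is_const) continue;
        switch (o->op) {
        case '+': result = o->lhs.value + o->rhs.value; break;
        case '-': result = o->lhs.value - o->rhs.value; break;
        case '*': result = o->lhs.value * o->rhs.value; break;
        default: continue; /* not foldable; move on to the next operation */
        }
        o->op = '=';
        o->lhs.value = result;
        o->rhs = (Operand){ 0, 0, NULL };
        folded++;
    }
    return folded;
}
```

Applied to the example, the operation `x = 5 + 6` becomes `x = 11`, so the addition never has to run in the finished program. An operation such as `y = x + 1` is left alone because x is not a constant.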
Code generator
- Translates the operation list and symbol table into binary machine code
- Creates the actual machine instructions
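The translation step can be sketched in C as below. To keep the output readable, the sketch writes text mnemonics for an invented accumulator machine (LOAD, ADD, MUL, STORE) rather than real binary encodings. The instruction set, the `Op` struct, and the `generate` function are all assumptions for illustration, and operands are limited to constants.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* One operation: dest = lhs op rhs, constants only (hypothetical layout). */
typedef struct {
    char op;
    const char *dest;
    long lhs, rhs;
} Op;

/* Appends instructions for each operation to out, a buffer of size bytes.
   Returns 0 on success, -1 on an unknown operator or a full buffer.
   The mnemonics belong to a made-up machine, not a real architecture. */
int generate(const Op *ops, int n, char *out, size_t size) {
    size_t used = 0;
    assert(out != NULL && size > 0);
    out[0] = '\0';
    for (int i = 0; i < n; i++) {
        const char *mnemonic;
        int written;
        switch (ops[i].op) {
        case '+': mnemonic = "ADD"; break;
        case '*': mnemonic = "MUL"; break;
        default: return -1;
        }
        written = snprintf(out + used, size - used,
                           "LOAD %ld\n%s %ld\nSTORE %s\n",
                           ops[i].lhs, mnemonic, ops[i].rhs, ops[i].dest);
        if (written < 0 || (size_t)written >= size - used) return -1;
        used += (size_t)written;
    }
    return 0;
}
```

For the operation `+: 5, 6` stored into x, this produces the three lines LOAD 5, ADD 6, and STORE x. A real code generator would go one step further and encode each instruction as the bytes the target processor expects.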