- build register allocation verification
- fix load store bugs
- properly handle parameters with a copy
- switch allocator to use uses data structure for liveness tracking
- in uses liveness calculator, special case phis
- phis should act like copies in source block
- if the value does not live
- it should not be in live out
- it should be in block kill list
- in uses liveness calculator, special case phis
- top to bottom register allocation verification?
- it seems like back-links (loops) are not handled properly in the register allocator yet
- fix value overwrites in the register allocator
- get ssa dump html to have final assembly output
- emit ssa dump html on command line arg
- get a test suite working
- ability to write .asm files
- can invoke customasm to produce .hex files
- can invoke emulator to run the .hex files
- set up a test runner
- start writing some simple test programs inspired by c-test-suite
- storing a const needs a register
store [gp, main__init_guard], 1
- implement function parameters
- add parameters to function entry
- handle register ABI for parameters
- working register allocator
- implement parallel phi moves
- convert boolean checks into proper block types
- try to assign moves the same register so it will be eliminated
- use a better live value scanner
- handles loops properly
- includes branches and other control forks inside the loop
- fixed point iterative analysis (more accurate)
- can emit a .dot file for debugging
- phis and phi copies handled correctly
- makes sure all loop extended vars have liveThroughCalls set properly
- better debugging of liveness with graphviz
- handles loops properly
- reorder PhiCopies so they don't clobber regs
- may need to swap sometimes
- hide debugging spam with flags
- if a function has multiple returns
- create a common exit block
- set successors for each return block to it
- create a phi in the return block for each returned value
- change return to jump, remove controls
- handle stack ABI for parameters
- loading on callee side
- storing on caller side
-
now has ability to compile simple programs that use only word data
- global structs and arrays supported
- allocator does not support spills
- crashes if it needs more registers than available
-
implement multiple function return values
- copies return values to registers a0-a2 in function
- copies extracted values from registers a0-a2 at call site
- test case built
-
storing comparisons in a bool
-
fix bug where return values happen after register restores
-
fix bugs with self referential phis
-
add go source to ssa.html dump
-
add original tools/go/ssa source to ssa.html dump
-
better assembly IR / output
- design a lowest level IR for assembly
- add a module for rj32 specific assembly generation
- generate the assembly IR for rj32
- add a pretty html dump to ssa.html
- add ability to convert assembly IR into text
- convert codegen to only produce the IR
-
better register allocation
- remove register "guessing" -- don't think that's necessary
- improved register colouring debugging info
- add function return values to register verifier
- stronger out-of-ssa handling
- proper CSSA back out
- additional parallel moves after the phi
- remove assumption that PhiCopies are only at the end of a block
- handle transition from Phi to PhiCopy block properly
- proper CSSA back out
-
[.] start a standard library
- builtin print/println support
- compiles to a series of runtime functions
- implement some
runtime
functions- string iterator next function
- printing various types
- loaded package replacement
- build an alternate goroot with replacement packages
- replace
runtime
package - force
runtime
to be loaded
- only parse functions actually used
- make sure uses of runtime ops are loaded
- extern func assembly
- per architecture
- loads from asm files
- add runtime library support for builtin ops
- implement mul in go
- implement div in go
- implement rem in go
- implement double and quad word ops
- add/sub should use addc/subc
- shifts should do a function call
- either multi-def support or register pair support
- builtin print/println support
-
proper error logging with a reference to the original code location
-
Documentation especially around the IR package
-
string support
- support len() builtin
- parses
- implemented as a load of the second value in the tuple
- iteration support
- range statements for strings
- parsed
- implemented as an iterator
- string iterator next op
- parsed
- calls an iterator function
- range statements for strings
- indexing strings
- address part of tuple loads
- offset calculated and loaded
- strings are UTF-16
- multi-word UTF-16 character support
- 32-bit runes
- string len() support
- string to slice of bytes conversion
- slice of bytes to string conversion
- string to slice of runes
- slice of runes to string
- utf8 expansion in range operator
- string less comparison
- string equal comparison
- string concatenation
- string slicing / substring
- support len() builtin
-
Add support for a second CPU to make adding more easier
- Artentus' A32
- Add arch abstraction with an rj32 implementation
- Abstract away registers and their use classes
- Add register abstraction with config from rj32 arch
- Allow 32 registers
- Support a different ABI
- Allow sizes to be parameterized by arch
- Add sizes to rj32 arch
- Support 32-bit, 16-bit and 8-bit values
- Support byte addressing
- Fix hard-coded number of arg regs
- Fix arg spill code in elaborating calls
- Fix arg consume code in prologue
- Fix other references to reg.ArgRegs
- Support three operand instructions configured by arch
- Ensure ClobbersArg is false on 3-op arches
- Legalize should not insert copies on 3-op arches
- Instructions for A32 emit 3 operands
- Add support for different cpudefs
- Add support for different emulators
- Add stdlib support for different arches
- Move IO into the standard Library
- Figure out how to utilize the IN/OUT instructions of A32
- Fix panic to write to standard out / IO
- Round robin reg choosing
- Make stack slots word-size aware
- Make stack slots align aware
- Fix customasm producing garbage binary to stdout
- Fix bug where string length is at offset 1 when it should be 4 on LD
- Fix calls getting the function type not the return type
- Fix several bugs with strings and byte addressing
- Add A32 to test suite to make sure it doesn't break
- Artentus' A32
-
Rework IR (ir2)
- Split globals into Literals (strings) and Globals (bss data)
- Split values into Values and Instructions
- Instructions can return multiple values
- Iterators with support for better xforms
- Slab based allocation so pointers are not invalidated
- Fully documented with doc comments
- Blocks no longer have block ops, last instr is control instr
- Proper generation of program-wide unique names for functions, globals and literals
- Make a copy of the
parser
package namedfrontend
- Get it to parse programs to ir2
- Handle translating tuples to multiple return values
- Can emit IR code in a way that simplifies text, assembly and html generation
- Emits to ssa.html
- Block parameters instead of phis
- Values can be used by both Blocks and Instrs
- Blocks and Instrs are Users of Values
- Block parameters are emitted
- params are emitted on block labels
- correct args are passed to successor blocks
- Block parameter defs are parsed
- Terminal instructions specify dest block arguments
- Frontend builds blocks with parameters
- Phis are eliminated
- Implement a simplified type system
- integer types i/u 8,16,32,64
- bool type
- cpu flags
- pointers
- const
- tuples?
- Scan program for types used & print diagnostics
- Map from go types to type system
- Use types in frontend translation to IR
- Output them in textual format as def annotations
- Ability to parse text form of IR back into IR
- Can lex tokens
- Values parsed
- Types parsed
- Types interned to reduce memory
- Array types
- Slice types
- Pointer types
- Func types
- Param lists
- Result lists
- Receivers
- Empty interfaces
- Full interfaces
- Named types
- Type names and looking up the typedef
- Methods
- Struct types
- Map types
- Chan types
- Tuple types?
- Type defs
- Constants parsed
- function references
- literal references
- global references
- numbers
- Instructions parsed
- block labels parsed
- func labels parsed
- parameters, results parsed
- packages parsed
- parses block references and links them
- parses value references and links them
- Global vars
- Global/Func forward references
- Refactor: better package naming
- Generate unique package names
- Identifiers in other packages are pkg_name.ident rather than pkg_name__ident
- Use github.com/nochso/ctxerr for error messages
- Transform xform pkg to use new IR
- Register func with options for config
- grabs name from passed xform func
- tags
- list of active stages
- activating op
- can be run on each instr
- can be run on each block
- takes a func that takes an iterator
- xform func responsible for leaving iterator in a resumable position
- iterator tracks changes
- ability to tell the iterator a change was made it can't see
- A way to run a single transform for testing w/tags
- A way to run a single stage for testing w/tags
- Register func with options for config
-
Rework xform system
- Have a better way to track which rules have run to trigger what code to be generated
- Instrument SSA dump with code location that generated the code
- A way to allow debugging of rule matching conditions inputs outputs and generated code in the ssa.html
- Transformation tagging with the architecture or other things using configuration functions
- Ability to configure the tags for each transformation function close to where the function is defined
- A way to automatically document rules their inputs and outputs their purpose and automatically assign an ID to them and have a way to list documentation about them at the command line
- An architecture can have many tags that are associated with it that can describe which rules are active
- A way to denote which rules/tags were active in each pass for a given architecture in the ssa.html
- Split larger transformations apart so that they are more fine-grained and thus easier to tag for specific architectures
- Have a way to easily run a transformation at the function or block level
- Ability to configure the tags for each transformation function close to where the function is defined
- Have a way to denote how far back in the instruction list to backtrack if a change is made
- Make that back tracking automatic so the rule doesn't have to specify it
- Have a way to specify the dependencies or rules that have to execute first or prerequisites
- Maintain a candidate rule list and as that rule satisfies prerequisites of other rules add those to the candidate list
- This will allow it to automatically figure out the order to apply rules and figure out the phases accordingly
- A way to add rules based on architecture sizes information
- Instead of the rule having a bunch of if conditions to denote that it doesn't work have a way to specify what conditions need to be matching before the rule is ever considered
- Have a way to denote that a rule no longer applies and can be removed from the candidate list
- Automatically track new instructions and new code generated to see if previously rules that have been removed from the candidate list may now apply again
- Have a way to re-trigger a transformation if any changes are made but only if another transformation has it as a prerequisite and before that one is redone
- Have a way to denote that a rule no longer applies and can be removed from the candidate list
- Register allocation as just a regular transformation
- Think about how to add types to the xform engine
- Type verifier that will check to make sure that the return types of ops are correct
- A way to add types to architecture op translation
- Have a better way to track which rules have run to trigger what code to be generated
-
better copy elimination (coalescing)
- add register preference scanning
-
track what registers are actually used by a call site
- reduce the register restrictions (with flag to disable)
-
stack / spill support
-
create spills for temp vars still live at a call
- reload after
-
build way to count register pressure
- add spill support to lower register pressure
-
implement graph coloring in register allocator
- implement adjacency lists for the graph
- nodes can be marked as move related
- nodes can be merged so multiple values are in one node
- nodes can be removed one-by-one
- fast finding of the nodes with least degree
- use iterated register coalescing algo
- implement adjacency lists for the graph
-
expand various ops into calls to functions implementing them
-
stack allocation
- slices
-
heap allocation
- free somehow?
-
slice support
-
closures
-
Add make ready to prepare PRs or whatever
-
Far pointer and code page banking
-
Add notion of extended blocks as groups of blocks without back edges
- fix allocator choosing wrong variable and doing extra copies
- copy propagation
- given use of X:
- are all reaching definitions of X
- copies from the same variable, ie X = copy Y
- where Y is not redefined since that variable?
- if so, substitute use of X with use of Y instead
- are all reaching definitions of X
- given use of X:
- move instruction defs closer to first uses to minimize reg pressure
- constant folding
- common subexpression elimination
- start at entry
- for each block copy value map from predecessors
- ignore preds that have not been visited yet
- for commutative operations, order in ascending order
- hash op + arg1 + arg2
- if already exists in value map, replace with value from that map
- update map with new latest value
- hash and lookup new resulting expression
- if doesn't exist, add to map with the result value
- dead code elimination
- find all loops
- loop invariant code motion
- if for def X, no args refer to a phi node or def inside the loop
- move X out of the loop into the pre-header
- if for def X, no args refer to a phi node or def inside the loop
- find loop induction variables?
- do strength reduction on uses of induction variable?
- like array indexing for example
- do strength reduction on uses of induction variable?
- support for inline assembly / extern assembly
- get mul, div and rem converted to assembly
- loop invariant code motion