added design doc for fastalloc algorithm

bytecodealliance · Sep 29, 2024 · 3702c90 · 3702c90
1 parent 5b4f969
commit 3702c90
Showing 5 changed files with 536 additions and 759 deletions.
diff --git a/README.md b/README.md
@@ -1,13 +1,3 @@
-## fastalloc: a sample implementation of SSRA
-
-In the `RegallocOptions`, setting `use_fastalloc` will run a sample SSRA
-(https://www.mattkeeter.com/blog/2022-10-04-ssra/) implementation.
-
-It only supports registers of class int and it can handle multiple basic
-blocks.
-
-To test it out on a toy language: https://github.com/d-sonuga/reverse-linear-scan-regalloc-concept-2.
-
 ## regalloc2: another register allocator
 
 This is a register allocator that started life as, and is about 50%

diff --git a/doc/FASTALLOC.md b/doc/FASTALLOC.md
@@ -0,0 +1,321 @@
+# Fastalloc Design Overview
+
+Fastalloc is a register allocator made specifically for fast
+compile times. It's based on the reverse linear scan register
+allocation/SSRA algorithm.
+This document describes the data structures used and the allocation steps.
+
+# Data Structures
+
+The main data structures that Fastalloc uses to track its state are
+described below.
+
+## Current VReg Allocations (`vreg_allocs`)
+
+This is a vector that is used to hold the current allocation for every
+VReg during execution.
+
+## VReg Spillslots (`vreg_spillslots`)
+
+Whenever a VReg needs a spillslot, a dedicated slot is allocated for it.
+This vector is where all VRegs' spillslots are stored.
+
+## Live VRegs (`live_vregs`)
+
+Live VReg information is kept in a `VRegSet`, a doubly linked list
+based on a vector. This is used for quick insertion, removal, and
+iteration.
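The vector-backed list described above can be sketched roughly as follows. This is only an illustration under assumptions: the name `VecLinkedSet`, the sentinel slot, and the use of plain `usize` VReg indices are hypothetical and not taken from the Fastalloc source.

```rust
/// Hypothetical sketch of a vector-backed, doubly linked set of VReg indices.
/// Slots `0..n` are VRegs; slot `n` is a sentinel that closes the list.
pub struct VecLinkedSet {
    prev: Vec<usize>,
    next: Vec<usize>,
    present: Vec<bool>,
}

impl VecLinkedSet {
    pub fn new(num_vregs: usize) -> Self {
        let sentinel = num_vregs;
        Self {
            // The sentinel points at itself, so the list starts out empty.
            prev: vec![sentinel; num_vregs + 1],
            next: vec![sentinel; num_vregs + 1],
            present: vec![false; num_vregs],
        }
    }

    fn sentinel(&self) -> usize {
        self.present.len()
    }

    /// O(1): link `vreg` in right after the sentinel (no-op if present).
    pub fn insert(&mut self, vreg: usize) {
        if self.present[vreg] {
            return;
        }
        let head = self.sentinel();
        let first = self.next[head];
        self.next[head] = vreg;
        self.prev[vreg] = head;
        self.next[vreg] = first;
        self.prev[first] = vreg;
        self.present[vreg] = true;
    }

    /// O(1): unlink `vreg` by joining its neighbours (no-op if absent).
    pub fn remove(&mut self, vreg: usize) {
        if !self.present[vreg] {
            return;
        }
        let (p, n) = (self.prev[vreg], self.next[vreg]);
        self.next[p] = n;
        self.prev[n] = p;
        self.present[vreg] = false;
    }

    /// Visit only the members, most recently inserted first.
    pub fn iter(&self) -> impl Iterator<Item = usize> + '_ {
        let head = self.sentinel();
        let mut cur = self.next[head];
        std::iter::from_fn(move || {
            if cur == head {
                return None;
            }
            let out = cur;
            cur = self.next[cur];
            Some(out)
        })
    }
}
```

Indexing the links by VReg number is what makes insertion and removal constant time, while iteration touches only live VRegs rather than every VReg in the function.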
+
+## Least Recently Used Caches (`lrus`)
+
+Every register class (int, float, and vector) has its own LRU and they
+are stored together in an array: `lrus`. An LRU is represented similarly
+to a `VRegSet`: it's a circular, doubly-linked list based on a vector.
+
+The last PReg in an LRU is the least-recently allocated PReg:
+
+most recently used PReg (head) -> 2nd MRU PReg -> ... -> LRU PReg
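A minimal sketch of such an LRU for a single register class is shown here. The names `Lru`, `poke`, and `least_recent_in` are assumptions made for illustration, and a `u64` bitmask stands in for a `PRegSet`, so this sketch only handles up to 64 registers.

```rust
/// Hypothetical LRU over PReg numbers `0..n` for one register class,
/// stored as a circular, doubly linked list inside two vectors.
pub struct Lru {
    head: usize, // most recently used PReg
    prev: Vec<usize>,
    next: Vec<usize>,
}

impl Lru {
    /// Builds the circle 0 -> 1 -> ... -> n-1 -> 0 with PReg 0 as the head.
    pub fn new(n: usize) -> Self {
        assert!(n > 0 && n <= 64);
        Self {
            head: 0,
            prev: (0..n).map(|i| (i + n - 1) % n).collect(),
            next: (0..n).map(|i| (i + 1) % n).collect(),
        }
    }

    /// Marks `preg` as the most recently used PReg.
    pub fn poke(&mut self, preg: usize) {
        if preg == self.head {
            return;
        }
        // Unlink `preg` from its current position.
        let (p, n) = (self.prev[preg], self.next[preg]);
        self.next[p] = n;
        self.prev[n] = p;
        // Relink it between the tail and the old head, then make it the head.
        let tail = self.prev[self.head];
        self.next[tail] = preg;
        self.prev[preg] = tail;
        self.next[preg] = self.head;
        self.prev[self.head] = preg;
        self.head = preg;
    }

    /// Walks from the tail towards the head and returns the least recently
    /// used PReg whose bit is set in `allowed`, if there is one.
    pub fn least_recent_in(&self, allowed: u64) -> Option<usize> {
        let mut cur = self.prev[self.head];
        loop {
            if allowed & (1u64 << cur) != 0 {
                return Some(cur);
            }
            if cur == self.head {
                return None;
            }
            cur = self.prev[cur];
        }
    }

    /// Lists the PRegs from most to least recently used.
    pub fn order(&self) -> Vec<usize> {
        let mut out = vec![self.head];
        let mut cur = self.next[self.head];
        while cur != self.head {
            out.push(cur);
            cur = self.next[cur];
        }
        out
    }
}
```

Because the list is circular, the tail is always `prev[head]`, so finding the least recently used PReg needs no separate tail pointer.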
+
+## Current VReg In PReg Info (`vreg_in_preg`)
+
+During allocation, it's necessary to determine which VReg is in a PReg
+to generate the right move(s) for eviction.
+`vreg_in_preg` is a vector that stores this information.
+
+## Available PRegs For Use In Instruction (`available_pregs`)
+
+This is a 2-tuple of `PRegSet`s (bitsets of physical registers): one for
+the instruction's early phase and one for the late phase.
+They are used to determine which registers are available for use in the
+early/late phases of an instruction.
+
+Prior to the beginning of any instruction's allocation, this set is reset
+to include all allocatable physical registers, some of which may already
+contain a VReg.
+
+## VReg Liverange Location Info (`vreg_to_live_inst_range`)
+
+This is a vector of 3-tuples containing the beginning and the end
+of each VReg's liverange, along with an allocation the VReg is guaranteed
+to be in throughout that liverange.
+This is used to build the debug locations vector after allocation
+is complete.
+
+# Allocation Process Breakdown
+
+Allocation proceeds in reverse: from the last block to the first block,
+and in each block: from the last instruction to the first instruction.
+
+The allocation for each operand in an instruction can be viewed as
+happening in four phases: selection, assignment, eviction, and edit insertion.
+
+## Allocation Phase: Selection
+
+In this phase, a PReg is selected from `available_pregs` for the
+operand based on the operand constraints. Depending on the operand's
+position, the selected PReg is removed from the early set, the late
+set, or both, indicating that the PReg is no longer available for
+allocation by other operands in that phase.
+
+## Allocation Phase: Assignment
+
+In this phase, the selected PReg is set as the allocation for 
+the operand in the final output.
+
+## Allocation Phase: Eviction
+
+In this phase, the previous VReg in the allocation assigned to 
+an operand is evicted, if any.
+
+During eviction, a dedicated spillslot is allocated for the evicted 
+VReg and an edit is inserted after the instruction to move from the
+slot to the allocation it's expected to be in after the instruction.
+
+## Allocation Phase: Edit Insertion
+
+In this phase, edits are inserted to ensure that the dataflow from
+before the instruction to the selected allocation to after
+the instruction remains correct.
+
+# Invariants
+
+Some invariants that remain true throughout execution:
+
+1. During processing, the allocation of a VReg as indicated in
+`vreg_allocs` changes exactly two or three times.
+Initially it is set to none. When the VReg is allocated, it is
+changed to that allocation. After this, it doesn't change unless
+the VReg is evicted or spilled across a block boundary;
+if it is, then its current allocation changes to its dedicated
+spillslot. After this, it doesn't change again until its definition
+is reached and it's deallocated, at which point its `vreg_allocs`
+entry is set to none. The only exception is block parameters that
+are never used: these are never allocated.
+
+2. A virtual register that outlives the block it was defined in will 
+be in its dedicated spillslot by the end of the block.
+
+3. At the end of a block, before edits are inserted to move values
+from branch arguments to block parameter spillslots, all branch
+arguments will be in their dedicated spillslots.
+
+4. At the beginning of a block, all block parameters and livein
+virtual registers will be in their dedicated spillslots.
+
+# Instruction Allocation
+
+To allocate a single instruction, the first step is to reset the
+`available_pregs` sets to all allocatable PRegs.
+
+Next, the selection phase is carried out for all operands with
+fixed register constraints: the registers they are constrained to use are
+marked as unavailable in the `available_pregs` set, depending on the
+phase that they are valid in. If the operand is an early use or late
+def operand, then the register will be marked as unavailable in the
+early set or late set, respectively. Otherwise, the PReg is marked
+as unavailable in both the early and late sets, because a PReg
+assigned to an early def or late use operand cannot be reused by another
+operand in the same instruction.
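The reservation rule for fixed-register operands could be sketched as below. This is an illustration under assumptions rather than the actual implementation: `OperandSlot`, `AvailablePRegs`, and `reserve_fixed` are hypothetical names, and each set is modelled as a `u64` bitmask instead of a `PRegSet`.

```rust
/// Hypothetical combination of operand kind and position.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum OperandSlot {
    EarlyUse,
    LateUse,
    EarlyDef,
    LateDef,
}

/// Hypothetical stand-in for the early/late `available_pregs` pair.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct AvailablePRegs {
    pub early: u64,
    pub late: u64,
}

impl AvailablePRegs {
    /// Before each instruction, both sets hold every allocatable PReg.
    pub fn reset(allocatable: u64) -> Self {
        Self { early: allocatable, late: allocatable }
    }

    /// Marks the PReg of a fixed-register operand as unavailable.
    pub fn reserve_fixed(&mut self, preg: u32, slot: OperandSlot) {
        let bit = 1u64 << preg;
        match slot {
            // An early use only occupies the register in the early phase.
            OperandSlot::EarlyUse => self.early &= !bit,
            // A late def only occupies the register in the late phase.
            OperandSlot::LateDef => self.late &= !bit,
            // A late use or early def occupies the register in both phases.
            OperandSlot::LateUse | OperandSlot::EarlyDef => {
                self.early &= !bit;
                self.late &= !bit;
            }
        }
    }
}
```

With four allocatable registers, reserving PReg 0 for an early use, PReg 1 for a late def, and PReg 2 for a late use leaves PRegs 1 and 3 in the early set and PRegs 0 and 3 in the late set.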
+
+After selection for fixed register operands, the eviction phase is 
+carried out for fixed register operands. Any VReg in their selected
+registers, indicated by `vreg_in_preg`, is evicted: a dedicated 
+spillslot is allocated for the VReg (if it doesn't have one already),
+an edit is inserted to move from the slot to the PReg, which is where
+the VReg is expected to be after the instruction, and its current
+allocation in `vreg_allocs` is set to the spillslot.
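The eviction step described above might look roughly like this. The sketch assumes simplified types: `Alloc`, `Edit`, `State`, and the `next_slot` counter are hypothetical, and only the field names `vreg_allocs`, `vreg_spillslots`, and `vreg_in_preg` come from this document.

```rust
/// Hypothetical, simplified allocation value.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Alloc {
    None,
    Reg(usize),
    Stack(usize),
}

/// Hypothetical move that runs after instruction `after_inst`.
#[derive(Debug, PartialEq, Eq)]
pub struct Edit {
    pub after_inst: usize,
    pub from: Alloc,
    pub to: Alloc,
}

pub struct State {
    pub vreg_allocs: Vec<Alloc>,             // indexed by VReg
    pub vreg_spillslots: Vec<Option<usize>>, // indexed by VReg
    pub vreg_in_preg: Vec<Option<usize>>,    // indexed by PReg
    pub next_slot: usize,                    // assumed bump allocator for slots
    pub edits: Vec<Edit>,
}

impl State {
    /// Evicts whatever VReg currently occupies `preg` at instruction `inst`.
    pub fn evict(&mut self, preg: usize, inst: usize) {
        let vreg = match self.vreg_in_preg[preg].take() {
            Some(v) => v,
            None => return, // nothing lives in this PReg
        };
        // Allocate the dedicated spillslot only if the VReg has none yet.
        let slot = match self.vreg_spillslots[vreg] {
            Some(s) => s,
            None => {
                let s = self.next_slot;
                self.next_slot += 1;
                self.vreg_spillslots[vreg] = Some(s);
                s
            }
        };
        // Later code expects the VReg in `preg`, so reload it after `inst`.
        self.edits.push(Edit {
            after_inst: inst,
            from: Alloc::Stack(slot),
            to: Alloc::Reg(preg),
        });
        // Walking backwards from here, the VReg now lives in its slot.
        self.vreg_allocs[vreg] = Alloc::Stack(slot);
    }
}
```

Since allocation runs in reverse, the "eviction" is really a reload in program order: the value sits in the spillslot before the instruction and is moved back into the PReg afterwards.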
+
+Next, all clobbers are removed from the early and late `available_pregs` 
+sets to avoid allocating a clobber to a def.
+
+Next, the selection, assignment, eviction, and edit insertion phases are
+carried out for all def operands. When each def operand's allocation is
+complete, the def operand is immediately freed, marking the end of the
+VReg's liverange. It is removed from the `live_vregs` set, its allocation
+in `vreg_allocs` is set to none, and if it was in a PReg, that PReg's
+entry in `vreg_in_preg` is set to none. The selection and eviction phases
+are omitted if the operand has a fixed constraint, as those phases have
+already been carried out.
+
+Next, the selection, assignment, and eviction phases are carried out for all
+use operands. As with def operands, the selection and eviction phases are 
+omitted if the operand has a fixed constraint, as those phases have already
+been carried out.
+
+Then the edit insertion phase is carried out for all use operands.
+
+Lastly, if the instruction being processed is a branch instruction, the
+parallel move resolver is used to insert edits before the instruction
+to move from the branch arguments' spillslots to the block parameter
+spillslots.
+
+## Operand Allocation
+
+During the allocation of an operand, a check is first made to 
+see if the VReg's current allocation as indicated in 
+`vreg_allocs` is within the operand constraints.
+
+If it is, the assignment phase is carried out, setting the final
+allocation output's entry for that operand to the allocation.
+The selection phase is carried out, marking the PReg 
+(if the allocation is a PReg) as unavailable in the respective
+early/late sets. The state of the LRUs is also updated to reflect 
+the new most recently used PReg.
+No eviction needs to be done since the VReg is already in the 
+allocation and no edit insertion needs to be done either.
+
+On the other hand, if the VReg's current allocation is not within
+constraints, the selection and eviction phases are carried out for
+non-fixed operands. First, a set of PRegs that can be drawn from is
+created from `available_pregs`. For early uses and late defs,
+this draw-from set is the early set or late set respectively.
+For late uses and early defs, the draw-from set is an intersection
+of the available early and late sets (because a PReg used for a late
+use can't be reassigned to another operand in the early phase;
+likewise, a PReg used for an early def can't be reassigned to another
+operand in the late phase).
+The LRU for the VReg's regclass is then traversed from the end to find
+the least-recently used PReg in the draw-from set. Once a PReg is found,
+it is marked as the most recently used in the LRU, unavailable in the
+`available_pregs` sets, and whatever VReg was in it before is evicted.
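The construction of the draw-from set can be sketched as a small function. The names `OperandSlot` and `draw_from` are hypothetical, and the early and late sets are again modelled as `u64` bitmasks rather than `PRegSet`s.

```rust
/// Hypothetical combination of operand kind and position.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum OperandSlot {
    EarlyUse,
    LateUse,
    EarlyDef,
    LateDef,
}

/// Returns the set of PRegs a non-fixed operand may be drawn from.
pub fn draw_from(early: u64, late: u64, slot: OperandSlot) -> u64 {
    match slot {
        // Only needs the register during the early phase.
        OperandSlot::EarlyUse => early,
        // Only needs the register during the late phase.
        OperandSlot::LateDef => late,
        // Needs the register in both phases, so it must be free in both.
        OperandSlot::LateUse | OperandSlot::EarlyDef => early & late,
    }
}
```

The resulting mask is what the LRU traversal filters against: the first PReg found from the least recently used end whose bit is set becomes the selection.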
+
+The assignment phase is carried out next: the final allocation for the
+operand is set to the selected register.
+
+If the newly allocated operand has not been allocated before, that is,
+this is the first use/def of the VReg encountered, the VReg is
+inserted into `live_vregs` and marked as the value in the allocated
+PReg in `vreg_in_preg`.
+
+Otherwise, if the VReg has been allocated before, then an edit will need
+to be inserted to ensure that the dataflow remains correct.
+The edit insertion phase is now carried out if the operand is a def
+operand: an edit is inserted after the instruction to move from the
+new allocation to the allocation it's expected to be in after the
+instruction.
+
+The edit insertion phase for use operands is done after all operands
+have been processed. Edits are inserted to move from the current
+allocations in `vreg_allocs` to the final allocated position before
+the instruction. This is to account for the possibility of multiple
+uses of the same operand in the instruction.
+
+## Reuse Operands
+
+Reuse def operands are handled by creating a new operand identical to the
+reuse def, except that its constraints are those of the reused input,
+and allocating that new operand in its place.
+
+Reused inputs are handled by creating a new operand with a fixed register
+constraint to use whatever register was assigned to the reuse def.
+
+Because of the way reuse operands and reused inputs are handled, when
+selecting a register for an early use operand with a fixed constraint,
+the PReg is also marked as unavailable in the `available_pregs` late 
+set if the operand is a reused input. And when selecting a register 
+for reuse def operands, the selected register is marked as unavailable 
+in the `available_pregs` early set.
+
+## VReg Spillslots
+
+Whenever a VReg needs a spillslot, a suitable one is allocated and
+marked as the VReg's dedicated spillslot in `vreg_spillslots`.
+If a VReg never needs a spillslot, none is allocated for it.
+To ensure that a VReg will always be in its spillslot when expected,
+during the processing of a def operand, before it's deallocated,
+an edit is inserted to move from its current allocation as indicated
+in `vreg_allocs` to its dedicated spillslot, if one is present in
+`vreg_spillslots`.
+
+## Branch Instructions
+
+As an invariant, all branch arguments will be in their dedicated
+spillslots at the end of the block before edits are inserted to
+move from those spillslots to the block parameter spillslots
+of the successor blocks.
+
+If a branch argument is already in an allocation that isn't
+its spillslot (this can happen if the branch argument is used
+as an operand in the same instruction, because all normal
+instruction processing is completed before branch-specific
+processing), then an edit is inserted to move from the spillslot
+to that allocation and its current allocation in `vreg_allocs`
+is set to the spillslot.
+
+Only after these edits have been inserted is the parallel move
+resolver used to generate and insert edits to move from
+those spillslots to the spillslots of the block parameters.
+
+# Across Blocks
+
+When a block completes processing, some VRegs will still be live.
+These VRegs are either block parameters or livein VRegs.
+As an invariant, prior to the first instruction in a block, all
+block parameters and livein VRegs will be in their dedicated spillslots.
+
+To maintain this invariant, after a block completes processing, edits
+are inserted at the beginning of the block to move from the block
+parameter and livein spillslots to the allocation they are expected
+to be in from the first instruction.
+All block parameters are freed, just like defs, and liveins' current
+allocations in `vreg_allocs` are set to their spillslots.
+
+# Edits Order
+
+`regalloc2`'s outward interface guarantees that edits are in
+sorted order. Since allocation proceeds in reverse, all edits
+are also added in reverse. After all blocks have completed
+processing, the edits are simply reversed to put them in the
+correct order.
+
+This edit-order constraint is one of the reasons allocation
+proceeds in the order it does: all edits that occur after an
+instruction must be inserted before all edits that occur
+before that instruction.
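The push-in-reverse, then-reverse idea can be illustrated with a toy example. Everything here is an assumption made for the sketch: `Pos`, `Point`, `collect_edits`, and the edit labels are hypothetical, and a real allocator would record moves between allocations rather than strings.

```rust
/// Hypothetical position of an edit relative to its instruction.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub enum Pos {
    Before,
    After,
}

/// Hypothetical program point: (instruction index, position).
pub type Point = (usize, Pos);

/// Walks the instructions backwards, pushing one edit after and one edit
/// before each instruction, then reverses the list once at the end.
pub fn collect_edits(num_insts: usize) -> Vec<(Point, &'static str)> {
    let mut edits = Vec::new();
    for inst in (0..num_insts).rev() {
        // Edits after the instruction must be pushed first...
        edits.push(((inst, Pos::After), "reload evicted vreg"));
        // ...and edits before the instruction second.
        edits.push(((inst, Pos::Before), "move use into place"));
    }
    // A single reversal yields ascending program-point order.
    edits.reverse();
    edits
}
```

If the "before" edits were pushed ahead of the "after" edits for the same instruction, the reversed list would no longer be sorted, which is why the per-instruction processing order matters.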
+
+# Debug Info
+
+After all blocks have completed processing, the debug locations
+vector is built.
+The information it's built from is assembled from liverange info 
+that is tracked throughout the allocation.
+Whenever a VReg is allocated for the first time, its liverange end
+is saved in the VReg's slot in the `vreg_to_live_inst_range`
+vector. Whenever a VReg's definition is encountered, its liverange
+beginning is saved, too. And the allocation it will be in
+throughout that range is also saved alongside.
+
+To determine the allocation the VReg will be in throughout the 
+liverange, the first invariant is used: the first time a VReg
+is allocated, its current allocation in `vreg_allocs` doesn't
+change unless it's evicted or spilled across block boundaries.
+Using this info, if by the time the def of a VReg is allocated,
+that VReg has no dedicated spillslot,
+that implies that the VReg was never evicted or spilled, so whatever
+value its `vreg_allocs` entry says is the location it will be in
+throughout its liverange. Otherwise, if it has a spillslot
+allocated to it, that implies that the VReg was either evicted
+at some point or it was a livein of a predecessor or a block parameter.
+Either way, since all spillslots are dedicated to their respective VRegs,
+it is safe to record the spillslot as the allocation for the
+`vreg_to_live_inst_range` info.
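The choice of recorded location reduces to a single decision, sketched here under assumptions: `Alloc` and `liverange_location` are hypothetical names, and the real implementation stores the result alongside the liverange bounds.

```rust
/// Hypothetical, simplified allocation value.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Alloc {
    Reg(u8),
    Stack(u32),
}

/// Picks the location to record when a VReg's def is reached.
/// `current` is the VReg's `vreg_allocs` entry at that point and
/// `spillslot` is its `vreg_spillslots` entry, if one was ever allocated.
pub fn liverange_location(current: Alloc, spillslot: Option<u32>) -> Alloc {
    match spillslot {
        // The VReg was evicted or spilled across a block boundary at some
        // point; its dedicated slot is treated as holding it for the range.
        Some(slot) => Alloc::Stack(slot),
        // Never evicted or spilled: its allocation never changed.
        None => current,
    }
}
```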