First, let's cite the 'Comment' of the class Cogit:
I am the code generator for the Cog VM. My job is to produce machine code versions of methods for faster execution and to manage inline caches for faster send performance. [...]
I have concrete subclasses that implement different levels of optimization:
SimpleStackBasedCogit is the simplest code generator.
StackToRegisterMappingCogit is the current production code generator. It defers pushing operands
to the stack until necessary and implements a register-based calling convention for low-arity sends.
So basically Cogit is the JIT compiler for the PharoVM. And there are mainly two flavours of it:
- SimpleStack
- StackToRegister
The production one, as the cited comment says, is the StackToRegister. It is the more optimizing code generator, mainly because it tries to minimize stack usage (and hence memory accesses) by keeping everything it can in the processor's registers.
In the SimpleStack, each bytecode is compiled independently of the others, and they communicate only through the stack. When one bytecode needs to pass values to another, it pushes them onto the stack, and the 'called' bytecode knows it must pop them. So you can imagine that a method with a significant number of bytecodes, when jitcompiled by the SimpleStack, will contain quite a lot of pushes and pops.
Let's explore this through a concrete example:
Suppose we have a Pharo method that looks like this:
foo: x
^ x bar: self
The generated bytecode for foo: would be something like this:
<64> pushTemp: 0
<76> self
<144> send: bar:
<92> returnTop
If you are already familiar with the PharoVM bytecodes, feel free to skip this section. Here I'll explain what each of those bytecodes means:
pushTemp: 0
: this bytecode pushes 'temporary variable' number 0 onto the stack. That variable is the first (and only) argument of the method. In our case it is x.
self
: it pushes the receiver of the message foo:, that is, self. This bytecode can be read as pushSelf.
send: bar:
: this is a message send with one argument. Here we are sending the message bar: to the receiver, which was pushed first: x.
Remember the order in which things get pushed:
First, the receiver is pushed.
Then, the arguments of the message, one by one.
And then it's the message send.
returnTop
: it returns whatever is at the top of the stack. We expect that the method activated by the message send has pushed its result there.
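To make the push order concrete, here is a small Python sketch of a stack machine that runs those four bytecodes. It is only an illustration, not PharoVM code: the tuple encoding of the bytecodes, the run function and the fake_send helper are all made up for this example.

```python
# Toy interpreter for the four bytecodes of foo: (illustrative only, not PharoVM code).

def run(bytecodes, receiver, temps, send):
    """Execute bytecodes on an explicit operand stack and return the result."""
    stack = []
    for op, arg in bytecodes:
        if op == "pushTemp":
            stack.append(temps[arg])        # push temporary/argument number `arg`
        elif op == "self":
            stack.append(receiver)          # push the receiver of the current method
        elif op == "send":
            selector, num_args = arg
            # Arguments were pushed last, so they are popped first.
            args = [stack.pop() for _ in range(num_args)][::-1]
            rcvr = stack.pop()              # the receiver was pushed before the arguments
            stack.append(send(rcvr, selector, args))  # the result takes their place
        elif op == "returnTop":
            return stack.pop()              # answer whatever is on top of the stack
        else:
            raise ValueError(f"unknown bytecode {op!r}")
    raise RuntimeError("method ended without a return")


# Hypothetical encoding of the bytecodes shown above.
FOO = [("pushTemp", 0), ("self", None), ("send", ("bar:", 1)), ("returnTop", None)]


def fake_send(rcvr, selector, args):
    # Stand-in for a real message send: it only records what it was given.
    return (rcvr, selector, tuple(args))


result = run(FOO, receiver="aReceiver", temps=["x"], send=fake_send)
print(result)  # ('x', 'bar:', ('aReceiver',))
```

The printed tuple shows that x ends up as the receiver of bar: and self as its argument, which is exactly the order in which they were pushed.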
If we give these bytecodes to the SimpleStackBasedCogit (1), it would generate code that looks like this:
push Temp:0
push self
r0 := pop
r1 := pop
r0 := BAR r0 r1
push r0
r1 := pop
retTop
This is not valid Cogit code (not even IR); it's just pseudo-code.
Suppose that BAR is some (IR) instruction that behaves exactly like the bar: method.
You can think of it as the jitted version of the bar: method.
What's important is that it takes two parameters and returns a result.
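The pseudo-code above can be reproduced with a very naive generator that translates each bytecode in isolation, using a fixed template per bytecode. The following Python sketch is just that, a sketch: the TEMPLATES table and the compile_simple function are hypothetical names, and the real SimpleStackBasedCogit is of course far more involved.

```python
# Illustrative sketch only: hypothetical templates, not real Cogit code or IR.

# One fixed template per bytecode. No template knows anything about its
# neighbours; the only thing they share is the run-time stack.
TEMPLATES = {
    "pushTemp":  lambda arg: [f"push Temp:{arg}"],
    "self":      lambda arg: ["push self"],
    "send":      lambda arg: ["r0 := pop", "r1 := pop", f"r0 := {arg} r0 r1", "push r0"],
    "returnTop": lambda arg: ["r1 := pop", "retTop"],
}


def compile_simple(bytecodes):
    """Concatenate the template of every bytecode, one after the other."""
    code = []
    for op, arg in bytecodes:
        code.extend(TEMPLATES[op](arg))
    return code


FOO = [("pushTemp", 0), ("self", None), ("send", "BAR"), ("returnTop", None)]
code = compile_simple(FOO)

# Count how many generated instructions are pushes or pops.
stack_ops = sum(1 for line in code if line.startswith("push") or line.endswith(":= pop"))

print("\n".join(code))
print(stack_ops, "of", len(code), "instructions touch the stack")
```

Because every template is independent, the generator is trivial to write, but six of the eight generated instructions do nothing except move values to and from the stack.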
Remember that here we are in machine code land, so we only have the IR registers and instructions to work with. So of course, if we want to execute an instruction, we need to move its arguments into registers.
As you already know, all the bytecodes operate on the stack: arguments are pushed onto it and popped from it. So it makes sense that, to convert those bytecodes to machine code, we need to move some values into registers.
[This part may be a little confusing: even in the SimpleStackBasedCogit there will be operations on registers, because to operate on a processor we have to use its registers!
For example, the hypothetical BAR instruction takes two registers; we can't give it two memory addresses to operate on.
For a real example, see the add instruction in the x86 manual (https://www.felixcloutier.com/x86/add). One of its operands can be a memory location, but not both; the other must be a register (or an immediate value).
]
As you can see, there are quite a lot of pushes and pops. Even worse, if you look closely you can see that they are redundant: we push a value just to pop it into a register one line below.
That is exactly what the StackToRegisterMapping notices, and it then tries to minimize those stack operations.
Again, let's refer to the 'Comment' of the class StackToRegisterMappingCogit:
StackToRegisterMappingCogit is an optimizing code generator that eliminates a lot of stack operations and inlines some special selector arithmetic. It does so by a simple stack-to-register mapping scheme based on deferring the generation of code to produce operands until operand-consuming operations. The operations that consume operands are sends, stores and returns.
Essentially, the StackToRegister keeps a 'simulated stack' on which it tries to perform all the stack operations. Then, instead of generating the push and pop instructions, it resolves those pushes and pops at compile time using that 'simulated stack'.
You can think of it as trying to move all the stack operations from run time to compile time.
The SimpleStack generates instructions for all those pushes and pops, so it is the real processor that has to execute them. In the StackToRegister, it is the compiler itself that executes them.
The StackToRegister uses its 'simulated stack' to see which pushes (and their respective pops) it can avoid.
Here is a high-level, oversimplified explanation of how it does this:
It receives code like this:
push Temp:0
push self
r0 := pop
r1 := pop
r0 := BAR r0 r1
push r0
r1 := pop
retTop
It performs the pushes on its 'simulated stack'. Then it sees that the code pops those values into registers, so instead of emitting the pushes and pops, the StackToRegister just generates code like this:
...
r0 := Temp:0
r1 := self
...
Then it sees the BAR, for which it generates code just like before.
Finally, it sees another push followed by a pop into a register, and it optimizes that pair again. The final generated code would look something like this:
r0 := Temp:0
r1 := self
r0 := BAR r0 r1
r1 := r0
retTop
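The 'simulated stack' idea can be sketched in a few lines of Python. This is a deliberately oversimplified illustration and not how the real StackToRegisterMappingCogit is written: the function name, the string descriptors on the simulated stack and the fixed use of r0 and r1 are assumptions made for this example, and it only handles sends with exactly one argument.

```python
# Illustrative sketch only: a compile-time stack of operand descriptions.

def compile_with_simulated_stack(bytecodes):
    """Generate pseudo-code while resolving pushes and pops at compile time."""
    sim_stack = []  # exists only inside the compiler: *where* each operand lives
    code = []       # the instructions the processor will actually execute
    for op, arg in bytecodes:
        if op == "pushTemp":
            sim_stack.append(f"Temp:{arg}")   # remember the operand, emit nothing
        elif op == "self":
            sim_stack.append("self")          # remember the operand, emit nothing
        elif op == "send":
            # A send consumes operands, so now the deferred code must be generated.
            argument = sim_stack.pop()        # pushed last, popped first
            receiver = sim_stack.pop()
            code.append(f"r0 := {receiver}")
            code.append(f"r1 := {argument}")
            code.append(f"r0 := {arg} r0 r1")
            sim_stack.append("r0")            # the result lives in r0, not on the stack
        elif op == "returnTop":
            top = sim_stack.pop()             # compile-time pop: no instruction needed
            code.append(f"r1 := {top}")
            code.append("retTop")
        else:
            raise ValueError(f"unknown bytecode {op!r}")
    return code


FOO = [("pushTemp", 0), ("self", None), ("send", "BAR"), ("returnTop", None)]
optimized = compile_with_simulated_stack(FOO)
print("\n".join(optimized))
```

Every append and pop on sim_stack happens while compiling, so none of them shows up in the generated code: the output has five instructions and no push or pop at all.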
push c1
push c2
r0 := pop
r1 := pop
r0 := DIV r0 r1
push r0
r0 := 18
r1 := pop
retTop
(1) Of course, we do not give the bytecodes to the Cogit directly. The bytecodes live inside what is called a CompiledMethod. This object holds not only the bytecodes but also the literals and general information about the method. The Cogit works with CompiledMethods. When it jitcompiles a CompiledMethod, what it essentially does is jitcompile each of its bytecodes.
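As a rough mental model of that footnote, the Python sketch below bundles bytecodes, literals and a bit of method information into one object and compiles it bytecode by bytecode. Everything here is hypothetical (ToyCompiledMethod, jit_compile, describe and the field names); the layout of a real CompiledMethod is different.

```python
from dataclasses import dataclass

# Illustrative sketch only: a made-up stand-in for a CompiledMethod.


@dataclass
class ToyCompiledMethod:
    selector: str     # general information about the method
    num_args: int
    literals: list    # e.g. the selectors used by send bytecodes
    bytecodes: list   # (name, operand) pairs


def jit_compile(method, compile_bytecode):
    """Compile a method by compiling each of its bytecodes in order."""
    code = []
    for op, operand in method.bytecodes:
        code.extend(compile_bytecode(method, op, operand))
    return code


def describe(method, op, operand):
    # Hypothetical per-bytecode compiler that only produces a readable listing.
    if op == "send":
        return [f"send {method.literals[operand]}"]  # operand indexes the literals
    if op == "pushTemp":
        return [f"push Temp:{operand}"]
    return [op]


foo = ToyCompiledMethod(
    selector="foo:",
    num_args=1,
    literals=["bar:"],
    bytecodes=[("pushTemp", 0), ("self", None), ("send", 0), ("returnTop", None)],
)
listing = jit_compile(foo, describe)
print(listing)  # ['push Temp:0', 'self', 'send bar:', 'returnTop']
```

The point of the sketch is only that the compiler needs the whole object: the send bytecode alone does not say which message to send, the selector comes from the literals.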