Compiler Support for Debug On Restore #18866
Comments
fyi @tajila @vijaysun-omr
"The IProfiler needs to be enabled during startup even if an SCC exists..." ... I assume you mean before checkpoint? I.e., I assume we can use the SCC after restore and therefore won't want to pay the cost of IProfiler in the post-restore period until first response (or whatever stage of the startup process IProfiling is off for) is done.
Yeah, that's right; that's why part of the heuristic work is to "Turn off IProfiler post-restore so it can be naturally enabled later".
I think we may be able to avoid having to worry about this. We can still do all the proactive compilation, but we don't have to update the
The previous comment has the caveat that the cost of delaying the updating of the extra field should not impact startup/first response; if doing so is too expensive then we will have to default to the original design.
After speaking with @mpirvu and @vijaysun-omr, there is another option that really only became apparent after all of the work we did to get the compiler support to the point it is at now. The main idea is to run interpreted pre-checkpoint, but use all the heuristics described in this issue to ensure that the post-restore environment (JIT'd code, IProfiler, SCC validation, etc.) is all the same. Furthermore, depending on the evaluation of #18866 (comment), it may even be possible to delay updating the
The benefits of doing so are
The drawbacks are
Because of all the infrastructure that I already have in place in my branches, I don't think it should be all that much work to evaluate this idea, so I'll update once I have more information.
fyi @gacholio
Some observations from my first set of experiments, with the latest changes I have based on open PRs:
The current build involves generating FSD code pre-checkpoint, doing a set of (non-FSD) proactive compilations in the checkpoint hook, and then triggering recompilation of FSD methods post-restore. This seems to have minimal impact on startup and first response (~1-2%) but causes a very large increase in footprint. Part of this is due to the increase in code cache and data cache, but there is also an increase in the SCC RSS. I need to do a bit more investigation on this, but it's very likely due to the recompilations post-restore. Essentially, the process of triggering a recompilation will result in either the requesting thread or the comp thread touching the J9ROMClass at some point, which will bring the page back into the RSS. The reason I think this is likely is that when I moved the recompilations to occur pre-checkpoint, the RSS of the SCC went down. However, I haven't validated whether this is consistent, so I don't want to confirm this just yet. Additionally, the code/data cache and internal memory usage is higher because now we have both the FSD bodies and the recompiled non-FSD bodies.
I also tried running interpreted pre-checkpoint, but performing (non-FSD) compilations in the background, and then updating the relevant
Finally, it became apparent that delaying updating the
Given the following two observations:
Generating FSD code pre-checkpoint only to then also generate the recompiled non-FSD version pre-checkpoint seems wasteful; it increases the code/data cache and JIT persistent memory usage. However, the benefit is that the pre-checkpoint run-time of the application is minimally impacted.
Only running interpreted pre-checkpoint avoids the complication of having both FSD and non-FSD code, and makes it possible for footprint to match baseline. However, it has the issue that the pre-checkpoint run-time is significantly impacted.
I should also note that I have yet to evaluate the throughput implications of all of this.
Thanks for the summary. You bring up a few factors that, as a matter of opinion, I don't feel are very significant, i.e. we could choose to lower their importance to get to the final design point.
I guess I am arguing in favor of having FSD and non-FSD bodies before checkpoint, rather than running interpreted until then.
All FSD bodies can be reclaimed as we will be decompiling any that are on stack at restore. You could even do it aggressively on restore after the decompilations have been added to the appropriate stack frames.
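To make the footprint trade-off in the comments above more concrete, here is a minimal sketch of the bookkeeping involved: a method may hold an FSD body, a non-FSD body, or both, and reclaiming the FSD bodies at restore releases their share of the code cache. Everything here (`MethodModel`, `codeCacheBytes`, `reclaimFsdBodies`, and the byte sizes used with them) is hypothetical and purely illustrative; it is not OpenJ9 code and does not reflect how the code cache is actually managed.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical model of one compiled method; not an OpenJ9 data structure.
struct MethodModel {
    bool hasFsdBody;          // FSD body generated pre-checkpoint
    bool hasNonFsdBody;       // non-FSD body from proactive compilation or recompilation
    std::size_t fsdBytes;     // assumed size of the FSD body
    std::size_t nonFsdBytes;  // assumed size of the non-FSD body
};

// Total bytes held by all bodies currently present.
std::size_t codeCacheBytes(const std::vector<MethodModel>& methods) {
    std::size_t total = 0;
    for (const MethodModel& m : methods) {
        if (m.hasFsdBody) total += m.fsdBytes;
        if (m.hasNonFsdBody) total += m.nonFsdBytes;
    }
    return total;
}

// Models reclaiming every FSD body at restore (after any on-stack frames
// have had decompilations added). Returns the number of bytes released.
std::size_t reclaimFsdBodies(std::vector<MethodModel>& methods) {
    std::size_t released = 0;
    for (MethodModel& m : methods) {
        if (m.hasFsdBody) {
            released += m.fsdBytes;
            m.hasFsdBody = false;
        }
    }
    return released;
}
```

In this toy model, a method that has both bodies contributes both sizes until `reclaimFsdBodies` runs, which is the "both FSD bodies and the recompiled non-FSD bodies" cost described above; the interpreted-only option corresponds to no method ever having `hasFsdBody` set.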
This issue specifically tracks the Compiler support for Debug on Restore.
Parent issue: #17642
Approach
See #17642 for the discussion.
Implementation
In order to balance the constraints of startup/first response, footprint, and throughput, there are several additional pieces that need to be put in place.
VM Coordination
Given the above compiler implementation details, there are three main pieces of VM coordination that have to be implemented in order to ensure functional correctness:
jitConfig->jitClassesRedefined. This is because the FSD bodies do not have OSR guards (i.e. Voluntary OSR); they depend on the VM triggering the transition (i.e. Involuntary OSR). GAC is aware of this consideration in CRIU: Add support for dynamic debug interpreter transition on restore #17642.
PRs
-XX:+DebugOnRestore #19342
-XX:+DebugOnRestore #20000
-XX:+DebugOnRestore #20047
VM
Regarding Point 3 in the VM Coordination section, the current plan is a two-pronged approach: 1. the VM will reset the necessary bits to allow J9HookDisable to function correctly post-restore, and 2. the JIT will not invoke the J9HookDisable calls (at least for FSD) until post-restore under -XX:+DebugOnRestore.
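The second prong of the plan above (the JIT holding back its J9HookDisable calls until post-restore) can be sketched as a simple deferral queue. This is only an illustration under stated assumptions: `DeferredHookDisabler`, `requestDisable`, `onRestore`, and `pendingCount` are hypothetical names, the hook-disable action is passed in as an opaque callback rather than calling any real VM API, and the actual implementation in the PRs may look quite different.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical helper; not part of OpenJ9. Each disable request is an
// opaque callback standing in for whatever the real hook-disable call is.
class DeferredHookDisabler {
public:
    // deferUntilRestore models running with -XX:+DebugOnRestore.
    explicit DeferredHookDisabler(bool deferUntilRestore)
        : _defer(deferUntilRestore), _restored(false) {}

    // Pre-restore with deferral enabled: remember the request.
    // Otherwise: perform it immediately.
    void requestDisable(std::function<void()> disableCall) {
        if (_defer && !_restored) {
            _pending.push_back(std::move(disableCall));
        } else {
            disableCall();
        }
    }

    // Called once post-restore: flush everything that was held back.
    void onRestore() {
        _restored = true;
        for (std::function<void()>& call : _pending) {
            call();
        }
        _pending.clear();
    }

    std::size_t pendingCount() const { return _pending.size(); }

private:
    bool _defer;
    bool _restored;
    std::vector<std::function<void()>> _pending;
};
```

With deferral enabled, a request made before `onRestore()` is queued and only runs when restore is signalled; requests made afterwards run immediately. Without deferral, the helper is a pass-through, which corresponds to the behavior when the option is not set.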