-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fine tune method inlining of static calls in AOT #32167
Comments
Adding feedback from Alex here so we remember. "The problem is that if we have compiled a function already, we have recorded number of instructions after optimizations. If we haven't compiled a function being inlined, then inliner builds graph and does a very limited set of optimizations before calculating number of instructions. This makes inliner heuristics weaker if we haven't compiled function yet. Maybe we can try compiling a function before inlining it to calculate inline weights for always-inline heuristic. |
Rationale: Without proper execution counters, the inline AOT inliner marks every call site "cold", effectively disabling inlining altogether. This change introduces loop-based static heuristic that assumes statements nested inside loops are executed more frequently. This results in more inlining. Note: Conservative version is used for now which yields more performance without increasing code size too much. There is still a lot of performance left at the table which we could exploit if we fine tune heuristics regarding code size. Bug: #34473 #32167 Change-Id: I86ba60f93bdab363cd22ab6bdbcf6688f2042fea Reviewed-on: https://dart-review.googlesource.com/c/81187 Commit-Queue: Aart Bik <[email protected]> Reviewed-by: Alexander Markov <[email protected]>
This reverts commit daae20d. Reason for revert: The kernel-precomp bots are seeing this error: ../../runtime/vm/object.h: 3134: error: Handle check failed: saw 2249186640 expected Function Not sure what that is yet, but reverting to get bots green again while I investigate. Original change's description: > [vm/compiler] Use loop framework for AOT inline heuristics > > Rationale: > Without proper execution counters, the inline AOT inliner > marks every call site "cold", effectively disabling inlining > altogether. This change introduces loop-based static heuristic > that assumes statements nested inside loops are executed more > frequently. This results in more inlining. > > Note: > Conservative version is used for now which yields > more performance without increasing code size too much. > There is still a lot of performance left at the table > which we could exploit if we fine tune heuristics > regarding code size. > > Bug: > #34473 > #32167 > > > Change-Id: I86ba60f93bdab363cd22ab6bdbcf6688f2042fea > Reviewed-on: https://dart-review.googlesource.com/c/81187 > Commit-Queue: Aart Bik <[email protected]> > Reviewed-by: Alexander Markov <[email protected]> [email protected],[email protected],[email protected] Change-Id: If5ca82966966ebef4ec0b4e921515d23f6bd492b No-Presubmit: true No-Tree-Checks: true No-Try: true Reviewed-on: https://dart-review.googlesource.com/c/81335 Reviewed-by: Aart Bik <[email protected]> Commit-Queue: Aart Bik <[email protected]>
The new heuristic exposed an issue with more inlining.
This needs to be addressed before we can tune the heuristic properly. |
I did a bisection search on this failure, and found the offending inlining:
Which is interesting, since it is not enabled by the new heuristic (ratio is 0.0). So another inlining must somehow interact badly with this one. Looking further. |
With a bisection search on just the inlining based on (the new) nonzero ratio, we arrive at the same culprit, but the cause-and-effect is more clear. We set the ratio of the following call to 1.0:
which in turn allows the inlining of this call site in into
This in turn allows for inlining of the
call I already found above.
|
In this method, we have three call sites that are inlined in a very similar way. for (int i = 0; i < result.length; i++) {
assert(i % 2 == 0);
switch (result[i++]) {
case listFile:
controller.add(new File.fromRawPath(result[i]));
break;
case listDirectory:
controller.add(new Directory.fromRawPath(result[i]));
break;
case listLink:
controller.add(new Link.fromRawPath(result[i]));
break; For example, the File call site
becomes
But in the offending inlining of Link, the call site
becomes (note the _Link@17069316 instead of _File@17069316):
The only difference I noticed is that File and Directory have a @pragma("vm:entry-point") on the fromRawPath method, and Link does not. Adding this pragma to Link actually makes the test pass. @pragma("vm:entry-point")
factory Link.fromRawPath(Uint8List rawPath) {
// TODO(bkonyi): handle overrides
return new _Link.fromRawPath(rawPath);
} @mraleph does that make sense? |
@aartbik I think it makes sense if the test is testing directory listing or something like that. If we inline all invocations of @sjindel-google actually discovered a lot of missing entry-point annotations in his obfuscation fixing CL (see https://dart-review.googlesource.com/c/sdk/+/81009/17/sdk/lib/io/link.dart ). And he also looked into strengthening checks around entry-points. |
Cool. I verified that all failures with the bots yesterday go away with this one line change. So I will send out a separate CL for that, since it makes File/Directory/Link equivalent again, and unblocks my inlining heuristic CL. I assume @sjindel-google will follow up with other cases. |
Rationale: This makes File/Directory/Link's fromRawPath equivalent again, and unblocks the pending loop-based inlining heuristic. Reverted CL because of this bug: https://dart-review.googlesource.com/c/sdk/+/81335 #34473 #32167 Change-Id: I769a8365ea32d6f6830d0dd8ef97fc6c2467fb90 Reviewed-on: https://dart-review.googlesource.com/c/81460 Reviewed-by: Vyacheslav Egorov <[email protected]> Commit-Queue: Aart Bik <[email protected]>
This reverts commit 0170b8d. Reason for revert: The underlying problem has been fixed. Original change's description: > Revert "[vm/compiler] Use loop framework for AOT inline heuristics" > > This reverts commit daae20d. > > Reason for revert: > > The kernel-precomp bots are seeing this error: > > ../../runtime/vm/object.h: 3134: error: Handle check failed: saw 2249186640 expected Function > > Not sure what that is yet, but reverting to get bots green again while I investigate. > > > Original change's description: > > [vm/compiler] Use loop framework for AOT inline heuristics > > > > Rationale: > > Without proper execution counters, the inline AOT inliner > > marks every call site "cold", effectively disabling inlining > > altogether. This change introduces loop-based static heuristic > > that assumes statements nested inside loops are executed more > > frequently. This results in more inlining. > > > > Note: > > Conservative version is used for now which yields > > more performance without increasing code size too much. > > There is still a lot of performance left at the table > > which we could exploit if we fine tune heuristics > > regarding code size. > > > > Bug: > > #34473 > > #32167 > > > > > > Change-Id: I86ba60f93bdab363cd22ab6bdbcf6688f2042fea > > Reviewed-on: https://dart-review.googlesource.com/c/81187 > > Commit-Queue: Aart Bik <[email protected]> > > Reviewed-by: Alexander Markov <[email protected]> > > [email protected],[email protected],[email protected] > > Change-Id: If5ca82966966ebef4ec0b4e921515d23f6bd492b > No-Presubmit: true > No-Tree-Checks: true > No-Try: true > Reviewed-on: https://dart-review.googlesource.com/c/81335 > Reviewed-by: Aart Bik <[email protected]> > Commit-Queue: Aart Bik <[email protected]> [email protected],[email protected],[email protected] Change-Id: I8c160f59f8e91782f235cbcea607aef4227963f4 No-Presubmit: true No-Tree-Checks: true No-Try: true Reviewed-on: https://dart-review.googlesource.com/c/81500 Reviewed-by: Aart Bik <[email protected]> Commit-Queue: Aart Bik <[email protected]>
Rationale: Last experiment left some performance at the table (but for an unacceptable 200% code size increase). This CL regains that performance for a much better trade-off of just 8%. Performance gains vary from 10% to 100%, with e.g. 27% for DeltaBlue, 30% for MD5 and 100% for SHA256). This is obtained by always inlining byteswaps (to avoid calling into the runtime when getInts remain) and a bit more specific tuning of loop based heuristics. This also fixes a bug in a list lookup. #34473 #34969 #32167 Change-Id: Id1a64c3930c503546ae2d7f31ca3e597741022bb Reviewed-on: https://dart-review.googlesource.com/c/82942 Reviewed-by: Ryan Macnak <[email protected]> Commit-Queue: Aart Bik <[email protected]>
This reverts commit 74d2c6a. Reason for revert: ../../runtime/vm/compiler/backend/il.cc: 778: error: unreachable code Original change's description: > [vm/compiler] Tune AOT inline heuristics > > Rationale: > Last experiment left some performance at the table > (but for an unacceptable 200% code size increase). > This CL regains that performance for a much better > trade-off of just 8%. Performance gains vary from > 10% to 100%, with e.g. 27% for DeltaBlue, 30% for > MD5 and 100% for SHA256). This is obtained by > always inlining byteswaps (to avoid calling into > the runtime when getInts remain) and a bit more > specific tuning of loop based heuristics. This > also fixes a bug in a list lookup. > > #34473 > #34969 > #32167 > > Change-Id: Id1a64c3930c503546ae2d7f31ca3e597741022bb > Reviewed-on: https://dart-review.googlesource.com/c/82942 > Reviewed-by: Ryan Macnak <[email protected]> > Commit-Queue: Aart Bik <[email protected]> [email protected],[email protected],[email protected],[email protected],[email protected] Change-Id: Id0f30f4458a3548af2e573a737e441ca11ae3b11 No-Presubmit: true No-Tree-Checks: true No-Try: true Reviewed-on: https://dart-review.googlesource.com/c/83643 Reviewed-by: Aart Bik <[email protected]> Commit-Queue: Aart Bik <[email protected]>
Rationale: Although not semantically disastrous, list inclusion was not properly implemented, which yielded some very hard to debug recompilations (or lack thereof) due to speculation. #34473 #34969 #32167 Change-Id: I54613127bd3ab93820d4a467c75e41f3df16ee7a Reviewed-on: https://dart-review.googlesource.com/c/83883 Reviewed-by: Siva Annamalai <[email protected]> Commit-Queue: Aart Bik <[email protected]>
Rationale: Always inlining byteswaps avoids calling into the runtime when getInts remain. #34473 #34969 #32167 Change-Id: I95c0348fa960e08f973a12a2c5304785e76289b3 Reviewed-on: https://dart-review.googlesource.com/c/83884 Reviewed-by: Alexander Markov <[email protected]> Commit-Queue: Aart Bik <[email protected]>
Rationale: Yields substantial improvements on various benchmarks (1.8x on HMAC stand-alone, around 5x on TypedData settters and getters), with only moderate increase in code size (3.2% on Flutter gallery). #34473 #32167 Change-Id: I0909efd7afc72229524ff8edb7322ce025a14af4 Reviewed-on: https://dart-review.googlesource.com/c/89162 Reviewed-by: Vyacheslav Egorov <[email protected]> Reviewed-by: Alexander Markov <[email protected]> Commit-Queue: Aart Bik <[email protected]>
Closing this issue, since we went from never inlining static calls to using a reasonably effective heuristics for inlining static calls under AOT (although fine-tuning these is expected to continue of course). |
This reverts commit 2908e61. Reason for revert: regress_32322_test crashes Original change's description: > [vm/compiler] AOT inline heuristics improvements > > Rationale: > Yields substantial improvements on various benchmarks > (1.8x on HMAC stand-alone, around 5x on TypedData settters and getters), > with only moderate increase in code size (3.2% on Flutter gallery). > > #34473 > #32167 > > Change-Id: I0909efd7afc72229524ff8edb7322ce025a14af4 > Reviewed-on: https://dart-review.googlesource.com/c/89162 > Reviewed-by: Vyacheslav Egorov <[email protected]> > Reviewed-by: Alexander Markov <[email protected]> > Commit-Queue: Aart Bik <[email protected]> [email protected],[email protected],[email protected] Change-Id: I9c7dadb18935ad32f4d4cd72872838e8ac9cc288 No-Presubmit: true No-Tree-Checks: true No-Try: true Reviewed-on: https://dart-review.googlesource.com/c/89740 Reviewed-by: Aart Bik <[email protected]> Commit-Queue: Aart Bik <[email protected]>
Rationale: Previously, we avoided introducing redefinitions that introduced the empty non-nullable null type. This situation arises when we do a null check on an actual null value (making all subsequent uses effectively dead code). This is too simple, however, since it still allows hoisting the uses before the check. This CL gives a better solution by introducing redefinitions without a constraining type (which are not removed and avoid the type). In the long run perhaps the best solution would be to simply remove all subsequent uses as dead. #32167 #34473 #35335 Change-Id: Ib5dd072a9e546f6b91faa52ea08e8c0f6350d7e0 Reviewed-on: https://dart-review.googlesource.com/c/89922 Reviewed-by: Alexander Markov <[email protected]> Commit-Queue: Aart Bik <[email protected]>
This reverts commit 43a96d4. Reason for revert: <INSERT REASONING HERE> Original change's description: > Revert "[vm/compiler] AOT inline heuristics improvements" > > This reverts commit 2908e61. > > Reason for revert: regress_32322_test crashes > > Original change's description: > > [vm/compiler] AOT inline heuristics improvements > > > > Rationale: > > Yields substantial improvements on various benchmarks > > (1.8x on HMAC stand-alone, around 5x on TypedData settters and getters), > > with only moderate increase in code size (3.2% on Flutter gallery). > > > > #34473 > > #32167 > > > > Change-Id: I0909efd7afc72229524ff8edb7322ce025a14af4 > > Reviewed-on: https://dart-review.googlesource.com/c/89162 > > Reviewed-by: Vyacheslav Egorov <[email protected]> > > Reviewed-by: Alexander Markov <[email protected]> > > Commit-Queue: Aart Bik <[email protected]> > > [email protected],[email protected],[email protected] > > Change-Id: I9c7dadb18935ad32f4d4cd72872838e8ac9cc288 > No-Presubmit: true > No-Tree-Checks: true > No-Try: true > Reviewed-on: https://dart-review.googlesource.com/c/89740 > Reviewed-by: Aart Bik <[email protected]> > Commit-Queue: Aart Bik <[email protected]> [email protected],[email protected],[email protected] # Not skipping CQ checks because original CL landed > 1 day ago. Change-Id: Iace9857654b63af2fbcd2808d19802fb60305973 Reviewed-on: https://dart-review.googlesource.com/c/90141 Reviewed-by: Aart Bik <[email protected]> Commit-Queue: Aart Bik <[email protected]>
A lot of instance calls are converted to static calls by CHA-based devirtualization / global type flow analysis. However, they are rarely inlined.
For example, consider the following code (extracted from velocity_tracker Flutter microbenchmark):
In this code, _Vector.[] is not inlined into _Vector.*.
--trace-inlining shows the following:
The heuristic which bails out is based on StaticCallInfo.ratio which is calculated using call counters (which don't make much sense in AOT).
Also, this is very different from instance calls: InstanceCallInfo.ratio is calculated but it is not taken into account, and there is no similar heuristic for instance calls.
To summarize, the inlining heuristics should be fine tuned for static calls to allow inlining of small methods.
/cc @mraleph
The text was updated successfully, but these errors were encountered: