Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance issue with min and max methods in dart:math #34090

Closed
julemand101 opened this issue Aug 7, 2018 · 12 comments
Closed

Performance issue with min and max methods in dart:math #34090

julemand101 opened this issue Aug 7, 2018 · 12 comments
Assignees
Labels
area-vm Use area-vm for VM related issues, including code coverage, and the AOT and JIT backends.

Comments

@julemand101
Copy link
Contributor

Dart SDK Version (dart --version)
Problematic version:
Dart VM version: 2.0.0 (Fri Aug 3 10:53:23 2018 +0200) on "windows_x64"

Compared against:
Dart VM version: 1.24.3 (Wed Dec 13 18:05:29 2017) on "windows_x64"

Whether you are using Windows, MacOSX, or Linux (if applicable)
Windows 10 64 bit.

Whether you are using Chrome, Safari, Firefox, Edge (if applicable)
Running as standalone with dart.exe


When running the following code in Dart 2.0.0 and Dart 1.24.3:

import 'dart:math' as math;
main() {
  math.Random rand = new math.Random(1);
  List<int> list = new List.generate(100000000, (_) => rand.nextInt(100000000));
  var t1 = new DateTime.now();
  print(list.reduce(math.max));
  var t2 = new DateTime.now();
  print("Time: ${t2.difference(t1)}");
}

I got the following timings:

Dart 1.24.3 Dart 2.0.0
0:00:00.550652 0:00:24.152452

With the observatory I can see the CPU usage in Dart 2.0 are used on something called DRT_SubtypeCheck:

image

If I modify the code to not use the math.max method:

import 'dart:math' as math;
main() {
  math.Random rand = new math.Random(1);
  List<int> list = new List.generate(100000000, (_) => rand.nextInt(100000000));
  var t1 = new DateTime.now();
  print(list.reduce((a, b) => a > b ? a : b));
  var t2 = new DateTime.now();
  print("Time: ${t2.difference(t1)}");
}

Then I gets the following timings:

Dart 1.24.3 Dart 2.0.0
0:00:00.530667 0:00:00.536129

I should note that the same issue can also be reproduced for math.min.

@mraleph mraleph added the area-vm Use area-vm for VM related issues, including code coverage, and the AOT and JIT backends. label Aug 8, 2018
@mraleph
Copy link
Member

mraleph commented Aug 8, 2018

We are working on this. We need to introduce checked and unchecked entries into closures for this to become faster.

@sjindel-google
Copy link
Contributor

The regression has mostly been addressed. There is still an %18 regression, but it is not grossly unusable as before. I will dig further into the cause of the regression when time permits.

@sjindel-google
Copy link
Contributor

The remaining regression appears to be due to the fact that we aren't optimizing the arithmetic inside max as well as in the local closure, since the parameter types are only known to be num.

@mraleph
Copy link
Member

mraleph commented Sep 6, 2018

@sjindel-google I think it is actually this chunk of code - have special optimization for min/max which only triggers if no type arguments are passed.

We should figure out if we can make that work when type arguments are passed as well.

@aartbik
Copy link
Contributor

aartbik commented Sep 12, 2018

I added a microbenchmark to our internal tracking system.
This is all Dart2, since that is our future now.

JIT

DartMicroBench.MaxLib(RunTime): 10840.28905945946 us.
DartMicroBench.MaxCode(RunTime): 5478.767543715848 us.
DartMicroBench.MinLib(RunTime): 10132.110161616163 us.
DartMicroBench.MinCode(RunTime): 5686.956278409091 us.

AOT

DartMicroBench.MaxLib(RunTime):  22930.94544318182 us.
DartMicroBench.MaxCode(RunTime): 13099.15777124183 us.
DartMicroBench.MinLib(RunTime):  24351.07574698795 us.
DartMicroBench.MinCode(RunTime): 12992.094266233767 us.

@aartbik
Copy link
Contributor

aartbik commented Sep 14, 2018

Difference for the innermost static call to max or > operator between list.reduce(math.max) and list.reduce((a, b) => a > b ? a : b) closures under AOT:

    AssertSubtype:148(T, num, 'T', instantiator_type_args(v0), function_type_args(v41))
    ....
B34[join]:158 pred(B32, B33)
    AssertAssignable:164(v3, T, 'a', instantiator_type_args(v0), function_type_args(v41))
    AssertAssignable:166(v4, T, 'b', instantiator_type_args(v0), function_type_args(v41))
    PushArgument(v41)
    PushArgument(v3)
    PushArgument(v4)
    v45 <- StaticCall:168( max<1> v41, v3, v4)
    Return:170(v45)

vs

B11[join]:42 pred(B6)
    CheckStackOverflow:48()
    DebugStepCheck:50()
    AssertAssignable:52(v3, int, 'a', instantiator_type_args(v0), function_type_args(v0))
    AssertAssignable:54(v4, int, 'b', instantiator_type_args(v0), function_type_args(v0))
    PushArgument(v3)
    PushArgument(v4)
    CheckNull:56(v3)
    v17 <- StaticCall:58( ><0> v3, v4)
    AssertBoolean:60(v17)
    Branch if StrictCompare:62(===, v17, v19) goto (12, 13)

@aartbik
Copy link
Contributor

aartbik commented Sep 26, 2018

For JIT, I prototyped mraleph's idea by inlining MathMinMax even when the type parameter is present. This yields the following intermediate.

B32[target]:154
 AssertSubtype:148(T, num, 'T', instantiator_type_args(v0), function_type_args(v41))
 goto:160 B34
B33[target]:156
 goto:162 B34
B34[join]:158 pred(B32, B33)
 CheckSmi:164(v3)
 AssertAssignable:164(v3 T{_Smi!}, T, 'a', instantiator_type_args(v0), function_type_args(v41)) T{_Smi!}
 CheckSmi:166(v4)
 AssertAssignable:166(v4 T{_Smi!}, T, 'b', instantiator_type_args(v0), function_type_args(v41)) T{_Smi!}
 v45 <- MathMinMax:168(v3 T{_Smi!}, v4 T{_Smi!}) [-4611686018427387904, 4611686018427387903] T{_Smi!}

This increased performance a bit, but not as much as I had hoped. Profiling the generated JIT code reveals that we spent more time in the AssertAssignable code than in the Max code itself. So next we should explore if this can be further simplified.

@mraleph
Copy link
Member

mraleph commented Sep 26, 2018

@aartbik IIRC AssertAssignable and AssertSubtype overheads should go away if we use unchecked entry point - because type system guarantees that we are entering with matching types from statically typed call-sites.

Currently unchecked entry points are disabled (due to some missing stuff in regalloc). You can try --enable_multiple_entrypoints and see if that removes the assert overhead.

@aartbik
Copy link
Contributor

aartbik commented Sep 26, 2018

Yes, with my prototype and unchecked entry points, we see great jump in performance (unchecked entry points standalone helps too, but not as much as the combination). Profiling max_max reveals that in this context, other than data fetching and testing, the time is actually spent doing the max operator.

julemand101 added a commit to julemand101/AdventOfCode2017 that referenced this issue Nov 16, 2018
Dart 2.1 has been released which fixes the problem with this test:
dart-lang/sdk#34090
@julemand101
Copy link
Contributor Author

Well, from my point of view the issue seems to be fixed in Dart 2.1 and I am closing this issue. You are welcome to reopen the issue if you guys thinks there are still more there can be done on this matter.

@aartbik
Copy link
Contributor

aartbik commented Nov 16, 2018

We probably can do a bit better, so I like to keep this open. I removed the medium tag though.

@aartbik aartbik reopened this Nov 16, 2018
@aartbik aartbik removed the P2 A bug or feature request we're likely to work on label Nov 16, 2018
@aartbik
Copy link
Contributor

aartbik commented Dec 3, 2018

With ToT, the prototype of recognizing min/max no longer moves the performance needle either way, so no point pursuing this further. We can indeed close the issue.

@aartbik aartbik closed this as completed Dec 3, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-vm Use area-vm for VM related issues, including code coverage, and the AOT and JIT backends.
Projects
None yet
Development

No branches or pull requests

4 participants