Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Profiler using measureme #317

Merged
merged 15 commits into from
Jun 5, 2020
Merged

Profiler using measureme #317

merged 15 commits into from
Jun 5, 2020

Conversation

jasonwilliams
Copy link
Member

@jasonwilliams jasonwilliams commented Apr 13, 2020

@jasonwilliams jasonwilliams changed the title making a start Profiler using measureme Apr 13, 2020
@Razican Razican added this to the v0.9.0 milestone May 14, 2020
@Razican Razican added enhancement New feature or request performance Performance related changes and issues labels May 14, 2020
boa/src/lib.rs Outdated Show resolved Hide resolved
boa/src/lib.rs Outdated Show resolved Hide resolved
@jasonwilliams
Copy link
Member Author

jasonwilliams commented May 18, 2020

@Razican @HalidOdat There's some great progress happening with this.
image

@Razican
Copy link
Member

Razican commented May 18, 2020

@Razican @HalidOdat There's some great progress happening with this.
image

I'm loving this. This should give nice insights on where do we spend our time.

@abhijeetbhagat
Copy link
Contributor

Is that flamegraph displayed in a Chrome devtools window?

@Razican
Copy link
Member

Razican commented May 23, 2020

Is that flamegraph displayed in a Chrome devtools window?

Yep! :)

@abhijeetbhagat
Copy link
Contributor

Ok, an SVG file loaded in devtools?

@Razican
Copy link
Member

Razican commented May 23, 2020

Ok, an SVG file loaded in devtools?

Yes, the implementation is based in this: https://blog.rust-lang.org/inside-rust/2020/02/25/intro-rustc-self-profile.html

@jasonwilliams jasonwilliams force-pushed the profiler branch 2 times, most recently from 7e764b5 to ef31daf Compare May 25, 2020 21:18
@jasonwilliams
Copy link
Member Author

jasonwilliams commented May 28, 2020

@Razican You asked about the realm profile image
image

You should be able to load this into Chrome
https://gist.github.com/jasonwilliams/b1caeef708b36205f279e7342fbfbcf9

@Razican
Copy link
Member

Razican commented May 29, 2020

@Razican You asked about the realm profile image
image

You should be able to load this into Chrome
https://gist.github.com/jasonwilliams/b1caeef708b36205f279e7342fbfbcf9

This looks like a clear improvement path. Could we also have internal metrics for each of the intrinsics creation?

@jasonwilliams
Copy link
Member Author

@Razican You asked about the realm profile image
image
You should be able to load this into Chrome
https://gist.github.com/jasonwilliams/b1caeef708b36205f279e7342fbfbcf9

This looks like a clear improvement path. Could we also have internal metrics for each of the intrinsics creation?

Yeah we can go as deep as we want really, its up to us what we want to trace

@Razican
Copy link
Member

Razican commented May 29, 2020

Yeah we can go as deep as we want really, its up to us what we want to trace

In this case, I would add smaller steps in things that take too long, like this one. We can then review what we could do to improve this :)

@github-actions
Copy link

Benchmark for d526ee1

Click to view benchmark
Test PR Benchmark Master Benchmark %
Create Realm 354.7±21.95µs 361.3±22.36µs 98%
Expression (Lexer) 1702.1±104.38ns 1684.7±72.41ns 101%
Expression (Parser) 4.0±0.26µs 4.1±0.25µs 99%
Fibonacci (Execution) 2.3±0.10ms 2.3±0.10ms 100%
For loop (Execution) 369.1±15.08µs 366.2±18.63µs 101%
For loop (Lexer) 4.3±0.20µs 4.4±0.28µs 98%
For loop (Parser) 11.4±0.91µs 11.4±0.66µs 100%
Hello World (Lexer) 831.3±55.91ns 823.9±46.77ns 101%
Hello World (Parser) 1873.0±83.48ns 1904.3±108.86ns 98%
Symbols (Execution) 392.5±25.92µs 385.8±23.79µs 102%
undefined undefined 100%

@github-actions
Copy link

Benchmark for f8dfe95

Click to view benchmark
Test PR Benchmark Master Benchmark %
Create Realm 450.8±49.16µs 434.0±41.63µs 104%
Expression (Lexer) 2.1±0.19µs 2.1±0.17µs 99%
Expression (Parser) 5.1±0.59µs 4.7±0.43µs 109.00000000000001%
Fibonacci (Execution) 2.8±0.23ms 2.7±0.23ms 101%
For loop (Execution) 458.4±40.45µs 464.7±48.73µs 99%
For loop (Lexer) 5.8±0.54µs 5.7±0.59µs 103%
For loop (Parser) 14.3±1.46µs 13.9±1.31µs 102%
Hello World (Lexer) 1088.4±121.30ns 1001.7±92.02ns 109.00000000000001%
Hello World (Parser) 2.3±0.18µs 2.3±0.25µs 101%
Symbols (Execution) 467.9±43.43µs 479.6±46.23µs 98%
undefined undefined 100%

@jasonwilliams
Copy link
Member Author

image

latest gist shows each function being created
https://gist.github.com/jasonwilliams/b20b9b89fa965e2a3fe7a58206aba257

@github-actions
Copy link

Benchmark for 4927a91

Click to view benchmark
Test PR Benchmark Master Benchmark %
Create Realm 475.4±45.14µs 454.0±50.34µs 105%
Expression (Lexer) 2.2±0.34µs 2.1±0.19µs 106%
Expression (Parser) 4.9±0.47µs 4.9±0.49µs 99%
Fibonacci (Execution) 2.8±0.25ms 2.9±0.26ms 98%
For loop (Execution) 496.0±53.28µs 480.1±66.46µs 103%
For loop (Lexer) 5.6±0.52µs 5.7±0.55µs 98%
For loop (Parser) 14.2±1.36µs 13.7±1.22µs 104%
Hello World (Lexer) 1016.8±100.59ns 1055.5±103.26ns 96%
Hello World (Parser) 2.3±0.20µs 2.4±0.25µs 97%
Symbols (Execution) 497.6±60.06µs 495.3±47.55µs 100%
undefined undefined 100%

@github-actions
Copy link

Benchmark for ecb8770

Click to view benchmark
Test PR Benchmark Master Benchmark %
Create Realm 479.2±10.50µs 472.1±17.10µs 102%
Expression (Lexer) 2.2±0.04µs 2.2±0.09µs 99%
Expression (Parser) 5.3±0.17µs 5.4±0.24µs 97%
Fibonacci (Execution) 2.8±0.06ms 2.8±0.13ms 100%
For loop (Execution) 498.6±12.47µs 473.0±14.58µs 105%
For loop (Lexer) 5.8±0.16µs 5.9±0.41µs 98%
For loop (Parser) 14.4±0.43µs 14.7±0.45µs 98%
Hello World (Lexer) 1052.7±28.91ns 1087.5±125.05ns 97%
Hello World (Parser) 2.4±0.07µs 2.4±0.08µs 99%
Symbols (Execution) 508.9±24.79µs 497.3±26.77µs 102%
undefined undefined 100%

@jasonwilliams jasonwilliams marked this pull request as ready for review May 30, 2020 22:18
@github-actions
Copy link

Benchmark for be8e8f1

Click to view benchmark
Test PR Benchmark Master Benchmark %
Create Realm 440.5±10.77µs 421.0±15.21µs 105%
Expression (Lexer) 2.0±0.05µs 2.0±0.04µs 101%
Expression (Parser) 4.8±0.15µs 4.8±0.12µs 100%
Fibonacci (Execution) 2.6±0.07ms 2.5±0.07ms 103%
For loop (Execution) 455.3±11.20µs 443.7±46.96µs 103%
For loop (Lexer) 5.3±0.16µs 5.3±0.14µs 100%
For loop (Parser) 12.9±0.38µs 13.2±0.33µs 97%
Hello World (Lexer) 977.9±26.78ns 963.7±27.39ns 101%
Hello World (Parser) 2.2±0.09µs 2.2±0.08µs 99%
Symbols (Execution) 467.3±12.13µs 445.0±27.66µs 105%
undefined undefined 100%

@github-actions
Copy link

Benchmark for 5427d10

Click to view benchmark
Test PR Benchmark Master Benchmark %
Create Realm 391.0±22.87µs 397.7±15.74µs 98%
Expression (Lexer) 1891.6±89.61ns 1885.5±77.18ns 100%
Expression (Parser) 4.5±0.20µs 4.6±0.28µs 98%
Fibonacci (Execution) 2.3±0.08ms 2.3±0.14ms 99%
For loop (Execution) 417.6±18.06µs 406.5±15.41µs 103%
For loop (Lexer) 5.0±0.24µs 4.9±0.22µs 102%
For loop (Parser) 11.8±0.56µs 12.5±0.54µs 94%
Hello World (Lexer) 896.9±47.61ns 871.1±39.77ns 103%
Hello World (Parser) 1996.9±90.69ns 2.0±0.10µs 99%
Symbols (Execution) 407.5±22.07µs 406.5±21.66µs 100%
undefined undefined 100%

@jasonwilliams
Copy link
Member Author

jasonwilliams commented May 31, 2020

60% of time is spent in set_field during execution
image

@HalidOdat
Copy link
Member

60% of time is spent in set_field during execution
image

I thought as much, I tested with removing .set_field and directly adding to the objects properties and it speed things significantly

@github-actions
Copy link

Benchmark for dd472fe

Click to view benchmark
Test PR Benchmark Master Benchmark %
Create Realm 396.3±36.13µs 351.6±29.73µs 112.99999999999999%
Expression (Lexer) 1817.2±158.70ns 1901.7±151.64ns 95%
Expression (Parser) 5.0±0.23µs 4.8±0.34µs 104%
Fibonacci (Execution) 2.3±0.39ms 2.3±0.17ms 101%
For loop (Execution) 377.3±42.36µs 373.1±29.96µs 101%
For loop (Lexer) 5.1±0.30µs 5.5±0.63µs 92%
For loop (Parser) 12.3±0.82µs 13.3±1.41µs 92%
Hello World (Lexer) 933.4±53.65ns 950.9±45.31ns 98%
Hello World (Parser) 2.2±0.11µs 1912.4±210.86ns 112.99999999999999%
Symbols (Execution) 422.4±32.05µs 360.1±26.41µs 117%
undefined undefined 100%

@jasonwilliams
Copy link
Member Author

I thought as much, I tested with removing .set_field and directly adding to the objects properties and it speed things significantly

That's interesting, do we have a clean way of refactoring this?

@HalidOdat
Copy link
Member

I thought as much, I tested with removing .set_field and directly adding to the objects properties and it speed things significantly

That's interesting, do we have a clean way of refactoring this?

The method that I used was only for testing, it not clean at all. I just extracted properties field for the Object and called .insert().

I think I'll try to do something more permanent in #419

Copy link
Member

@Razican Razican left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks very nice, I think we can use this to profile where Boa spends it's time, and it's already bringing improvements!

I will give tomorrow a look to the OnceCell usage, since I have some time. I think we can remove the unsafe code, and maybe remove the calls to drop().

@@ -71,15 +74,23 @@ pub fn forward(engine: &mut Interpreter, src: &str) -> String {
/// The str is consumed and the state of the Interpreter is changed
/// Similar to `forward`, except the current value is returned instad of the string
/// If the interpreter fails parsing an error value is returned instead (error object)
#[allow(clippy::unit_arg, clippy::drop_copy)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for my understanding, why is this needed here? Is the timer Copy or the profiler?

Copy link
Member Author

@jasonwilliams jasonwilliams May 31, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

error: calls to `std::mem::drop` with a value that implements `Copy`. Dropping a copy leaves the original intact.
  --> boa\src\lib.rs:90:5
   |
90 |     drop(main_timer);
   |     ^^^^^^^^^^^^^^^^
   |
note: the lint level is defined here
  --> boa\src\lib.rs:9:5
   |
9  |     clippy::all,
   |     ^^^^^^^^^^^
   = note: `#[deny(clippy::drop_copy)]` implied by `#[deny(clippy::all)]`
note: argument has type ()
  --> boa\src\lib.rs:90:10
   |
90 |     drop(main_timer);
   |          ^^^^^^^^^^
   = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#drop_copy

error: aborting due to previous error

error: could not compile `Boa`.

So it looks like () implements Copy and when we don't compile the profiler, we're just dropping () which triggers this linting warning.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be solved by only calling drop() with the feature active. We can add an artificial scope here and add the #[cfg] flag.

boa/src/profiler.rs Outdated Show resolved Hide resolved
Copy link
Member

@HalidOdat HalidOdat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks awesome!

docs/profiling.md Show resolved Hide resolved
boa/src/profiler.rs Outdated Show resolved Hide resolved
@github-actions
Copy link

Benchmark for 77bfd02

Click to view benchmark
Test PR Benchmark Master Benchmark %
Create Realm 480.5±14.79µs 454.7±6.54µs 106%
Expression (Lexer) 2.2±0.04µs 2.2±0.04µs 102%
Expression (Parser) 5.3±0.10µs 5.3±0.10µs 100%
Fibonacci (Execution) 2.8±0.05ms 2.8±0.06ms 99%
For loop (Execution) 502.4±67.42µs 473.7±6.29µs 106%
For loop (Lexer) 5.8±0.26µs 5.7±0.09µs 102%
For loop (Parser) 14.3±0.33µs 14.3±0.27µs 100%
Hello World (Lexer) 1032.2±14.73ns 1036.0±17.41ns 100%
Hello World (Parser) 2.4±0.06µs 2.4±0.05µs 99%
Symbols (Execution) 507.9±10.15µs 485.2±8.29µs 105%
undefined undefined 100%

@github-actions
Copy link

Benchmark for 36009dd

Click to view benchmark
Test PR Benchmark Master Benchmark %
Create Realm 459.1±11.39µs 434.5±9.73µs 106%
Expression (Lexer) 2.1±0.07µs 2.1±0.13µs 98%
Expression (Parser) 5.2±0.31µs 5.1±0.09µs 102%
Fibonacci (Execution) 2.6±0.07ms 2.7±0.04ms 98%
For loop (Execution) 470.4±14.07µs 447.2±16.30µs 105%
For loop (Lexer) 5.5±0.16µs 5.5±0.12µs 100%
For loop (Parser) 13.8±0.48µs 13.7±0.24µs 101%
Hello World (Lexer) 988.1±16.96ns 996.7±11.71ns 99%
Hello World (Parser) 2.3±0.04µs 2.3±0.06µs 101%
Symbols (Execution) 488.7±17.88µs 461.9±12.47µs 106%
undefined undefined 100%

Copy link
Member

@Razican Razican left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been checking those unsafes, and the drop() implementation is not safe. I think we should have a different approach where we pass the object around.

Comment on lines +40 to +43
pub fn default() -> BoaProfiler {
let profiler = Profiler::new(Path::new("./my_trace")).unwrap();
BoaProfiler { profiler }
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should implement the Default trait, and any function returning BoaProfiler in BoaProfiler should return Self, in case we change its name in the future, to make it easier.

Comment on lines +49 to +58
pub fn drop(&self) {
// In order to drop the INSTANCE we need to get ownership of it, which isn't possible on a static unless you make it a mutable static
// mutating statics is unsafe, so we need to wrap it as so.
// This is actually safe though because init and drop are only called at the beginning and end of the application
unsafe {
INSTANCE
.take()
.expect("Could not take back profiler instance");
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm afraid this is not safe. If we call forward() more than once (which we sometimes do), this would be undefined behaviour.

I think the best solution is not to use a static global here. We could use a Mutex to avoid this, but I think this would reduce performance a lot. I think the best option is to conditionally have a BoaProfiler in the Lexer, Parser and Interpreter.

In the future, we could have a global Boa object with this field. In any case, I think the three of them should have a start_event() that would be no-op if the feature is not active. And all event starts should call to the Lexer, Parser or Interpreter.

@codecov
Copy link

codecov bot commented Jun 4, 2020

Codecov Report

Merging #317 into master will decrease coverage by 0.54%.
The diff coverage is 14.89%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #317      +/-   ##
==========================================
- Coverage   66.64%   66.09%   -0.55%     
==========================================
  Files         124      128       +4     
  Lines        8852     8992     +140     
==========================================
+ Hits         5899     5943      +44     
- Misses       2953     3049      +96     
Impacted Files Coverage Δ
boa/src/builtins/boolean/mod.rs 54.83% <0.00%> (-1.83%) ⬇️
boa/src/builtins/console/mod.rs 27.17% <ø> (-0.15%) ⬇️
boa/src/builtins/error/range.rs 33.33% <ø> (+5.33%) ⬆️
boa/src/builtins/json/mod.rs 86.95% <ø> (-1.28%) ⬇️
boa/src/builtins/math/mod.rs 76.27% <0.00%> (-0.88%) ⬇️
boa/src/builtins/object/internal_methods_trait.rs 17.91% <0.00%> (-0.28%) ⬇️
boa/src/builtins/object/mod.rs 29.84% <ø> (+2.67%) ⬆️
boa/src/builtins/regexp/mod.rs 69.83% <0.00%> (-0.40%) ⬇️
boa/src/builtins/string/mod.rs 36.79% <0.00%> (-0.11%) ⬇️
boa/src/builtins/symbol/mod.rs 95.83% <0.00%> (-4.17%) ⬇️
... and 69 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 87aea64...ab08c75. Read the comment docs.

@github-actions
Copy link

github-actions bot commented Jun 4, 2020

Benchmark for 11baf79

Click to view benchmark
Test PR Benchmark Master Benchmark %
Create Realm 457.0±23.17µs 439.4±17.21µs 104%
Expression (Lexer) 2.1±0.07µs 2.1±0.17µs 98%
Expression (Parser) 5.3±0.29µs 4.8±0.17µs 111.00000000000001%
Fibonacci (Execution) 2.8±0.09ms 2.7±0.09ms 103%
For loop (Execution) 502.4±21.12µs 478.3±18.38µs 105%
For loop (Lexer) 5.4±0.24µs 5.4±0.34µs 100%
For loop (Parser) 13.6±0.62µs 13.7±0.67µs 99%
Hello World (Lexer) 985.7±33.67ns 1002.2±53.37ns 98%
Hello World (Parser) 2.3±0.12µs 2.2±0.11µs 104%
Symbols (Execution) 497.9±18.52µs 480.2±22.14µs 104%
undefined undefined 100%

@jasonwilliams jasonwilliams merged commit 32b0741 into master Jun 5, 2020
@Razican Razican deleted the profiler branch June 6, 2020 14:55
@HalidOdat HalidOdat mentioned this pull request Jun 25, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request performance Performance related changes and issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Explore using measureme for profiling the interpreter
6 participants