Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[wasm] Interpreter automatic PGO #92981

Merged
merged 3 commits into from
Nov 1, 2023
Merged

[wasm] Interpreter automatic PGO #92981

merged 3 commits into from
Nov 1, 2023

Conversation

kg
Copy link
Member

@kg kg commented Oct 4, 2023

This PR adds automatic PGO for interpreter tiering. The idea is that by maintaining a list of which interpreter methods were tiered during execution, we can skip directly to generating optimized code on future runs. This will get the application into a steady high-performance state faster on future runs, while also reducing the amount of time/memory we spend on code generation (since normally we generate unoptimized code, then generate optimized code).

The infrastructure for this feature is not platform gated, but the only platform that has an implementation for loading/saving the PGO data right now is the browser, and the runtime option controlling it is only default-on for the browser.

Web application code can either use INTERNAL.interpgo_load_data and INTERNAL.interpgo_save_data to manually control the loading/saving of the data, or can use the new withInterpreterPgo(true, autoSaveDelay) builder method to turn on automatic mode. Automatic mode will attempt to load the data during startup, and will automatically save it after a delay (at which point your application should be in something approximating a steady-state and critical code has already tiered.) The current web implementation repurposes the cache storage code from pavel's memory snapshot feature to store the pgo table in the cache next to the snapshot(s).

This PR also adds basic (controlled by a #define and off by default) timing instrumentation for generate_code, because in my testing the browser's profiler was producing wildly incorrect timing data for interpreter codegen. This should make it easy to evaluate how well the feature is working since you can just look at the output.

One interesting finding: The list of method hashes generated on a first run is bigger than the list generated on future runs - I believe this is because once a full set of methods is tiered, we no longer need the tiered version of some of them because the hottest methods have gotten inlined into their callers. So on a second run the list of hashes stabilizes into a smaller list.

In a couple local test runs of browser-bench with this active and without, the time spent in generate_code was 3600ms without interpgo and 3209ms with interpgo, so the improvement over the course of a full run of an application with lots of generated code should be worthwhile. For smaller applications it's below the noise floor, I haven't been able to measure it successfully at that scale.

@ghost ghost assigned kg Oct 4, 2023
@kg kg added the arch-wasm WebAssembly architecture label Oct 4, 2023
@ghost
Copy link

ghost commented Oct 4, 2023

Tagging subscribers to this area: @BrzVlad, @kotlarmilos
See info in area-owners.md if you want to be subscribed.

Issue Details

This PR adds automatic PGO for interpreter tiering. The idea is that by maintaining a list of which interpreter methods were tiered during execution, we can skip directly to generating optimized code on future runs. This will get the application into a steady high-performance state faster on future runs, while also reducing the amount of time/memory we spend on code generation (since normally we generate unoptimized code, then generate optimized code).

Application code can either use INTERNAL.interpgo_load_data and INTERNAL.interpgo_save_data to manually control the loading/saving of the data, or can use the new withInterpreterPgo(true, autoSaveDelay) builder method to turn on automatic mode. Automatic mode will attempt to load the data during startup, and will automatically save it after a delay (at which point your application should be in something approximating a steady-state and critical code has already tiered.)

Author: kg
Assignees: kg
Labels:

area-Codegen-Interpreter-mono

Milestone: -

size_t size = sizeof(uint32_t) + 16;
uint32_t *inbuf = alloca (size);
// method tokens are globally unique within a given assembly
inbuf[0] = mono_method_get_token (method);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not true for generic instances.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, so I need to hash the generic arguments? Or is it more complex than that?

This comment was marked as duplicate.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some generic instances are very hot in the BCL, like the ones used in string/array searches

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can hash the full method name for example, but maybe its good enough this way.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's probably fine this way, if any instance of the method got tiered we can probably tier all of them. This still won't generate code for other instances until they're actually used.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's a wrapper method, I think the token is either shared with the method that it is wrapping, or it is zero. maybe we don't mind the collisions?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we share the AOT compiler's logic for uniquely identifying methods?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The aot compiler/runtime has a mono_aot_method_hash () which returns a good hash.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mono_aot_method_hash looks comprehensive but also looks really expensive, and i'm not sure how well it will avoid collisions. do you still want me to use it?

src/mono/wasm/runtime/types/index.ts Show resolved Hide resolved
src/mono/wasm/test-main.js Show resolved Hide resolved
src/mono/wasm/runtime/types/index.ts Outdated Show resolved Hide resolved
src/mono/wasm/runtime/types/index.ts Outdated Show resolved Hide resolved
src/mono/wasm/runtime/interpgo.ts Outdated Show resolved Hide resolved
src/mono/wasm/runtime/cwraps.ts Show resolved Hide resolved
src/mono/sample/wasm/browser-bench/main.js Show resolved Hide resolved
size_t size = sizeof(uint32_t) + 16;
uint32_t *inbuf = alloca (size);
// method tokens are globally unique within a given assembly
inbuf[0] = mono_method_get_token (method);

This comment was marked as duplicate.

src/mono/mono/mini/interp/interpgo.c Outdated Show resolved Hide resolved
src/mono/mono/mini/interp/interpgo.c Outdated Show resolved Hide resolved
src/mono/mono/mini/interp/interpgo.c Outdated Show resolved Hide resolved
Copy link
Member

@radical radical left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WBT changes look good.

src/mono/wasm/Wasm.Build.Tests/Templates/InterpPgoTests.cs Outdated Show resolved Hide resolved
src/mono/wasm/Wasm.Build.Tests/BrowserRunner.cs Outdated Show resolved Hide resolved
src/mono/wasm/Wasm.Build.Tests/Templates/InterpPgoTests.cs Outdated Show resolved Hide resolved
src/mono/wasm/Wasm.Build.Tests/WasmTemplateTestBase.cs Outdated Show resolved Hide resolved
}

{
_testOutput.WriteLine("/// Second run");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not very familiar with playwrite, should we navigate the browser away before we load the page second time ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each page is basically a separate tab as I understand it

Checkpoint: Interpgo bloom filter works

Workaround for 'reach managed cold' hanging forever

Remove logging

Checkpoint

Checkpoint

Configurable interpreter PGO

Enable interpgo for browser bench

Checkpoint: Refactor interpgo to share cache code with memory snapshots

Use cache storage for Interpreter PGO

Code cleanup + repair merge damage

Repair merge damage

Fix some bugs and troubleshoot WBT failure

Fix build

Move prototypes

I love C

Fix key collision between interpgo and memory snapshot

Fix storeCacheEntry always writing to the memory snapshot key

Checkpoint

Move more of interpgo outside of platform guards

Fix build

Make the inline murmurhash functions also static to fix linker errors

Remove unused signature argument

Cleanup

Address PR feedback

Rearrange code

Address PR feedback

Fix assert when loading empty table

Checkpoint

Update dotnet.d.ts

Move interp codegen timing to a runtime option + make it thread safe
Move interp pgo logging to a runtime option

Add a WBT test that verifies basic functionality of interpreter PGO

Add missing file

Await interp_pgo_save_data

Apply suggestions from code review

Co-authored-by: Ankit Jain <[email protected]>

Cleanup / address PR feedback

Address PR feedback

Increase browser-bench waitFor timeout

Fix syntax
@kg kg merged commit 8bce5a8 into dotnet:main Nov 1, 2023
108 checks passed
@ghost ghost locked as resolved and limited conversation to collaborators Dec 1, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants