Faster wasm runtime #51458

Closed

benaadams
Member

@benaadams benaadams commented Apr 18, 2021

  • Compile the wasm using -O3 rather than -Oz for a faster runtime
  • Use Link Time Optimization -flto for extra optimizations
  • Fix issues in JS and lay groundwork for closure compiler's --closure 1 which can greatly reduce the size of the JS and more than make up for the increase in the wasm (see: Faster, smaller wasm runtime #51446)
File         Branch  Uncompressed  Brotli
dotnet.wasm  main    2,220 KB      764 KB
dotnet.wasm  PR      2,338 KB      771 KB
Net Change           +118 KB       +7 KB
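
To put the net change in proportion, here is a minimal sketch that recomputes the deltas from the table above and expresses them as percentages of the main-branch size. The function name `size_change` is only illustrative and is not part of the PR.

```python
# Sizes in KB, copied from the table above.
sizes = {
    "Uncompressed": {"main": 2220, "PR": 2338},
    "Brotli": {"main": 764, "PR": 771},
}


def size_change(main_kb, pr_kb):
    """Return (absolute change in KB, relative change in percent of main)."""
    delta = pr_kb - main_kb
    return delta, 100.0 * delta / main_kb


for label, s in sizes.items():
    delta, pct = size_change(s["main"], s["PR"])
    print(f"{label}: {delta:+d} KB ({pct:+.1f}%)")
```

By this calculation the uncompressed file grows by roughly 5.3%, while the Brotli-compressed file grows by under 1%.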

Optimizing Code | Link Times

Of course, for a final release build, it is usually worth linking with something like -O3 --closure 1 for full optimizations.

Trading off code size and performance

You may wish to build the less performance-sensitive source files in your project using -Os or -Oz and the remainder using -O2 (-Os and -Oz are similar to -O2, but reduce code size at the expense of performance. -Oz reduces code size more than -Os.)

Compiling with -Os or -Oz generally avoids inlining too.

Inspired by Is WebAssembly magic performance pixie dust? where they were seeing a more than 2x perf degradation from -Os:

Another thing that the AssemblyScript folks pointed out to me is that the --optimize flag is equivalent to -O3s which aggressively optimizes for speed, but makes tradeoffs to reduce binary size. -O3 optimizes for speed and speed only. Having -O3s as a default is good in spirit — binary size matters on the web — but is it worth it? At least in this specific example the answer is no: -O3s ends up trading the laughable amount of ~30 bytes for a huge performance penalty

And it's currently using -Oz, which is even worse for performance than -Os

@dotnet-issue-labeler

I couldn't figure out the best area label to add to this PR. If you have write-permissions please help me learn by adding exactly one area label.

@benaadams
Member Author

@radekdoulik is it possible to test the performance of this with your example in #50260 (comment); I'm not sure how to get a build that can be tested in your harness, I just seemed to make it angry in the console instead

@ghost

ghost commented Apr 18, 2021

Tagging subscribers to 'arch-wasm': @lewing
See info in area-owners.md if you want to be subscribed.

Author: benaadams
Assignees: -
Labels:

arch-wasm, area-Build-mono

Milestone: -

@@ -72,7 +72,7 @@ ifeq ($(ENABLE_METADATA_UPDATE),true)
 endif
 
 EMCC_DEBUG_FLAGS =-g -Os -s ASSERTIONS=1 -DDEBUG=1
-EMCC_RELEASE_FLAGS=-Oz --llvm-opts 2
+EMCC_RELEASE_FLAGS=-O3 --llvm-opts 3 -flto
Member

(nit) not that it really matters, but the ordering here is different than what is in the wasm.proj file.

@eerhardt
Member

$(EMCC) $(EMCC_FLAGS) $(1) --js-library runtime/library_mono.js --js-library runtime/binding_support.js --js-library runtime/dotnet_support.js --js-library $(SYSTEM_NATIVE_LIBDIR)/pal_random.js $(BUILDS_OBJ_DIR)/driver.o $(BUILDS_OBJ_DIR)/pinvoke.o $(BUILDS_OBJ_DIR)/corebindings.o $(2) -o $(NATIVE_BIN_DIR)/dotnet.js $(3)

Does this need to pass --externs runtime/externs.js on it?


Refers to: src/mono/wasm/Makefile:105 in 7ea1fa3.

@benaadams
Member Author

Does this need to pass --externs runtime/externs.js on it?

That needs to be passed via the EMCC_CLOSURE_ARGS environment variable, so I added it in the RunWithEmSdkEnv.cs Task. There doesn't seem to be a simple way to add an env var in vanilla MSBuild?
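
As a rough illustration of the mechanism (not the code from this PR, which does this in RunWithEmSdkEnv.cs and the Makefile), a build script could append the externs file to EMCC_CLOSURE_ARGS before invoking emcc. The helper name `emcc_env` is hypothetical, and the sketch assumes the variable's value is a whitespace-separated list of Closure Compiler arguments.

```python
import os
import shlex


def emcc_env(externs_path, base_env=None):
    """Return a copy of the environment with --externs appended to EMCC_CLOSURE_ARGS.

    Hypothetical helper for illustration only.
    """
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("EMCC_CLOSURE_ARGS", "")
    extra = f"--externs {shlex.quote(externs_path)}"
    # Keep any arguments that were already set and append ours.
    env["EMCC_CLOSURE_ARGS"] = f"{existing} {extra}".strip()
    return env


env = emcc_env("runtime/externs.js", base_env={})
print(env["EMCC_CLOSURE_ARGS"])
# The resulting dict could then be passed as env= to subprocess.run([...emcc command...]).
```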

@eerhardt
Member

so added that in the RunWithEmSdkEnv.cs Task

I don't believe that Task gets called on non-Windows builds:

<Target Name="BuildWinWasmRuntimes"
Condition="'$(OS)' == 'Windows_NT'"
AfterTargets="Build"
DependsOnTargets="BuildPInvokeTable;BundleTimezones">

vs.

<Target Name="BuildWasmRuntimes"
Condition="'$(OS)' != 'Windows_NT'"
AfterTargets="Build"
DependsOnTargets="BuildPInvokeTable;BundleTimeZones">
<Exec Command="make -C $(MonoProjectRoot)wasm all SHELL=/bin/bash BINDIR=$(ArtifactsBinDir) MONO_BIN_DIR=$(MonoArtifactsPath) OBJDIR=$(ArtifactsObjDir) NATIVE_BIN_DIR=$(NativeBinDir) CONFIG=$(Configuration) PINVOKE_TABLE=$(WasmPInvokeTablePath) ICU_LIBDIR=$(ICULibDir) ENABLE_ES6=$(WasmEnableES6) ENABLE_METADATA_UPDATE=$(WasmEnableMetadataUpdate)"
IgnoreStandardErrorWarningFormat="true" />

From my understanding the Makefile and the BuildWinWasmRuntimes target are duplicates of each other. cc @radekdoulik

@radekdoulik
Member

@radekdoulik is it possible to test the performance of this with your example in #50260 (comment); I'm not sure how to get a build that can be tested in your harness, I just seemed to make it angry in the console instead

I am updating my measurement code so that it becomes more automated and less of a hassle to use. I will try to run it on your branch and also create a PR that adds a new sample, similar to src/mono/samples/wasm/browser, to run a simple benchmark.

@benaadams
Member Author

I don't believe that Task gets called on non-Windows builds:

Added to the Makefile also

@radekdoulik
Member

From my understanding the Makefile and the BuildWinWasmRuntimes target are duplicates of each other. cc @radekdoulik

Yes, indeed. I plan to unify that, hopefully soon. Opened #51553.

Member

@radekdoulik radekdoulik left a comment

I have measured the Json timings.

measurement                       main/interp  PR branch/interp
Json, non-ASCII text serialize    8.2714ms     8.2941ms
Json, non-ASCII text deserialize  11.6825ms    11.6508ms
Json, small serialize             0.2389ms     0.2436ms
Json, small deserialize           0.3678ms     0.3731ms
Json, large serialize             67.8831ms    69.4267ms
Json, large deserialize           101.1538ms   102.4038ms

They are very close, probably just measurement errors. Not sure whether the change is too small or whether the Json [de]serialization is not affected by changes in this PR.

Any suggestion for an area where it might show larger difference?
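
For a sense of how close the numbers are, the following sketch computes the relative change of the PR branch against main for each row of the table above. The helper name `relative_change` is illustrative; positive values mean the PR branch took longer.

```python
# (measurement, main/interp in ms, PR branch/interp in ms), copied from the table above.
timings_ms = [
    ("Json, non-ASCII text serialize", 8.2714, 8.2941),
    ("Json, non-ASCII text deserialize", 11.6825, 11.6508),
    ("Json, small serialize", 0.2389, 0.2436),
    ("Json, small deserialize", 0.3678, 0.3731),
    ("Json, large serialize", 67.8831, 69.4267),
    ("Json, large deserialize", 101.1538, 102.4038),
]


def relative_change(main_ms, pr_ms):
    """Percent change of the PR timing relative to main."""
    return 100.0 * (pr_ms - main_ms) / main_ms


for name, main_ms, pr_ms in timings_ms:
    print(f"{name}: {relative_change(main_ms, pr_ms):+.2f}%")
```

Every row changes by less than 2.5% in either direction. Whether that is within run-to-run noise cannot be determined from single measurements like these.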

@jeromelaban
Contributor

jeromelaban commented Apr 20, 2021

This change might have more impact with AOT enabled (-flto does, not sure about -O3). Note that -flto also has a very large impact on build time if not -flto=thin at link time.

I'm also wondering whether this may help the GC itself more. What happens if you make lots of allocations during benchmarks?

@radekdoulik
Member

I think for impact on AOT we would need to also modify src/mono/wasm/build targets?

@radekdoulik
Member

I have opened a draft PR with the simple benchmark sample I used to measure the times: #51559

@steveisok
Member

@lewing @radical do you think this is something we can/want to take?

@terrajobst terrajobst added the community-contribution Indicates that the PR has been added by a community member label Jul 19, 2021
@marek-safar marek-safar assigned radekdoulik and unassigned lewing Nov 22, 2021
@radekdoulik
Member

I am sorry @benaadams that this fell off the radar. In the meantime we switched to -O2 for release builds, and the build targets are finally unified for win and linux/mac.

@radical, I think the flags can be overridden with EmccLinkOptimizationFlag property, right?

@ghost ghost locked as resolved and limited conversation to collaborators Dec 22, 2021