.NET 8 Per-Preview Performance report on WASM, Mono AOT, and Interpreter #84302
This report provides an overview of the major performance improvements and regressions in WASM, Mono AOT, and Interpreter during the timeframe of .NET 8 per-preview releases. It focuses on relevant improvements and regressions that are either in progress or under investigation; these are tracked separately. Reports #77490 and #79288 track active speed and size regressions respectively.
A full benchmark report will be available in a form similar to #79245 and https://devblogs.microsoft.com/dotnet/performance_improvements_in_net_7/ when .NET 8 is released.
Setup
According to https://github.com/dotnet/perf-autofiling-issues, the following configurations are used.
More details on .NET performance benchmarking are available at https://github.com/dotnet/performance.
Preview 7
The following section presents only major improvements with high-level analysis. The analysis should be treated with caution; readers are encouraged to examine the benchmark reports for a thorough analysis and to call out major improvements not mentioned in this report.
Mono AOT compiler
The performance regressions and improvements are analyzed separately in #89238.
Mono Interpreter
The following sections present improvements and regressions introduced in the Mono Interpreter in Preview 7.
Improvements
Here is a list of top 20 microbenchmark improvements in Preview 7.
Vectorization of Vector4 in #87822 improved over 100 microbenchmarks in dotnet/perf-autofiling-issues#19758 and dotnet/perf-autofiling-issues#19760.
The fix for the empty-partition path in Enumerable.Select in #88425 improved the EmptyTakeSelectToArray microbenchmarks, as reported in dotnet/perf-autofiling-issues#19761.
Improved BigInteger operators +, - and * for trivial cases in #84733 improved some BigInteger microbenchmarks in dotnet/perf-autofiling-issues#19762.
Precomputing the CallInfo structure in #88369 improved about 200 microbenchmarks.
The BCL change #86287 and vectorization of Vector128 in #88064 improved about a dozen Equals microbenchmarks.
Regressions
Here is a list of top 20 regressed microbenchmarks in Preview 7.
Preview 6
The following section presents only major improvements with high-level analysis. The analysis should be treated with caution; readers are encouraged to examine the benchmark reports for a thorough analysis and to call out major improvements not mentioned in this report.
Mono AOT WASM
The following sections present improvements and regressions introduced in Mono AOT WASM in Preview 6.
Improvements
Here is a list of top 20 microbenchmark improvements in Preview 6.
Regressions
Here is a list of top 20 regressed microbenchmarks in Preview 6.
Mono AOT compiler
The performance regressions and improvements are analyzed separately in #89238.
Mono Interpreter
The following sections present improvements and regressions introduced in the Mono Interpreter in Preview 6.
Improvements
Here is a list of top 20 microbenchmark improvements in Preview 6.
Vectorization of Vector<T> operators improved over 200 microbenchmarks, as reported in dotnet/perf-autofiling-issues#18537.
Changes in #87219 introduced Math.BigMul in the NextUInt64 random method and improved several microbenchmarks reported in dotnet/perf-autofiling-issues#18690.
About 120 microbenchmarks were improved in dotnet/perf-autofiling-issues#19027, potentially by #87555 or other interpreter and BCL changes.
Frozen dictionary creation is improved by 72% in #87510.
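As a rough illustration of the widening multiply behind the Math.BigMul change above, the following Python sketch maps a raw 64-bit random value into [0, max_value) by taking the high half of a 128-bit product and rejecting the few biased cases (Lemire's multiply-and-reject technique). The helper names big_mul and next_uint64_below, and the choice of this exact rejection scheme, are assumptions made for illustration only; #87219 remains the reference for what the runtime actually does.

```python
import random

MASK64 = (1 << 64) - 1


def big_mul(a, b):
    """Full 64x64 -> 128-bit unsigned product as (high, low) 64-bit halves.

    Hypothetical stand-in for a widening multiply such as Math.BigMul.
    """
    product = (a & MASK64) * (b & MASK64)
    return product >> 64, product & MASK64


def next_uint64_below(max_value, next_raw=lambda: random.getrandbits(64)):
    """Return a uniform integer in [0, max_value) from raw 64-bit values.

    The high half of max_value * raw is the candidate result; the low half
    is used to reject the raw values that would introduce bias.
    """
    if not 0 < max_value <= MASK64:
        raise ValueError("max_value must be in 1..2**64-1")
    high, low = big_mul(max_value, next_raw())
    if low < max_value:
        # 2**64 mod max_value: how many raw values must be rejected.
        threshold = (1 << 64) % max_value
        while low < threshold:
            high, low = big_mul(max_value, next_raw())
    return high


if __name__ == "__main__":
    print(next_uint64_below(100))
```

The point of the sketch is that one widening multiply replaces a division or modulo on the common path, which is the kind of saving a microbenchmark over bounded random numbers would be expected to show.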
Regressions
Here is a list of top 20 regressed microbenchmarks in Preview 6.
Preview 5
A number of improvements introduced in Preview 5 are worth calling out individually. The following section presents only major improvements with high-level analysis. The analysis should be treated with caution; readers are encouraged to examine the benchmark reports for a thorough analysis and to call out major improvements not mentioned in this report.
Mono AOT compiler
The performance regressions and improvements are analyzed separately in #89238.
Mono Interpreter
The following sections present improvements and regressions introduced in the Mono Interpreter in Preview 5.
Improvements
Here is a list of top 20 microbenchmark improvements in Preview 5.
Vectorization of IndexOf in #85437 improved System.Text.RegularExpressions microbenchmarks reported in dotnet/perf-autofiling-issues#17517. Addition of Vector128 and PackedSimd in #82773 improved about 70 microbenchmarks reported in dotnet/perf-autofiling-issues#17563 and dotnet/perf-autofiling-issues#17819.
Change in Plane and Quaternion improved several microbenchmarks in dotnet/perf-autofiling-issues#18043.
Change in #85528 addressed performance problems with code like EqualityComparer<T>.Default.Equals(), which improved over 200 microbenchmarks reported in dotnet/perf-autofiling-issues#18349. Implementation of the float32 Vector128.Equals intrinsic improved System.Numerics.Tests microbenchmarks.
Regressions
Here is a list of top 20 regressed microbenchmarks in Preview 5.
Preview 4
A number of improvements introduced in Preview 4 are worth calling out individually. The following section presents only major improvements with high-level analysis. The analysis should be treated with caution; readers are encouraged to examine the benchmark reports for a thorough analysis and to call out major improvements not mentioned in this report.
Mono AOT compiler
The following sections present improvements and regressions introduced in the Mono AOT compiler in Preview 4.
Improvements
Here is a list of top 20 microbenchmark improvements in Preview 4.
BCL changes in #84210 and #84210 improved Guid.Parse and vectorized all sets in Regex, as reported in dotnet/perf-autofiling-issues#15183 and dotnet/perf-autofiling-issues#15177.
Implementation of a fast path for mini_init_method_rgctx in #84226 improved over 50 microbenchmarks reported in dotnet/perf-autofiling-issues#15717, dotnet/perf-autofiling-issues#15796, and dotnet/perf-autofiling-issues#15799.
Intrinsics get_Count and get_AllBitsSet on arm64 improved around 400 microbenchmarks, as reported in dotnet/perf-autofiling-issues#15800, dotnet/perf-autofiling-issues#15718, and dotnet/perf-autofiling-issues#15797.
Allowing inlining of methods containing constructor calls and intrinsifying additional calls to Type:op_Equality improved over 100 microbenchmarks reported in dotnet/perf-autofiling-issues#16371 and dotnet/perf-autofiling-issues#16509.
V128 SIMD intrinsics on Arm64 across all codegen engines in #84289 improved over 400 microbenchmarks reported in dotnet/perf-autofiling-issues#16460, dotnet/perf-autofiling-issues#16621, and dotnet/perf-autofiling-issues#16660. Adding Vector128.ConvertXX and Vector128.Create as intrinsics on arm64 improved 48 microbenchmarks reported in dotnet/perf-autofiling-issues#17314 and in dotnet/perf-autofiling-issues#17315.
Making Guid.HexsToChars aggressively inlined in #85322 improved a couple of microbenchmarks.
Regressions
Here is a list of top 20 regressed microbenchmarks in Preview 4.
Mono Interpreter
The following sections present improvements and regressions introduced in the Mono Interpreter in Preview 4.
Improvements
Here is a list of top 20 microbenchmark improvements in Preview 4.
Implementation of IUtf8SpanFormattable in #84469 caused both improvements and regressions as reported in dotnet/perf-autofiling-issues#15630 and dotnet/perf-autofiling-issues#15626.
Improved DateTime{Offset} formatting accounts for about 120 improved microbenchmarks reported in dotnet/perf-autofiling-issues#17009. PR #85288 improved about 30 microbenchmarks reported in dotnet/perf-autofiling-issues#17245. Handling Utf8Formatter.TryFormat and then delegating to the relevant helpers in #85277 improved about 30 microbenchmarks.
Regressions
Here is a list of top 20 regressed microbenchmarks in Preview 4.
Preview 3
The following section gives an overview of only the major improvements with high-level analysis. The analysis should be treated with caution; readers are encouraged to examine the benchmark reports for a thorough analysis and to call out major improvements not mentioned in this report.
Mono AOT compiler
The following sections present improvements and regressions introduced in the Mono AOT compiler in Preview 3.
Improvements
Here is a list of top 20 microbenchmark improvements in Preview 3.
The most improved benchmark grouping is System.Numerics, as outlined in dotnet/perf-autofiling-issues#14023, dotnet/perf-autofiling-issues#14224, dotnet/perf-autofiling-issues#14573, and dotnet/perf-autofiling-issues#14322. The changes implemented in #82420, #83337, and #83094 introduced Arm64 SIMD operations and improved about 1000 microbenchmarks.
Regressions
Here is a list of top 20 regressed microbenchmarks in Preview 3.
Mono Interpreter
The following sections present improvements and regressions introduced in the Mono Interpreter in Preview 3.
Improvements
Here is a list of top 20 microbenchmark improvements in Preview 3.
The most improved benchmark groupings are System.Buffers, System.Collections, System.Memory, and System.Text, as outlined in dotnet/perf-autofiling-issues#14324, dotnet/perf-autofiling-issues#14325, dotnet/perf-autofiling-issues#14326, dotnet/perf-autofiling-issues#14355, dotnet/perf-autofiling-issues#14359, and dotnet/perf-autofiling-issues#14361. The changes implemented in #83498 and #83490 increased the inlining length limit from 20 to 30 and implemented shr.un.imm, which improved over 1000 microbenchmarks.
Adding vector horizontal sums on Arm64 in #83675 improved about 20 microbenchmarks, as detailed in dotnet/perf-autofiling-issues#14531.
Changes in #83512 caused both improvements and regressions as reported in dotnet/perf-autofiling-issues#15008 and dotnet/perf-autofiling-issues#15154.
Regressions
Here is a list of top 20 regressed microbenchmarks in Preview 3.
Preview 2
A number of improvements introduced in Preview 2 are worth calling out individually. The following section presents only major improvements with high-level analysis. The analysis should be treated with caution; readers are encouraged to examine the benchmark reports for a thorough analysis and to call out major improvements not mentioned in this report.
Mono AOT compiler
The following sections present improvements and regressions introduced in the Mono AOT compiler in Preview 2.
Improvements
Here is a list of top 20 microbenchmark improvements in Preview 2. Full report available here.
The most improved benchmark groupings are System.Collections, System.Decimal, System.Int, and System.Text, as outlined in dotnet/perf-autofiling-issues#12996, dotnet/perf-autofiling-issues#13006, dotnet/perf-autofiling-issues#13217, and dotnet/perf-autofiling-issues#13264. The changes implemented in #81695 intrinsified RuntimeHelpers.CreateSpan<T>, which is widely used in the BCL, and replaced the icall performance path.
Arm64 SIMD operations implemented in #83094 and #82420 improved over 1000 microbenchmarks according to dotnet/perf-autofiling-issues#13808, dotnet/perf-autofiling-issues#13807, dotnet/perf-autofiling-issues#14023, and dotnet/perf-autofiling-issues#13990.
The grouping of benchmarks related to System.Collections has been improved by the changes made in #81902, as outlined in dotnet/perf-autofiling-issues#13220. The changes added support for v128 constants and improved performance in about 75 microbenchmarks.
The benchmark grouping of System.Text has been improved by the addition of S.R.I Vectors in JsonReaderHelper, introduced in #81758 and outlined in dotnet/perf-autofiling-issues#12993. Furthermore, the improved handling of the ldtoken+ltoken+Type::op_Equal optimization implemented in #81277 has significantly improved the benchmark grouping of System.Text, as detailed in dotnet/perf-autofiling-issues#12313.
The changes introduced in #81306, which removed types deriving from JsonTypeInfo<T>, have had a positive impact on the benchmark groupings of both System.Numerics and System.Collections, as reported in dotnet/perf-autofiling-issues#12488 and dotnet/perf-autofiling-issues#12550.
All of the above-mentioned changes are speed-related improvements of microbenchmarks. There was also a significant size improvement on WASM and iOS from enabling deduplication of generics. Issue #80419 contains references to changes that reduced size on disk (SOD) by about 11% and 3%, respectively.
Regressions
Here is a list of top 20 microbenchmark regressions in Preview 2. Full report available here.
Here is a list of ongoing regressions in the Preview 2 snapshot with a short description.
ConcurrentDictionary performance for strings
System.Numerics.* types
Array.Reverse<T> in ImmutableArray<T>.Builder.Reverse
Mono Interpreter
The following sections present improvements and regressions introduced in the Mono Interpreter in Preview 2.
Improvements
Here is a list of top 20 microbenchmark improvements in Preview 2. Full report available here.
The most improved benchmark groupings are System.Collections, System.Numerics, and System.Decimal, as outlined in dotnet/perf-autofiling-issues#12504, dotnet/perf-autofiling-issues#12544, dotnet/perf-autofiling-issues#13303, dotnet/perf-autofiling-issues#13247, dotnet/perf-autofiling-issues#13752, dotnet/perf-autofiling-issues#13761, and dotnet/perf-autofiling-issues#12744. The changes implemented in #81335, which intrinsified System.Numerics.* types, in #82093, which intrinsified CreateSpan, and in #81782, which introduced common Vector128 SIMD operations widely used in the BCL, improved over 1000 microbenchmarks.
Implementation of synch block fast paths created a regression in the Mono AOT compiler #81380, but led to an improvement of about 100 microbenchmarks in the Mono Interpreter, as detailed in dotnet/perf-autofiling-issues#13245.
Similar to the change in the AOT compiler, the changes introduced in #81306, which removed types deriving from JsonTypeInfo<T>, improved several microbenchmarks in the Mono Interpreter. Improving ConcurrentDictionary performance for strings in #81557 led to the improvements in dotnet/perf-autofiling-issues#13003. Also, code refactors led to several improvements presented in dotnet/perf-autofiling-issues#12301.
Regressions
Here is a list of top 20 microbenchmark regressions in Preview 2. Full report available here.
Here is a list of ongoing regressions in the Preview 2 snapshot with a short description.
Vector128 operations
Preview 1
This section presents an overview of major performance improvements and regressions in the Mono Interpreter in .NET 8 Preview 1.
Improvements
Here is a list of top 20 microbenchmark improvements in Preview 1.
A number of improvements introduced in Preview 1 are worth calling out individually. The following section presents only major improvements with high-level analysis.
The analysis should be treated with caution, and readers are encouraged to examine the benchmark reports for a thorough analysis.
The most improved benchmark groupings are System.Runtime.Vectors, System.Runtime.Intrinsics, and System.Collections, as outlined here and in dotnet/perf-autofiling-issues#10468.
Adding a stobj.vt.noref version for the no-reference case, which is twice as fast as stobj.v, improved over 400 microbenchmarks as outlined in dotnet/perf-autofiling-issues#10468 and dotnet/perf-autofiling-issues#10464.
SpanHelpers are widely used in the BCL, and improvements related to them could significantly improve performance. Changes in 200a90a, 7fa0d5b, and c0447bc removed mono-specific SpanHelpers, replaced branch patterns with super-instructions, and improved detection of dead bblocks. Over 300 microbenchmarks are improved as outlined in dotnet/perf-autofiling-issues#10989 and dotnet/perf-autofiling-issues#11155.
Change #77331 simplified the getitem.span opcode and avoided typical use of ldloca with it, which improved over 50 microbenchmarks.
Allowing vtypes with a single scalar field to be passed to native code using the faster code path improved the System.Text and System.Collections groupings of benchmarks, as outlined in dotnet/perf-autofiling-issues#10987 and dotnet/perf-autofiling-issues#10938. The assumption is that those libraries rely on ObjectHandleOnStack types.
The newstr intrinsic for string allocation in #79392 improved various microbenchmarks as outlined in dotnet/perf-autofiling-issues#10694 and dotnet/perf-autofiling-issues#10670.
9a65109 contributed to dotnet/perf-autofiling-issues#10695 and dotnet/perf-autofiling-issues#10671.
All of the above-mentioned changes are speed improvements of microbenchmarks. There was also a significant size improvement in WebAssembly from #79672, which reduced size on disk (SOD) in the Blazor template application by ~270kb by trimming the S.N.Vector class in non-SIMD cases. With deduplication of symbols in WebAssembly, additional size savings are achieved.
Regressions
Here is a list of top 20 microbenchmark regressions in Preview 1.
Here is a list of ongoing regressions in the Preview 1 snapshot with a short description.
ldloca and stfld opcodes in the new Matrix4x4 implementation
ldstr; if (uncommon) throw ex (string)