Try to fix the RasterCache benchmark gone down problems #34455

JsouLiang
Contributor

Reworking the logic and fixing the DisplayList and Layer RasterCacheItem problems.

@flutter-dashboard

It looks like this pull request may not have tests. Please make sure to add tests before merging. If you need an exemption to this rule, contact Hixie on the #hackers channel in Chat (don't just cc him here, he won't see it! He's on Discord!).

If you are not sure if you need tests, consider this rule of thumb: the purpose of a test is to make sure someone doesn't accidentally revert the fix. Ask yourself, is there anything in your PR that you feel it is important we not accidentally revert back to how it was before your fix?

Reviewers: Read the Tree Hygiene page and make sure this patch meets those guidelines before LGTMing.

@JsouLiang
Contributor Author

JsouLiang commented Jul 4, 2022

I am using this Flutter version:

Flutter 3.1.0-0.0.pre.1442 • channel unknown • unknown source
Framework • revision caadc255b7 (7 days ago) • 2022-06-30 10:50:05 +0800
Engine • revision 870bc6062c
Tools • Dart 2.18.0 (build 2.18.0-238.0.dev) • DevTools 2.14.1

That engine does not include the RasterCache refactoring change.
I also switched my local engine to revision 557655b7f5, which does include the RasterCache refactoring change.

I applied this PR's change on top of it and ran the benchmarks:
../../bin/cache/dart-sdk/bin/dart bin/run.dart --local-engine-src-path=/Volumes/Extreme/flutterEngine/engine/src --local-engine=android_profile -t textfield_perf__timeline_summary --ab=4

textfield_perf__timeline_summary

The benchmark results are:


═════════════════════════╡ ••• Final A/B results ••• ╞══════════════════════════

Score	Average A (noise)	Average B (noise)	Speed-up
average_frame_build_time_millis	1.56 (3.03%)	1.71 (10.94%)	0.91x	
worst_frame_build_time_millis	3.96 (9.58%)	4.75 (6.58%)	0.83x	
90th_percentile_frame_build_time_millis	2.64 (8.82%)	2.90 (15.15%)	0.91x	
99th_percentile_frame_build_time_millis	3.96 (9.58%)	4.75 (6.58%)	0.83x	
average_frame_rasterizer_time_millis	7.10 (3.60%)	7.70 (17.47%)	0.92x	
worst_frame_rasterizer_time_millis	50.90 (9.97%)	66.69 (18.79%)	0.76x	
90th_percentile_frame_rasterizer_time_millis	12.58 (4.63%)	11.92 (13.31%)	1.06x	
99th_percentile_frame_rasterizer_time_millis	46.64 (23.01%)	66.69 (18.79%)	0.70x	
average_layer_cache_count	0.00 (0.00%)	0.00 (0.00%)	NaNx	
90th_percentile_layer_cache_count	0.00 (0.00%)	0.00 (0.00%)	NaNx	
99th_percentile_layer_cache_count	0.00 (0.00%)	0.00 (0.00%)	NaNx	
worst_layer_cache_count	0.00 (0.00%)	0.00 (0.00%)	NaNx	
average_layer_cache_memory	0.00 (0.00%)	0.00 (0.00%)	NaNx	
90th_percentile_layer_cache_memory	0.00 (0.00%)	0.00 (0.00%)	NaNx	
99th_percentile_layer_cache_memory	0.00 (0.00%)	0.00 (0.00%)	NaNx	
worst_layer_cache_memory	0.00 (0.00%)	0.00 (0.00%)	NaNx	
average_picture_cache_count	0.45 (18.79%)	0.41 (22.44%)	1.10x	
90th_percentile_picture_cache_count	0.00 (0.00%)	0.00 (0.00%)	NaNx	
99th_percentile_picture_cache_count	11.75 (18.43%)	10.50 (24.74%)	1.12x	
worst_picture_cache_count	12.00 (17.68%)	10.50 (24.74%)	1.14x	
average_picture_cache_memory	0.41 (18.79%)	0.37 (22.44%)	1.10x	
90th_percentile_picture_cache_memory	0.00 (0.00%)	0.00 (0.00%)	NaNx	
99th_percentile_picture_cache_memory	10.62 (18.43%)	9.49 (24.74%)	1.12x	
worst_picture_cache_memory	10.85 (17.68%)	9.49 (24.74%)	1.14x	
new_gen_gc_count	2.00 (0.00%)	4.00 (0.00%)	0.50x	
old_gen_gc_count	0.00 (0.00%)	0.50 (173.21%)	0.00x	
average_vsync_transitions_missed	1.82 (8.80%)	1.90 (8.58%)	0.96x	
90th_percentile_vsync_transitions_missed	3.00 (0.00%)	4.75 (22.94%)	0.63x	
99th_percentile_vsync_transitions_missed	5.50 (9.09%)	6.75 (21.91%)	0.81x	
30hz_frame_percentage	0.00 (0.00%)	0.00 (0.00%)	NaNx	
60hz_frame_percentage	8.93 (0.00%)	8.93 (0.00%)	1.00x	
80hz_frame_percentage	0.00 (0.00%)	0.00 (0.00%)	NaNx	
90hz_frame_percentage	91.07 (0.00%)	91.07 (0.00%)	1.00x	
120hz_frame_percentage	0.00 (0.00%)	0.00 (0.00%)	NaNx	
illegal_refresh_rate_frame_count	0.00 (0.00%)	0.00 (0.00%)	NaNx	

../../bin/cache/dart-sdk/bin/dart bin/run.dart --local-engine-src-path=/Volumes/Extreme/flutterEngine/engine/src --local-engine=android_profile -t raster_cache_use_memory_perf__e2e_summary --ab=4
raster_cache_use_memory_perf__e2e_summary


═════════════════════════╡ ••• Final A/B results ••• ╞══════════════════════════

Score	Average A (noise)	Average B (noise)	Speed-up
average_frame_build_time_millis	0.63 (0.63%)	0.65 (0.97%)	0.97x	
worst_frame_build_time_millis	1.21 (41.38%)	0.99 (13.15%)	1.22x	
90th_percentile_frame_build_time_millis	0.73 (1.43%)	0.74 (1.50%)	0.98x	
99th_percentile_frame_build_time_millis	0.85 (3.14%)	0.84 (3.30%)	1.00x	
average_frame_rasterizer_time_millis	2.73 (0.78%)	2.72 (0.67%)	1.00x	
worst_frame_rasterizer_time_millis	4.64 (21.79%)	4.64 (21.14%)	1.00x	
90th_percentile_frame_rasterizer_time_millis	3.19 (2.11%)	3.17 (0.65%)	1.01x	
99th_percentile_frame_rasterizer_time_millis	3.64 (4.49%)	3.77 (3.24%)	0.97x	
average_layer_cache_count	4.00 (0.00%)	2.00 (0.00%)	2.00x	
90th_percentile_layer_cache_count	4.00 (0.00%)	2.00 (0.00%)	2.00x	
99th_percentile_layer_cache_count	4.00 (0.00%)	2.00 (0.00%)	2.00x	
worst_layer_cache_count	4.00 (0.00%)	2.00 (0.00%)	2.00x	
average_layer_cache_memory	1.85 (0.00%)	1.23 (0.00%)	1.51x	
90th_percentile_layer_cache_memory	1.85 (0.00%)	1.23 (0.00%)	1.51x	
99th_percentile_layer_cache_memory	1.85 (0.00%)	1.23 (0.00%)	1.51x	
worst_layer_cache_memory	1.85 (0.00%)	1.23 (0.00%)	1.51x	
average_picture_cache_count	1.00 (0.00%)	1.00 (0.00%)	1.00x	
90th_percentile_picture_cache_count	1.00 (0.00%)	1.00 (0.00%)	1.00x	
99th_percentile_picture_cache_count	1.00 (0.00%)	1.00 (0.00%)	1.00x	
worst_picture_cache_count	1.00 (0.00%)	1.00 (0.00%)	1.00x	
average_picture_cache_memory	0.04 (0.00%)	0.04 (0.00%)	1.00x	
90th_percentile_picture_cache_memory	0.04 (0.00%)	0.04 (0.00%)	1.00x	
99th_percentile_picture_cache_memory	0.04 (0.00%)	0.04 (0.00%)	1.00x	
worst_picture_cache_memory	0.04 (0.00%)	0.04 (0.00%)	1.00x	
new_gen_gc_count	2.00 (0.00%)	0.00 (0.00%)	Infinityx	
old_gen_gc_count	0.00 (0.00%)	0.00 (0.00%)	NaNx	

../../bin/cache/dart-sdk/bin/dart bin/run.dart --local-engine-src-path=/Volumes/Extreme/flutterEngine/engine/src --local-engine=android_profile -t backdrop_filter_perf__timeline_summary --ab=4 -d

backdrop_filter_perf__timeline_summary


═════════════════════════╡ ••• Final A/B results ••• ╞══════════════════════════

Score	Average A (noise)	Average B (noise)	Speed-up
average_frame_build_time_millis	0.58 (0.67%)	0.56 (2.06%)	1.03x	
worst_frame_build_time_millis	1.63 (41.10%)	1.59 (31.02%)	1.03x	
90th_percentile_frame_build_time_millis	0.66 (0.67%)	0.66 (0.17%)	1.00x	
99th_percentile_frame_build_time_millis	0.84 (6.96%)	0.85 (4.44%)	0.99x	
average_frame_rasterizer_time_millis	5.08 (0.77%)	5.43 (1.56%)	0.93x	
worst_frame_rasterizer_time_millis	8.31 (14.83%)	10.29 (23.82%)	0.81x	
90th_percentile_frame_rasterizer_time_millis	5.64 (0.52%)	6.29 (1.47%)	0.90x	
99th_percentile_frame_rasterizer_time_millis	6.87 (2.59%)	7.72 (2.16%)	0.89x	
average_layer_cache_count	0.00 (0.00%)	0.00 (0.00%)	NaNx	
90th_percentile_layer_cache_count	0.00 (0.00%)	0.00 (0.00%)	NaNx	
99th_percentile_layer_cache_count	0.00 (0.00%)	0.00 (0.00%)	NaNx	
worst_layer_cache_count	0.00 (0.00%)	0.00 (0.00%)	NaNx	
average_layer_cache_memory	0.00 (0.00%)	0.00 (0.00%)	NaNx	
90th_percentile_layer_cache_memory	0.00 (0.00%)	0.00 (0.00%)	NaNx	
99th_percentile_layer_cache_memory	0.00 (0.00%)	0.00 (0.00%)	NaNx	
worst_layer_cache_memory	0.00 (0.00%)	0.00 (0.00%)	NaNx	
average_picture_cache_count	2.00 (0.00%)	1.75 (24.74%)	1.14x	
90th_percentile_picture_cache_count	2.00 (0.00%)	1.75 (24.74%)	1.14x	
99th_percentile_picture_cache_count	2.00 (0.00%)	1.75 (24.74%)	1.14x	
worst_picture_cache_count	2.00 (0.00%)	1.75 (24.74%)	1.14x	
average_picture_cache_memory	10.24 (0.00%)	10.11 (2.22%)	1.01x	
90th_percentile_picture_cache_memory	10.24 (0.00%)	10.11 (2.22%)	1.01x	
99th_percentile_picture_cache_memory	10.24 (0.00%)	10.11 (2.22%)	1.01x	
worst_picture_cache_memory	10.24 (0.00%)	10.11 (2.22%)	1.01x	
new_gen_gc_count	6.00 (0.00%)	5.00 (20.00%)	1.20x	
old_gen_gc_count	0.00 (0.00%)	0.00 (0.00%)	NaNx	
average_vsync_transitions_missed	1.38 (134.53%)	0.50 (100.00%)	2.75x	
90th_percentile_vsync_transitions_missed	2.25 (148.66%)	0.50 (100.00%)	4.50x	
99th_percentile_vsync_transitions_missed	2.25 (148.66%)	0.50 (100.00%)	4.50x	
30hz_frame_percentage	0.00 (0.00%)	0.00 (0.00%)	NaNx	
60hz_frame_percentage	0.00 (0.00%)	0.00 (0.00%)	NaNx	
80hz_frame_percentage	0.00 (0.00%)	0.00 (0.00%)	NaNx	
90hz_frame_percentage	100.00 (0.00%)	100.00 (0.00%)	1.00x	
120hz_frame_percentage	0.00 (0.00%)	0.00 (0.00%)	NaNx	
illegal_refresh_rate_frame_count	0.00 (0.00%)	0.00 (0.00%)	NaNx	
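
A note on reading the Speed-up column in the tables above: the printed values are consistent with dividing Average A by Average B (for example 1.56 / 1.71 gives 0.91x), and the `NaNx` and `Infinityx` cells match what floating-point division produces for 0/0 and x/0. The sketch below reproduces that formatting under this assumption. `FormatSpeedUp` is a hypothetical helper written for illustration, not part of the benchmark tooling.

```cpp
#include <cmath>
#include <cstdio>
#include <string>

// Hypothetical helper: formats the ratio A / B the way the tables print it.
// Assumes IEEE 754 doubles, where 0.0 / 0.0 is NaN and x / 0.0 is infinity.
std::string FormatSpeedUp(double average_a, double average_b) {
  const double ratio = average_a / average_b;
  if (std::isnan(ratio)) return "NaNx";       // both averages are zero
  if (std::isinf(ratio)) return "Infinityx";  // only B is zero
  char buf[32];
  std::snprintf(buf, sizeof(buf), "%.2fx", ratio);
  return buf;
}
```

Under this reading, a value below 1.00x means the B average is larger than the A average, so for time and memory metrics a value above 1.00x is the favorable direction for B.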

We can see that the raster cache memory usage has gone down.
cc @flar @zanderso @dnfield

@JsouLiang JsouLiang requested review from flar and dnfield July 4, 2022 09:58
@JsouLiang JsouLiang force-pushed the triage-skiaperf-flags-on-raster-cache-refactoring-change branch 5 times, most recently from 7f7fc90 to f064fb5 Compare July 5, 2022 14:21
@JsouLiang JsouLiang added the Work in progress (WIP) Not ready (yet) for review! label Jul 5, 2022
@JsouLiang JsouLiang force-pushed the triage-skiaperf-flags-on-raster-cache-refactoring-change branch from f064fb5 to 4157626 Compare July 6, 2022 17:04
@JsouLiang JsouLiang force-pushed the triage-skiaperf-flags-on-raster-cache-refactoring-change branch from 4157626 to 56877a6 Compare July 7, 2022 07:30
@JsouLiang JsouLiang removed needs tests Work in progress (WIP) Not ready (yet) for review! labels Jul 7, 2022
@flar
Contributor

flar commented Jul 7, 2022

It looks like this patch will have some positive effect on the benchmarks, but I don't see any obvious reason why it fixes the results and the underlying cause isn't identified. From the patch it appears that we weren't limiting ourselves to 3 items per frame? I would think it would be a very simple fix, but the patch is much more complex than I thought it should be.

I'm leaning towards reverting the original fix and working this fix into it.

@flar
Contributor

flar commented Jul 7, 2022

It looks like some of the changes are to prevent the list of cache items from getting cluttered with entries that don't need to update. That's useful, but it doesn't seem like it would account for the size of the regressions to just process the entries. If it does, then we can make the processing more efficient in a number of ways, but it can't hurt to also restrict the list of entries to only those who are needing to be cached.

Another aspect is reworking the way that we determine if there are too many DLs being cached in a given frame. If we were getting that wrong then I can easily see how that would account for the amount of time taken in the 90/99th percentile cases. But, if we are getting that wrong I don't see how the old code had an error in it and the new code requires a bit of phantom bookkeeping to answer the same questions (i.e. "I think I will generate a cache entry, so I will increment the count so that others don't even get to try").

@JsouLiang
Contributor Author

Regarding the code in #31892: the DL now creates its Entry as early as possible. In the previous version, the Touch method increased access_count only when an Entry already existed. Because Entry creation for DLs now happens earlier, Touch can succeed for some nodes (those that already have an Entry), so some DLs reach access_threshold sooner and then generate an Image.
@flar

@JsouLiang
Contributor Author

Do you have any specific suggestions for this patch? According to the benchmark data, this patch does improve the results of #31892.

@flar
Contributor

flar commented Jul 8, 2022

Regarding the code in #31892: the DL now creates its Entry as early as possible. In the previous version, the Touch method increased access_count only when an Entry already existed. Because Entry creation for DLs now happens earlier, Touch can succeed for some nodes (those that already have an Entry), so some DLs reach access_threshold sooner and then generate an Image. @flar

The DL always created Entry objects as soon as possible. On the first frame, Generate would never return false and so all DL layers would call Prepare which would generate an Entry and start its access_count. How did 31892 have any impact on that?

@flar
Contributor

flar commented Jul 8, 2022

OK, so it looks like we are caching a lot of items that are never used. Here are 2 graphs that show the difference in the RasterCacheFlows for the new_gallery benchmark. The green parts are pretty much the same, but the "after-refactor" graph has a lot of red bars (indicating a cache entry that exists but is used/cache-hit in the bottom 99% of all entries).

[Figure: DisplayList cache flows before refactor (before-refactor-DLcache-flow)]
[Figure: DisplayList cache flows after refactor (after-refactor-DLcache-flow)]

Something is definitely causing extra DisplayList objects to be cached.

@flar
Contributor

flar commented Jul 8, 2022

I see the problem now.

We used to not add an entry for a DL until it was in frame (intersecting the cull rect), but now we always add it in PrerollSetup. Touch only increments the access_count if the entry exists so, previously, an item would have to be "in frame" before it started its count but now it will start its count immediately.
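
To make the difference concrete, here is a minimal toy model of the behavior described above. It is a sketch, not the engine code: `ToyEntry`, `ToyCache`, `AddEntry`, and `CountAfterScrollIn` are hypothetical names, and a string id stands in for the real DL/matrix key. The only property it models is that Touch increments `access_count` solely when an entry already exists.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical stand-in for a cache entry; only the counter matters here.
struct ToyEntry {
  std::size_t access_count = 0;
};

class ToyCache {
 public:
  // Adds an entry if none exists yet; an existing entry is left untouched.
  void AddEntry(const std::string& id) { entries_.emplace(id, ToyEntry{}); }

  // Increments access_count only when the entry already exists.
  bool Touch(const std::string& id) {
    auto it = entries_.find(id);
    if (it == entries_.end()) return false;
    ++it->second.access_count;
    return true;
  }

  std::size_t AccessCount(const std::string& id) const {
    auto it = entries_.find(id);
    return it == entries_.end() ? 0 : it->second.access_count;
  }

 private:
  std::map<std::string, ToyEntry> entries_;
};

// Simulates `offscreen_frames` frames with the item outside the cull rect
// followed by one frame with it inside, and returns the final access_count.
// add_entry_in_setup == true models "always add the entry during setup";
// false models "add the entry only once the item is in frame".
std::size_t CountAfterScrollIn(bool add_entry_in_setup, int offscreen_frames) {
  ToyCache cache;
  const std::string id = "dl";
  for (int frame = 0; frame <= offscreen_frames; ++frame) {
    const bool visible = (frame == offscreen_frames);
    if (add_entry_in_setup || visible) cache.AddEntry(id);
    cache.Touch(id);
  }
  return cache.AccessCount(id);
}
```

In this model, `CountAfterScrollIn(true, 5)` returns 6 while `CountAfterScrollIn(false, 5)` returns 1: with eager entry creation the item has already passed a threshold of 3 before it is ever visible.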

I think the problem is that we are overloading the concept of whether or not an entry exists. Touch needs to be reworked to ask "if I have ever been visible" with more deterministic indications and then populate should only create an entry if it is visible.

"used_this_frame" and "whether or not an entry exists" should not be overloaded in so many ways. I think we need:

  // the DL/Matrix pair is still participating in the layer tree
  // whether or not it is visible
  entry.encountered_this_frame

  // the DL/Matrix pair intersects the cull rect
  entry.visible_this_frame

  // (redundant with access_count > 0)
  // has it ever been visible in any frame
  entry.has_been_visible

We always set "encountered" so that a cache entry is kept around, cache evictions will use this flag primarily. But we only set "visible" when we see it inside the cull rect. MarkSeen then would take a flag for whether it is in the cull rect and always set "encountered" and conditionally set "visible". access_count will then only increment if non-zero or if it is visible this frame.
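
A rough sketch of that bookkeeping, assuming a per-entry struct and a per-frame reset. `TrackedEntry`, `BeginFrame`, and `ShouldEvict` are hypothetical names introduced for illustration; only the flag names, `MarkSeen`, and `access_count` come from the description above.

```cpp
#include <cstddef>

// Hypothetical entry carrying the flags proposed above.
struct TrackedEntry {
  bool encountered_this_frame = false;  // still participating in the layer tree
  bool visible_this_frame = false;      // intersects the cull rect this frame
  std::size_t access_count = 0;

  // Redundant with access_count > 0, as noted in the proposal.
  bool has_been_visible() const { return access_count > 0; }

  // Assumed to run once at the start of each frame to clear per-frame state.
  void BeginFrame() {
    encountered_this_frame = false;
    visible_this_frame = false;
  }

  // Always records the encounter; records visibility only inside the cull
  // rect. The count starts on the first visible frame and then keeps
  // incrementing on every later encounter.
  void MarkSeen(bool in_cull_rect) {
    encountered_this_frame = true;
    if (in_cull_rect) visible_this_frame = true;
    if (access_count > 0 || in_cull_rect) ++access_count;
  }
};

// Eviction looks only at whether the entry was encountered this frame.
bool ShouldEvict(const TrackedEntry& entry) {
  return !entry.encountered_this_frame;
}
```

With this shape, an item that stays off screen keeps its entry alive without advancing toward the caching threshold.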

Touch isn't really needed any more.

Prepare/Populate will not generate a cache entry if the item is not "visible_this_frame" so that once we hit 3 counts we delay the actual populate until it is inside the cull rect. We could also later work out a mechanism whereby these non-visible entries might be populated lazily on frames that are well within budget. These methods will also honor the "limit per frame".
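
The populate rule could look roughly like the following sketch. `CacheCandidate`, `FrameBudget`, and `TryPopulate` are hypothetical names; the threshold of 3 counts and the limit of 3 items per frame are the values mentioned in this thread, and the real engine constants may differ.

```cpp
#include <cstddef>

constexpr std::size_t kAccessThreshold = 3;       // "once we hit 3 counts"
constexpr std::size_t kMaxPopulatesPerFrame = 3;  // "3 items per frame"

// Hypothetical view of an entry at populate time.
struct CacheCandidate {
  bool visible_this_frame = false;
  std::size_t access_count = 0;
  bool has_image = false;
};

// Tracks how many cache images were actually generated in the current frame.
class FrameBudget {
 public:
  // Returns true if the candidate has a cache image after the call.
  bool TryPopulate(CacheCandidate& candidate) {
    if (candidate.has_image) return true;             // already cached
    if (!candidate.visible_this_frame) return false;  // wait for the cull rect
    if (candidate.access_count < kAccessThreshold) return false;
    if (populated_ >= kMaxPopulatesPerFrame) return false;  // limit per frame
    ++populated_;  // counted only when an image is really generated
    candidate.has_image = true;
    return true;
  }

  std::size_t populated() const { return populated_; }

 private:
  std::size_t populated_ = 0;
};
```

Because the budget is charged only when an image is generated, a candidate that is skipped for being off screen does not consume one of the per-frame slots.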

What you've done sort of accomplishes this, but it does so in a round-about way that uses side effects of whether an entry is present and the single flag we've created which has a name that isn't really descriptive.

@flar
Contributor

flar commented Jul 10, 2022

I filed a different PR with the proposed concept of explicitly tracking visibility before starting the access_count threshold countdown. See #34562

@zanderso
Member

Closing as I believe this is now obsolete.

@zanderso zanderso closed this Jul 11, 2022