Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhance image_filter_layer caching to filter a cached child #17175

Merged

Conversation

flar
Copy link
Contributor

@flar flar commented Mar 16, 2020

This is a nearly complete prototype to enhance the optimization of ImageFilterLayer in a manner similar to OpacityLayer. While implementing this prototype I discovered (and fixed) a bug in the way that OpacityLayer manages the caching of its child(ren). This prototype:

  • breaks out the "single child" code from OpacityLayer into a shareable utility layer called MergedContainerLayer

  • enhances that implementation to better extract a single Layer to be used for caching that avoids the problem where every new merged layer contains a brand new unique ContainerLayer child.

  • reworks ImageFilterLayer to use the new mechanism

  • minor changes to OpacityLayer logic to use a better child for caching

  • also includes a number of debug log strings to track the work done by ImageFilterLayer and the layer cache

  • includes an INFO log string when the caching assumptions in the new MergedContainerLayer::GetCacheableChild method fail

The new caching mechanism can speed up opacity layers on cached children by 40% or so, possibly more if the child is more than just a single Picture. Similar effects are seen on the image filter layers, especially when animating the filter on a static child.

Actual test results will be compiled when I settle on a test that best represents what app developers might encounter in the wild.

@flar flar requested a review from liyuqian March 16, 2020 20:42
@auto-assign auto-assign bot requested a review from GaryQian March 16, 2020 20:42
@flar
Copy link
Contributor Author

flar commented Mar 18, 2020

This PR has moved out of the WIP stage and is proposed for merging.

Note that there is still a new FML_LOG(INFO) in GetCacheableChild() - is that appropriate to keep?

@flar flar force-pushed the optimize-image-filter-layer-caching branch from b6a1699 to 9af0a97 Compare March 18, 2020 23:53
@liyuqian liyuqian added perf: speed Performance issues related to (mostly rendering) speed severe: performance Relates to speed or footprint issues. labels Mar 19, 2020
Copy link
Contributor

@liyuqian liyuqian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be good to add 2 tests to this:

  1. A unit test in engine to verify MergedContainerLayer, ImageFilteredLayer, and OpacityLayer's behavior regarding having a single child, and reuse the same single child for raster cache if possible.

  2. A more sensitive device lab performance test in framework to guard performance regressions. The flutter_gallery__transition test is too insensitive to this raster cache case that we missed the regression before (Transition performance regressed slightly due to OpacityLayer raster cache miss flutter#52864).

flow/layers/container_layer.cc Outdated Show resolved Hide resolved
flow/layers/container_layer.h Show resolved Hide resolved
flow/layers/image_filter_layer.cc Outdated Show resolved Hide resolved

if (!context->has_platform_view && context->raster_cache &&
SkRect::Intersects(context->cull_rect, paint_bounds())) {
SkMatrix ctm = matrix;
#ifndef SUPPORT_FRACTIONAL_TRANSLATION
ctm = RasterCache::GetIntegralTransCTM(ctm);
#endif
context->raster_cache->Prepare(context, this, ctm);
if (render_count_ >= MINIMUM_FILTER_CACHE) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be nice to move the render_count >= threshold logic into raster_cache.cc where similar threshold logic happens for picture raster cache. In the future, we might want to provide a unified mechanism to modify the threshold, or start/disable raster caching immediately if developers give a signal for that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't do this (for now?) mainly because this is an either/or situation - either prepare the layer itself for caching or prepare the cacheable child for the alternate strategy. Such a mechanism would have to have some thought on whether or not the prepare failed due to minimum render count or whether it failed because of some other intrinsic property (transform not cacheable, layer too small or too big, etc.)

Rather than solve that issue now, I think I'm going to stick with the manual test here in the Preroll method and we can come back around to a general minimum frame count later.

@liyuqian
Copy link
Contributor

@flar : are you still working on this?

@flar
Copy link
Contributor Author

flar commented Mar 30, 2020

yes

@flar
Copy link
Contributor Author

flar commented Apr 13, 2020

My current related task is to extract just the fix to opacity_layer caching out into a separate PR where I will add a unit test to confirm the correct caching and hopefully a benchmark that will demonstrate the value of caching the right child.

At that point I will push that PR on its own and then revisit this PR as turning that proof of concept into the general MergedContainerLayer utility mechanism and applying that mechanism to image_filter_layer.

@flar flar force-pushed the optimize-image-filter-layer-caching branch from 9af0a97 to d728eed Compare June 22, 2020 21:22
@flar
Copy link
Contributor Author

flar commented Jun 22, 2020

This PR is in its final form ready to be merged. I ran one last set of AB runs of the ImageFiltered Transform devicelab benchmark normalized against the same base engine SHA of the most recent (at the time) engine roll and got the following results (A == framework engine, B == local engine build with fix):

                   Score                     Average A (noise)  Average B (noise)  Speed-up
average_frame_build_time_millis                   1.59 (2.17%)       1.73 (1.10%)   0.92x  
worst_frame_build_time_millis                     7.16 (19.19%)      5.36 (16.36%)  1.34x  
90th_percentile_frame_build_time_millis           2.19 (2.81%)       2.31 (3.69%)   0.95x  
99th_percentile_frame_build_time_millis           3.73 (6.24%)       3.40 (7.99%)   1.10x  
average_frame_rasterizer_time_millis             18.41 (0.51%)       7.53 (3.28%)   2.45x  
worst_frame_rasterizer_time_millis               29.66 (6.49%)      17.96 (3.19%)   1.65x  
90th_percentile_frame_rasterizer_time_millis     21.55 (0.54%)       9.27 (5.69%)   2.32x  
99th_percentile_frame_rasterizer_time_millis     24.39 (2.49%)      15.09 (16.31%)  1.62x  
average_vsync_transitions_missed                  1.38 (1.14%)       1.00 (0.00%)   1.38x  
90th_percentile_vsync_transitions_missed          2.00 (0.00%)       1.00 (0.00%)   2.00x  
99th_percentile_vsync_transitions_missed          2.00 (0.00%)       1.00 (0.00%)   2.00x  

IF caching avg raster AB

IF caching 90th raster AB

IF caching 99th raster AB

IF caching worst raster AB

@flar
Copy link
Contributor Author

flar commented Jun 22, 2020

It might be good to add 2 tests to this:

  1. A unit test in engine to verify MergedContainerLayer, ImageFilteredLayer, and OpacityLayer's behavior regarding having a single child, and reuse the same single child for raster cache if possible.
  2. A more sensitive device lab performance test in framework to guard performance regressions. The flutter_gallery__transition test is too insensitive to this raster cache case that we missed the regression before (flutter/flutter#52864).

For #1, we already have such tests for OpacityLayer, I should add similar tests for ImageFilterLayer and a base test for MergedContainerLayer (which probably should have been added when the original work on OpacityLayer was done...).

For #2, we have flutter/flutter#58277

@flar
Copy link
Contributor Author

flar commented Jun 22, 2020

I added unit tests for MergedContainerLayer. Since it doesn't expose its GetChildContainer and GetCacheableChild methods publicly the tests just confirm that it produces the same output as if it were a normal ContainerLayer (I copy/pasted the Simple and Multiple tests of ContainerLayer and just instantiate a MergedContainerLayer instead).

I also added the same caching tests for ImageFilterLayer as were added for OpacityLayer when the merged container mechanism was originally developed.

@flar
Copy link
Contributor Author

flar commented Jun 23, 2020

I was able to successfully rerun the Mac iOS Engine build that failed, but it isn't showing in the status.

@flar
Copy link
Contributor Author

flar commented Jun 24, 2020

Pinging for final reviews...

flow/layers/container_layer.h Outdated Show resolved Hide resolved
flow/layers/image_filter_layer.h Show resolved Hide resolved
flow/layers/image_filter_layer.cc Outdated Show resolved Hide resolved
flow/layers/image_filter_layer.cc Show resolved Hide resolved
@flar flar force-pushed the optimize-image-filter-layer-caching branch from ed87966 to fb3263a Compare June 24, 2020 21:09
@flar
Copy link
Contributor Author

flar commented Jun 24, 2020

Note that the latest push has a nearly complete rewrite of the MergedContainerLayer comments in container_layer.h, best to reread the final version rather than look at the commit-to-commit diffs.

Copy link
Contributor

@liyuqian liyuqian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with a nit. Great documentations!

// from frame to frame so we try to cache the layer itself
// for maximum performance.
TryToPrepareRasterCache(context, this, matrix);
} else if ((transformed_filter_ = filter_->makeWithLocalMatrix(matrix))) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: it's probably very easy for someone to think transformed_filter_ = filter_->makeWithLocalMatrix(matrix) is a typo for transformed_filter_ == filter_->makeWithLocalMatrix(matrix). A clearer approach might be

if (render_count >= ...) {
  // ...
  TryToPrepareRasterCache(context, this, matrix);
  return;
}

transformed_filter_ = filter_->makeWithLocalMatrix(matrix);
if (transformed_filter_) {
  ...
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I was going to do something similar and something inside me said "check it out, we could do it all in one line" and I didn't think about the maintenance issue.

I rewrote it similarly - I used an else instead of a return in case someone wants to later add more logic to Preroll after this section and doesn't spot the early return. An early return would be good if that block represented an action that would prevent any further logic, such as detecting an error condition, but in the case above, it is used to avoid the children caching case. Using an else clause keeps it self contained better - it's the "try the layer, or try the children as a backup" block chained into one overall statement.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear - I moved the assignment inside the body of the else clause and made it separate from the if statement that tested its return value:

  } else {
    transformed_filter_ = filter_->makeWithLocalMatrix(matrix);
    if (transformed_filter_) {

I think I'll add a comment about not being able to filter the children if the supplied ImageFilter can't "makeWithLocalMatrix"...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was also an implicit logic flaw fixed in the rewrite. If the filter transform combination fails, we should still increment the render count which that last version wasn't doing. The new version increments the render count regardless of the outcome of combining the filter and transform.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes pushed in latest commit.

@flar flar force-pushed the optimize-image-filter-layer-caching branch from 2b6506e to 151a6d4 Compare June 26, 2020 00:22
@flar flar added the waiting for tree to go green This PR is approved and tested, but waiting for the tree to be green to land. label Jun 26, 2020
@fluttergithubbot fluttergithubbot merged commit 3caa7e7 into flutter:master Jun 26, 2020
engine-flutter-autoroll added a commit to engine-flutter-autoroll/flutter that referenced this pull request Jun 26, 2020
engine-flutter-autoroll added a commit to engine-flutter-autoroll/flutter that referenced this pull request Jun 26, 2020
engine-flutter-autoroll added a commit to engine-flutter-autoroll/flutter that referenced this pull request Jun 26, 2020
engine-flutter-autoroll added a commit to engine-flutter-autoroll/flutter that referenced this pull request Jun 26, 2020
engine-flutter-autoroll added a commit to engine-flutter-autoroll/flutter that referenced this pull request Jun 26, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla: yes perf: speed Performance issues related to (mostly rendering) speed severe: performance Relates to speed or footprint issues. waiting for tree to go green This PR is approved and tested, but waiting for the tree to be green to land.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants