Try to fix the RasterCache benchmark gone down problems #34455
Conversation
It looks like this pull request may not have tests. Please make sure to add tests before merging. If you need an exemption to this rule, contact Hixie on the #hackers channel in Chat (don't just cc him here, he won't see it! He's on Discord!). If you are not sure if you need tests, consider this rule of thumb: the purpose of a test is to make sure someone doesn't accidentally revert the fix. Ask yourself, is there anything in your PR that you feel it is important we not accidentally revert back to how it was before your fix? Reviewers: Read the Tree Hygiene page and make sure this patch meets those guidelines before LGTMing.
I used Flutter at this version:
That engine doesn't have the RasterCache refactoring change. I added this PR's change and ran the textfield_perf__timeline_summary benchmark; the result is:
backdrop_filter_perf__timeline_summary
We can see that the memory has gone down.
Force-pushed from 7f7fc90 to f064fb5.
Force-pushed from f064fb5 to 4157626.
Force-pushed from 4157626 to 56877a6.
It looks like this patch will have some positive effect on the benchmarks, but I don't see any obvious reason why it fixes the results, and the underlying cause hasn't been identified. From the patch it appears that we weren't limiting ourselves to 3 items per frame? I would think that would be a very simple fix, but the patch is much more complex than I expected. I'm leaning towards reverting the original fix and working this fix into it.
It looks like some of the changes are there to prevent the list of cache items from getting cluttered with entries that don't need to update. That's useful, but it doesn't seem like simply processing the entries would account for the size of the regressions. If it does, then we can make the processing more efficient in a number of ways, but it can't hurt to also restrict the list of entries to only those that need to be cached. Another aspect is reworking the way we determine whether too many DLs are being cached in a given frame. If we were getting that wrong, I can easily see how it would account for the amount of time taken in the 90th/99th-percentile cases. But if we were getting that wrong, I don't see how the old code had an error in it while the new code requires a bit of phantom bookkeeping to answer the same questions (i.e. "I think I will generate a cache entry, so I will increment the count so that others don't even get to try").
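To make the two accounting styles being contrasted here concrete, here is a minimal C++ sketch; FrameBudget, kCacheLimitPerFrame, and both method names are hypothetical stand-ins, not the engine's actual RasterCache API:

```cpp
#include <cstddef>

constexpr std::size_t kCacheLimitPerFrame = 3;  // assumed per-frame limit from the discussion

struct FrameBudget {
  std::size_t reserved = 0;   // slots claimed during preroll (speculative)
  std::size_t populated = 0;  // entries actually rasterized this frame

  // Speculative bookkeeping: a slot is claimed before any raster work happens,
  // so an item that never actually populates can still block others from trying.
  bool ReserveDuringPreroll() {
    if (reserved >= kCacheLimitPerFrame) {
      return false;
    }
    ++reserved;
    return true;
  }

  // Accounting at populate time: only real populates count against the limit.
  bool CountAtPopulate() {
    if (populated >= kCacheLimitPerFrame) {
      return false;
    }
    ++populated;
    return true;
  }
};
```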
Regarding this code (#31892): it means a DL will create its Entry as soon as possible. In the previous version, the Touch method only increased access_count when the Entry already existed, but because we now speed up DL Entry creation, Touch may succeed earlier for some nodes (the node's Entry exists), which makes some DLs reach the access threshold faster and then generate an Image.
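A minimal sketch of the mechanism as described in this comment, assuming illustrative names (Entry, Cache, Touch, MaybePopulate, kAccessThreshold) rather than the engine's real code: creating the Entry earlier means Touch starts succeeding on earlier frames, so the threshold is reached sooner.

```cpp
#include <cstdint>
#include <map>

constexpr int kAccessThreshold = 3;  // assumed threshold value for illustration

struct Entry {
  int access_count = 0;
  bool has_image = false;  // stands in for the cached raster image
};

class Cache {
 public:
  // With the refactor, an Entry is created as soon as the item is prerolled,
  // so Touch() below starts succeeding earlier than it used to.
  Entry& GetOrCreate(uint64_t key) { return entries_[key]; }

  // Touch only counts an access if the Entry already exists; the frame on which
  // the Entry is first created therefore decides when the countdown starts.
  bool Touch(uint64_t key) {
    auto it = entries_.find(key);
    if (it == entries_.end()) {
      return false;
    }
    it->second.access_count++;
    return true;
  }

  // Once the access count reaches the threshold, the item is rasterized.
  void MaybePopulate(uint64_t key) {
    auto it = entries_.find(key);
    if (it != entries_.end() && it->second.access_count >= kAccessThreshold) {
      it->second.has_image = true;
    }
  }

 private:
  std::map<uint64_t, Entry> entries_;
};
```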
Do you have any specific suggestions for this patch? From the benchmark data, this patch does make the final result of #31892 better.
The DL always created Entry objects as soon as possible. On the first frame, Generate would never return false, so all DL layers would call Prepare, which would generate an Entry and start its access_count. How did #31892 have any impact on that?
OK, so it looks like we are caching a lot of items that are never used. Here are two graphs that show the difference in the RasterCacheFlows for the new_gallery benchmark. The green parts are pretty much the same, but the "after-refactor" graph has a lot of red bars (indicating a cache entry that exists but whose use/cache-hit count is in the bottom 99% of all entries). Something is definitely causing extra DisplayList objects to be cached.
I see the problem now. We used to not add an entry for a DL until it was in frame (intersecting the cull rect), but now we always add it in PrerollSetup. Touch only increments the access_count if the entry exists, so previously an item had to be "in frame" before it started its count, but now it starts its count immediately. I think the problem is that we are overloading the concept of whether or not an entry exists. Touch needs to be reworked to ask "have I ever been visible" with more deterministic indications, and then populate should only create an entry if the item is visible. "used_this_frame" and "whether or not an entry exists" should not be overloaded in so many ways. I think we need:

- an "encountered" flag that we always set so that a cache entry is kept around; cache evictions will use this flag primarily.
- a "visible" flag that we only set when we see the item inside the cull rect.

MarkSeen would then take a flag for whether the item is in the cull rect, always set "encountered", and conditionally set "visible". access_count will then only increment if it is already non-zero or if the item is visible this frame, so Touch isn't really needed any more. Prepare/Populate will not generate a cache entry if the item is not "visible_this_frame", so that once we hit 3 counts we delay the actual populate until the item is inside the cull rect. We could also later work out a mechanism whereby these non-visible entries are populated lazily on frames that are well within budget. These methods will also honor the "limit per frame". What you've done sort of accomplishes this, but it does so in a roundabout way that relies on side effects of whether an entry is present and the single flag we've created, which has a name that isn't really descriptive.
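A rough sketch of the flag-based approach proposed above, with hypothetical names (CacheEntry, MarkSeen, kAccessThreshold, kCacheLimitPerFrame) rather than the PR's actual implementation:

```cpp
constexpr int kAccessThreshold = 3;     // assumed access-count threshold
constexpr int kCacheLimitPerFrame = 3;  // assumed per-frame populate limit

struct CacheEntry {
  bool encountered_this_frame = false;  // keeps the entry alive; eviction keys off this
  bool visible_this_frame = false;      // set only when the item is inside the cull rect
  int access_count = 0;

  // Always record that the item was encountered; record visibility only when
  // the item intersects the cull rect.
  void MarkSeen(bool in_cull_rect) {
    encountered_this_frame = true;
    visible_this_frame = in_cull_rect;
    // Advance the countdown only if it has already started or the item is
    // visible this frame, so off-screen items don't race to the threshold.
    if (access_count > 0 || in_cull_rect) {
      ++access_count;
    }
  }

  // Populate only rasterizes items that are visible this frame and past the
  // threshold, and the caller also enforces the per-frame limit.
  bool ShouldPopulate(int populated_this_frame) const {
    return visible_this_frame &&
           access_count >= kAccessThreshold &&
           populated_this_frame < kCacheLimitPerFrame;
  }
};
```

The point of the sketch is that the access count only starts advancing once an item has actually been visible inside the cull rect, so a separate Touch step is no longer needed and entry existence no longer doubles as a visibility signal.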
I filed a different PR with the proposed concept of explicitly tracking visibility before starting the access_count threshold countdown. See #34562 |
Closing as I believe this is now obsolete. |
Reworking the logic and fixing the DisplayList and Layer RasterCacheItem problems.