RS5 horrible performance regression in ScrollConsoleScreenBuffer API. #279
Comments
Filed internally as MSFT:19270439 and assigned to myself. |
Hi, on my machine every such call of ScrollConsoleScreenBuffer takes about 7 ms in legacy mode and 450 ms in default mode (about 65 times slower!). |
Thank you for that. It should be very helpful when I get a chance to investigate this bug to have the minimal repro already reduced. |
OK. I am looking at this today. It looks like there is some excessive heap allocation occurring down this path that is slowing down the operation. I've made one change already today that brought this down from (on my machine) ~193ms per call to ~82ms per call. I have a few more call sites to look at. Your minimal repro project is super helpful! |
Still looking at this, just thought I'd give a little more insight.
~3-7ms = console v1 code
NOTE: The top number is generated with code out of the full Windows build. The Windows build provides extensive optimization including profile guiding to final binaries which is above and beyond what I can tightly loop on my development box. It's not a 100% fair comparison to the below, but it's a good reference point.
~20ms = console v2 from approximately October 2017
Now I've been working on some fixes and I've got:
Somewhere in these bottom two, I'm starting to get into the territory of the "rep movs" commands being generated in the optimizer which I believe is about as fast as I can get for copying a big block of data from point A to point B.
I want to set expectations here: it is very, very unlikely we'll get back to that 3-7ms number for several reasons.
The combination of these cost us a lot of performance in the RS5 release, it appears. But on further reflection on Friday and today, it looks like a more careful implementation of some of these changes can buy us back much of the performance while maintaining the benefits we desired. My goal here is to spend another day or two looking at this and trying to correct this down to the 20-60ms range (for my PC and environment). While it won't entirely pay back the regressed speed, I feel it will strike a more sensible balance between the additional powers of the new v2 console with what you expect out of the |
Thank you for this comprehensive explanation. Maybe performance could be improved further by copying less? Note that scrolling the whole buffer is still as fast as it's always been, and scrolling even one line less is suddenly much slower. So, when the scrollable block is (much) larger than the fixed one (as in our case - 9950 lines vs 50) - maybe it would make sense to copy the fixed block somewhere instead, move the large scrollable block with that fast-full-buffer algorithm above and copy the small fixed block back where it should be? (I have no idea how all that is implemented of course, only speculating) |
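The idea proposed above can be sketched in portable C++. This is only an illustration of the suggestion, not how the console is implemented: the buffer is modelled as a `std::deque` of strings, and `Row` and `scroll_keep_fixed_tail` are hypothetical names. The whole-buffer scroll is assumed to be the cheap operation, so only the small fixed block is copied.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <utility>
#include <vector>

using Row = std::string;  // hypothetical stand-in for one line of console cells

// Scroll the top `scrollable` rows up by one line while the rows below
// them (the small fixed block) stay where they are.
void scroll_keep_fixed_tail(std::deque<Row>& buf, std::size_t scrollable,
                            std::size_t width) {
    assert(scrollable >= 1 && scrollable <= buf.size());
    const auto split = static_cast<std::ptrdiff_t>(scrollable);

    // 1. Copy the small fixed block aside.
    std::vector<Row> fixed(buf.begin() + split, buf.end());

    // 2. Scroll the whole buffer: drop the first row, append a blank one.
    buf.pop_front();
    buf.emplace_back(width, ' ');

    // 3. The row that slid into the last scrollable slot came from the
    //    fixed block, so blank it (this is the "fill" step).
    buf[scrollable - 1].assign(width, ' ');

    // 4. Put the fixed block back at its original position.
    for (std::size_t i = 0; i < fixed.size(); ++i) {
        buf[scrollable + i] = std::move(fixed[i]);
    }
}
```

With 9950 scrollable lines and 50 fixed ones, this model copies 50 rows per scroll instead of 9950. Whether the real buffer can scroll as a whole this cheaply is an assumption of the sketch.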
Yes, I will try to look at reducing the total amount copied next. While it is really bad at copying and the times are magnified, it is easier to see the effect of each change I try in order to make copying itself faster. Once I've optimized that as far as I can, I'll try to spend a bit of time stepping up a level and reducing the total amount copied. I still don't believe I'll hit 3-7ms, but we'll see. I'll make another update when I know more. |
OK. I got it down to 40-50ms on its own for copying while maintaining the ability for it to process UTF-8 variable length data and that's about as low as I can go. Now I've stepped up a level and I'm making an optimization to avoid copying.
Then I go into the backing buffer and I manipulate the pointers to the rows in the buffer to shuffle things around in lieu of copying anything, let the fill step do its business, and go on with life. On my machine, it gets the given sample code down to 0ms per call (20 calls takes 2-3ms). If you ask for the scroll rectangle to be less than the width of the screen or you start moving things in the X dimension... it's going to go back to the copy path as it has to move contents around. But I think this optimization will seriously help with what you're trying to accomplish using the Please let me know if you think differently or have any further input on this plan. |
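A minimal sketch of the pointer-shuffling idea described above, assuming a backing buffer that holds one owning pointer per row. The types `Row` and `ScreenBuffer` and the function `scroll_rows_up` are hypothetical and are not taken from the console source. It covers only the full-width, vertical-only case; anything narrower would need the copy path.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

struct Row {
    std::string cells;  // hypothetical: one character per cell
};

struct ScreenBuffer {
    std::vector<std::unique_ptr<Row>> rows;
    std::size_t width;

    ScreenBuffer(std::size_t height, std::size_t w) : width(w) {
        for (std::size_t i = 0; i < height; ++i) {
            rows.push_back(std::make_unique<Row>(Row{std::string(w, ' ')}));
        }
    }
};

// Scroll rows [top, bottom] (inclusive) up by `lines` without copying any
// cell data: the row pointers are rotated, then the vacated rows are filled.
void scroll_rows_up(ScreenBuffer& sb, std::size_t top, std::size_t bottom,
                    std::size_t lines) {
    assert(top <= bottom && bottom < sb.rows.size());
    const std::size_t count = bottom - top + 1;
    lines = std::min(lines, count);

    auto first = sb.rows.begin() + static_cast<std::ptrdiff_t>(top);
    auto last = first + static_cast<std::ptrdiff_t>(count);
    auto mid = first + static_cast<std::ptrdiff_t>(lines);

    // Rows that scroll off the top are recycled at the bottom of the range.
    std::rotate(first, mid, last);

    // Fill step: blank the recycled rows.
    for (auto it = last - static_cast<std::ptrdiff_t>(lines); it != last; ++it) {
        (*it)->cells.assign(sb.width, ' ');
    }
}
```

In this model the per-call cost depends on the number of rows rather than the number of cells, which is consistent with the drop to roughly 0ms per call reported above.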
That sounds awesome. Thank you! |
Would the same optimisation be applied to v1? |
Nope, the v1 is a generally parked codebase. Outside of security issues, we're not touching it at all. It's our last line of back-compat support. We try not to ever regress anything in the v2 console, but worst-case, the v1 will always act exactly as it always has. |
Good news! This got merged today. It should be out to Insiders in 3-4 weeks. |
Will we get this fixed on 1809? |
@tavrez Nope. |
@zadjii-msft Due to how much of a difference this makes to utilizing the Windows Console, shouldn't that be justification to submit for a backport of this code change? Or is there a dependency that exists that would take more than trivial effort to remove for backporting? |
Someone has to prove that this is costing millions to billions of dollars of lost productivity or revenue to outweigh the risks of shipping the fix to hundreds of millions of Windows machines and potentially breaking something. Our team generally finds it pretty hard to prove that against the developer audience given that they're only a small portion of the total installed market of Windows machines. Our only backport successes really come from corporations with massive addressable market (like OEMs shipping PCs) who complain that this is fouling up their manufacturing line (or something of that ilk). Otherwise, our management typically says that the risks don't outweigh the benefits.
It's also costly in terms of time, effort, and testing for us to validate a modification to a released OS. We have a mindbogglingly massive amount of automated machinery dedicated to processing and validating the things that we check in while developing the current OS builds. But it's a special costly ask to spin up some to all of those activities to validate backported fixes. We do it all the time for Patch Tuesday, but in those patches, they only pass through the minimum number of fixes required to maximize the restoration of productivity/security/revenue/etc. because every additional fix adds additional complexity and additional risk.
So from our little team working hard to make developers happy, we virtually never make the cut for servicing. We're sorry, but we hope you can understand. It's just the reality of the situation to say "nope" when people ask for a backport. In our team's ideal world, you would all be running the latest console bits everywhere every time we make a change. But that's just not how it is today. |
Insiders 18275 and higher. |
The description below refers to x64 versions of OS and application.
Repro:
Install Far Manager and set "Height" in "Screen Buffer Size" to 9999 for it. Then try to run anything, e.g. dir, in its command line. There will be almost a second pause before and after dir runs.
The culprit is this call of the ScrollConsoleScreenBuffer function.
In my particular case the ScrollRectangle is set to {0,0,170,9938}, ClipRectangle is null ptr and DestinationOrigin is set to {0,-1}. This single call lasts almost a second.
Legacy mode does not have this problem.
Also my impression is that this API was already rather slow even before RS5, but not this horribly slow.
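To get a feel for the size of the call reported above, here is a small portable sketch of the rectangle arithmetic. `Rect` and `Coord` are hypothetical stand-ins that mirror the field order of the Win32 SMALL_RECT and COORD structures so the code compiles anywhere, and the rectangle bounds are assumed to be inclusive.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins mirroring SMALL_RECT {Left, Top, Right, Bottom}
// and COORD {X, Y}; not the real Win32 types.
struct Rect {
    std::int16_t left, top, right, bottom;
};
struct Coord {
    std::int16_t x, y;
};

// Number of character cells covered by a rectangle with inclusive bounds.
std::int64_t cell_count(const Rect& r) {
    assert(r.right >= r.left && r.bottom >= r.top);
    const std::int64_t width = static_cast<std::int64_t>(r.right) - r.left + 1;
    const std::int64_t height = static_cast<std::int64_t>(r.bottom) - r.top + 1;
    return width * height;
}

// Where the rectangle lands when its top-left corner moves to `origin`.
Rect destination(const Rect& r, Coord origin) {
    return Rect{origin.x, origin.y,
                static_cast<std::int16_t>(origin.x + (r.right - r.left)),
                static_cast<std::int16_t>(origin.y + (r.bottom - r.top))};
}
```

Under these assumptions the reported ScrollRectangle {0,0,170,9938} covers 171 x 9939 = 1,699,569 cells, and the DestinationOrigin {0,-1} places it at {0,-1,170,9937}, i.e. a one-line scroll up in which the top row falls off the buffer.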