-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ppltasks.cpp: Fix bincompat for VS 2015 #3255
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks STL!
I will look at the changes as well. Also could I perhaps have a copy of the fixed dlls to see if I also run into this problem silently as well? I have an old program I been wanting to troubleshoot why it works with the vs2015 runtime but after a while it simply permanently hangs and it uses a modified DXUT source code (from the dxsdk) and d3dx9 extensively (as well as a lot of COM stuff). I would like to know if the crash would vanish magically with this fix. |
// TRANSITION, ABI: This preprocessor directive attempts to fix VSO-1684985 (a bincompat issue affecting VS 2015 code) | ||
// while preserving as much of GH-2654 as possible. When we can break ABI, we should: | ||
// * Remove this preprocessor directive - it should be unnecessary after <ppltasks.h> was changed on 2018-01-12. | ||
// * In <ppltasks.h>, reconsider whether _Task_impl_base::_Wait() should throw invalid_operation; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this would be a great call.
I can't publicly distribute unsigned binaries, sorry. The symptom of the bug being fixed here is that the program immediately crashes when |
// * Remove this preprocessor directive - it should be unnecessary after <ppltasks.h> was changed on 2018-01-12. | ||
// * In <ppltasks.h>, reconsider whether _Task_impl_base::_Wait() should throw invalid_operation; | ||
// it's questionable whether that's conforming, and if users want to block their UI threads, we should let them. | ||
// * Investigate whether we can avoid the ppltasks dependency entirely, making all of these issues irrelevant. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the way
Mirrored as internal MSVC-PR-439114. |
This reverts commit 26a1f76.
Scenario
This fixes internal VSO-1684985, a high-priority bug reported by 4 customers through support cases. The scenario happens when a program:
CoInitialize(nullptr)
get()
on afuture
returned bystd::async()
When these stars align, the program crashes due to ppltasks machinery throwing an
invalid_operation
exception that slams intonoexcept
, so no recovery is possible.Analysis
future::get()
eventually calls into this code inppltasks.h
:This calls
_Task_impl_base::_IsNonBlockingThread()
, which is separately compiled inppltasks.cpp
. Both the header and the source file have changed over time, which is how we broke bincompat without breaking consistent fresh builds.Header change: The usage of
_IsNonBlockingThread()
is guarded by_PPL_TASK_CONTEXT_CONTROL_ENABLED
, which (near the top of the file on lines 50-54) appears to be opt-in (or enabled by C++/CX, which these customers apparently aren't using). This was introduced by TFS changeset 1685976 "Sync Windows changes back to DevDiv (second round)." on 2018-01-12. I believe this is why this hasn't been reported before - anyone building with VS 2019/2022 (unless they opt-in or are C++/CX) doesn't have this codepath that can throw aninvalid_operation
exception. (I'd need to do more source archaeology to be certain, but I suspect that this change also flowed into some updates of VS 2017, leaving only VS 2015 customers affected.)Source file change: Originally,
_Task_impl_base::_IsNonBlockingThread()
had a hardcoded-to-false
implementation for the normal VCRedist. There was a "real" implementation for#if defined(_CRT_APP) || defined(UNDOCKED_WINDOWS_UCRT)
. Then in April 2022, we merged a fix that was Changelogged for VS 2022 17.3 as "Fixed a bug affecting<future>
's use of<ppltasks.h>
in Desktop XAML apps." This was #2654 "Coordinated fix for capturing continuation context for tasks in Desktop apps", contributed by @Scottj1s. It updatedppltasks.cpp
, removing the hardcoded-to-false
implementation and always using the "real" implementation (which returnstrue
for Single-Threaded Apartments).Finally, VS 2022 17.4 "unlocked" the VCRedist, so it started installing updated DLLs, and customers reported the problem within days.
Thanks to @Scottj1s for reviewing my analysis.
Summary
The bincompat break was that VS 2015's
ppltasks.h
unconditionally calls_IsNonBlockingThread()
, butppltasks.cpp
was hardcoded to returnfalse
until recently. After VS 2022 17.4 "unlocked" the VCRedist, this function returnstrue
for Single-Threaded Apartments, causing the crash. Consistent fresh builds are unaffected because afterppltasks.h
changed on 2018-01-12, its call to_IsNonBlockingThread()
is preprocessed away for most clients, so they don't care what the function in the VCRedist would return.Fix
Any fix must involve changing
ppltasks.cpp
in the VCRedist - changingppltasks.h
wouldn't affect VS 2015 clients. And as usual, we're constrained in how we can change the VCRedist - we can't alter or add/remove signatures, and we don't know "who" is calling into it (could be a VS 2015 client, could be a new client).We could revert #2654 in its entirety. This would re-break the Desktop XAML
<future>
scenario, but it would restore the VCRedist to a fully known state (that we lived with from VS 2015 RTM through VS 2022 17.2). We could then consider taking another fix for that scenario, but more carefully this time. The downside is that any users who started taking advantage of that fix would immediately experience a regression, and we could just be trading one set of high-priority customer support cases for another. At this time, I think we can avoid a full revert. (However, it's an option if we discover more problems in the future.)I searched, and this appears to be the only usage of
_IsNonBlockingThread()
, so we don't have to worry about a zillion callsites. Therefore, this PR contains a targeted revert. (It takes a slightly different form than the original code, moving the preprocessor logic within the function, but is behaviorally equivalent.) I believe this works because thefuture::get()
scenario needs only_IsNonBlockingThread()
to returnfalse
as it previously did, but it doesn't need any of the other old do-nothing codepaths. I also believe that the risk to other users is fairly low, given the mostly opt-in guard_PPL_TASK_CONTEXT_CONTROL_ENABLED
. There could theoretically be impact - if they enable that mode (or are C++/CX), then_IsNonBlockingThread()
returningfalse
for them would either cause theinvalid_operation
exception to be skipped (which is OK, I think), or would cause_DoWait = false;
to not be assigned (which is probably undesirable). However, (1) this would be returning to our behavior in VS 2022 17.2, and (2) the only other option is the full revert which would have the same behavioral changes and more. Therefore, the targeted revert is the least impactful change.I'm adding a comment with a very brief explanation of what the preprocessor directive is doing, and what we should investigate when we can revisit this code.
Validation
All 4 of the customers have replied, confirming that a private build of the fix works for them.
Because this scenario involves mixing VS 2015 and VS 2022, I'm not adding automated test coverage. I manually verified the repro and the fix:
I prepared a clean VM of Windows 11, fully updated, and installed VS 2022 17.4 with its C++ toolset, and selected the VS 2015 C++ toolset within it. This was a convenient way to get the VS 2015 toolset, with the VS 2022 VCRedist installed, and should be equivalent to a dev machine with the original VS 2015 IDE, followed by an end user machine with a VS 2022 VCRedist (and possibly no actual installation of VS itself).
For both x64 and x86, release and debug, I verified that the provided test case works when statically linked, but crashes (due to the
invalid_operation
exception slamming into anoexcept
destructor in the callstack) when dynamically linked. Then I verified that the dynamic cases pass if they're run in the same directory as my fixed DLLs. (For convenience I simply rebuilt the EXE in the new location, but I could have copied it - makes no difference here, I verified that the compiler is still VS 2015.) Each "Succeeded." line happened after the expected 10-second wait.Here's a full transcript from the x64 testing. The lengthy preamble between the
CoInitialize()
and theasync()
call is to be absolutely confident about what compiler and what mode are being used. (The reported_MSC_VER
will reflect the compiler that was used at build time, not the STL DLL that's loaded on the end user's machine.)