Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ppltasks.cpp: Fix bincompat for VS 2015 #3255

Merged
merged 1 commit into from
Dec 2, 2022

Conversation

StephanTLavavej
Copy link
Member

@StephanTLavavej StephanTLavavej commented Dec 1, 2022

Scenario

This fixes internal VSO-1684985, a high-priority bug reported by 4 customers through support cases. The scenario happens when a program:

  • Was built with VS 2015
  • Uses a "Single-Threaded Apartment"
    • DirectX reportedly requires this, so it's common
    • In the provided repro, this is activated by calling CoInitialize(nullptr)
  • Calls get() on a future returned by std::async()
  • Runs on an end user's machine with the VS 2022 17.4 redist installed
    • This is why we're hearing about it now

When these stars align, the program crashes due to ppltasks machinery throwing an invalid_operation exception that slams into noexcept, so no recovery is possible.

Analysis

future::get() eventually calls into this code in ppltasks.h:

    struct _Task_impl_base
    {
// ...
        task_status _Wait()
        {
            bool _DoWait = true;

#if (_PPL_TASK_CONTEXT_CONTROL_ENABLED)
            if (_IsNonBlockingThread())
            {
                // In order to prevent Windows Runtime STA threads from blocking the UI, calling task.wait() task.get() is illegal
                // if task has not been completed.
                if (!_IsCompleted() && !_IsCanceled())
                {
                    _THROW(invalid_operation("Illegal to wait on a task in a Windows Runtime STA"));
                }

This calls _Task_impl_base::_IsNonBlockingThread(), which is separately compiled in ppltasks.cpp. Both the header and the source file have changed over time, which is how we broke bincompat without breaking consistent fresh builds.

Header change: The usage of _IsNonBlockingThread() is guarded by _PPL_TASK_CONTEXT_CONTROL_ENABLED, which (near the top of the file on lines 50-54) appears to be opt-in (or enabled by C++/CX, which these customers apparently aren't using). This was introduced by TFS changeset 1685976 "Sync Windows changes back to DevDiv (second round)." on 2018-01-12. I believe this is why this hasn't been reported before - anyone building with VS 2019/2022 (unless they opt-in or are C++/CX) doesn't have this codepath that can throw an invalid_operation exception. (I'd need to do more source archaeology to be certain, but I suspect that this change also flowed into some updates of VS 2017, leaving only VS 2015 customers affected.)

Source file change: Originally, _Task_impl_base::_IsNonBlockingThread() had a hardcoded-to-false implementation for the normal VCRedist. There was a "real" implementation for #if defined(_CRT_APP) || defined(UNDOCKED_WINDOWS_UCRT). Then in April 2022, we merged a fix that was Changelogged for VS 2022 17.3 as "Fixed a bug affecting <future>'s use of <ppltasks.h> in Desktop XAML apps." This was #2654 "Coordinated fix for capturing continuation context for tasks in Desktop apps", contributed by @Scottj1s. It updated ppltasks.cpp, removing the hardcoded-to-false implementation and always using the "real" implementation (which returns true for Single-Threaded Apartments).

Finally, VS 2022 17.4 "unlocked" the VCRedist, so it started installing updated DLLs, and customers reported the problem within days.

Thanks to @Scottj1s for reviewing my analysis.

Summary

The bincompat break was that VS 2015's ppltasks.h unconditionally calls _IsNonBlockingThread(), but ppltasks.cpp was hardcoded to return false until recently. After VS 2022 17.4 "unlocked" the VCRedist, this function returns true for Single-Threaded Apartments, causing the crash. Consistent fresh builds are unaffected because after ppltasks.h changed on 2018-01-12, its call to _IsNonBlockingThread() is preprocessed away for most clients, so they don't care what the function in the VCRedist would return.

Fix

Any fix must involve changing ppltasks.cpp in the VCRedist - changing ppltasks.h wouldn't affect VS 2015 clients. And as usual, we're constrained in how we can change the VCRedist - we can't alter or add/remove signatures, and we don't know "who" is calling into it (could be a VS 2015 client, could be a new client).

We could revert #2654 in its entirety. This would re-break the Desktop XAML <future> scenario, but it would restore the VCRedist to a fully known state (that we lived with from VS 2015 RTM through VS 2022 17.2). We could then consider taking another fix for that scenario, but more carefully this time. The downside is that any users who started taking advantage of that fix would immediately experience a regression, and we could just be trading one set of high-priority customer support cases for another. At this time, I think we can avoid a full revert. (However, it's an option if we discover more problems in the future.)

I searched, and this appears to be the only usage of _IsNonBlockingThread(), so we don't have to worry about a zillion callsites. Therefore, this PR contains a targeted revert. (It takes a slightly different form than the original code, moving the preprocessor logic within the function, but is behaviorally equivalent.) I believe this works because the future::get() scenario needs only _IsNonBlockingThread() to return false as it previously did, but it doesn't need any of the other old do-nothing codepaths. I also believe that the risk to other users is fairly low, given the mostly opt-in guard _PPL_TASK_CONTEXT_CONTROL_ENABLED. There could theoretically be impact - if they enable that mode (or are C++/CX), then _IsNonBlockingThread() returning false for them would either cause the invalid_operation exception to be skipped (which is OK, I think), or would cause _DoWait = false; to not be assigned (which is probably undesirable). However, (1) this would be returning to our behavior in VS 2022 17.2, and (2) the only other option is the full revert which would have the same behavioral changes and more. Therefore, the targeted revert is the least impactful change.

I'm adding a comment with a very brief explanation of what the preprocessor directive is doing, and what we should investigate when we can revisit this code.

Validation

All 4 of the customers have replied, confirming that a private build of the fix works for them.

Because this scenario involves mixing VS 2015 and VS 2022, I'm not adding automated test coverage. I manually verified the repro and the fix:

I prepared a clean VM of Windows 11, fully updated, and installed VS 2022 17.4 with its C++ toolset, and selected the VS 2015 C++ toolset within it. This was a convenient way to get the VS 2015 toolset, with the VS 2022 VCRedist installed, and should be equivalent to a dev machine with the original VS 2015 IDE, followed by an end user machine with a VS 2022 VCRedist (and possibly no actual installation of VS itself).

For both x64 and x86, release and debug, I verified that the provided test case works when statically linked, but crashes (due to the invalid_operation exception slamming into a noexcept destructor in the callstack) when dynamically linked. Then I verified that the dynamic cases pass if they're run in the same directory as my fixed DLLs. (For convenience I simply rebuilt the EXE in the new location, but I could have copied it - makes no difference here, I verified that the compiler is still VS 2015.) Each "Succeeded." line happened after the expected 10-second wait.

Here's a full transcript from the x64 testing. The lengthy preamble between the CoInitialize() and the async() call is to be absolutely confident about what compiler and what mode are being used. (The reported _MSC_VER will reflect the compiler that was used at build time, not the STL DLL that's loaded on the end user's machine.)

C:\Temp>type meow.cpp
#include <future>
#include <iostream>
#include <Windows.h>
using namespace std;

int main() {
    CoInitialize(nullptr);

    cout << "_MSC_VER=" << _MSC_VER;
#ifdef _M_IX86
    cout << " x86";
#elif defined(_M_X64)
    cout << " x64";
#else
    cout << " other";
#endif
#if !defined(_DLL) && !defined(_DEBUG)
    cout << " /MT (static release)\n";
#elif !defined(_DLL) && defined(_DEBUG)
    cout << " /MTd (static debug)\n";
#elif defined(_DLL) && !defined(_DEBUG)
    cout << " /MD (dynamic release)\n";
#elif defined(_DLL) && defined(_DEBUG)
    cout << " /MDd (dynamic debug)\n";
#endif

    auto x = async([] { Sleep(10000); });
    x.get();
    cout << "Succeeded.\n";
}
C:\Temp>cl
Microsoft (R) C/C++ Optimizing Compiler Version 19.00.24245 for x64
Copyright (C) Microsoft Corporation.  All rights reserved.

usage: cl [ option... ] filename... [ /link linkoption... ]

C:\Temp>cl /EHsc /nologo /W4 /MT meow.cpp ole32.lib && meow
meow.cpp
_MSC_VER=1900 x64 /MT (static release)
Succeeded.

C:\Temp>cl /EHsc /nologo /W4 /MTd meow.cpp ole32.lib && meow
meow.cpp
_MSC_VER=1900 x64 /MTd (static debug)
Succeeded.

C:\Temp>cl /EHsc /nologo /W4 /MD meow.cpp ole32.lib && meow
meow.cpp
_MSC_VER=1900 x64 /MD (dynamic release)
[*** CRASH (when the invalid_operation exception slams into noexcept on a destructor) ***]

C:\Temp>cl /EHsc /nologo /W4 /MDd meow.cpp ole32.lib && meow
meow.cpp
_MSC_VER=1900 x64 /MDd (dynamic debug)
[*** CRASH (when the invalid_operation exception slams into noexcept on a destructor) ***]

C:\Temp>pushd rel-17.4-dts-without-pdbs\amd64ret\bin\amd64

C:\Temp\rel-17.4-dts-without-pdbs\amd64ret\bin\amd64>dir /b *.dll
msvcp140.dll
msvcp140d.dll

C:\Temp\rel-17.4-dts-without-pdbs\amd64ret\bin\amd64>cl /EHsc /nologo /W4 /MD C:\Temp\meow.cpp ole32.lib && meow
meow.cpp
_MSC_VER=1900 x64 /MD (dynamic release)
Succeeded.

C:\Temp\rel-17.4-dts-without-pdbs\amd64ret\bin\amd64>cl /EHsc /nologo /W4 /MDd C:\Temp\meow.cpp ole32.lib && meow
meow.cpp
_MSC_VER=1900 x64 /MDd (dynamic debug)
Succeeded.

@StephanTLavavej StephanTLavavej added bug Something isn't working high priority Important! affects redist Results in changes to separately compiled bits labels Dec 1, 2022
@StephanTLavavej StephanTLavavej requested a review from a team as a code owner December 1, 2022 23:58
Copy link
Member

@Scottj1s Scottj1s left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks STL!

@AraHaan
Copy link

AraHaan commented Dec 2, 2022

I will look at the changes as well. Also could I perhaps have a copy of the fixed dlls to see if I also run into this problem silently as well? I have an old program I been wanting to troubleshoot why it works with the vs2015 runtime but after a while it simply permanently hangs and it uses a modified DXUT source code (from the dxsdk) and d3dx9 extensively (as well as a lot of COM stuff). I would like to know if the crash would vanish magically with this fix.

// TRANSITION, ABI: This preprocessor directive attempts to fix VSO-1684985 (a bincompat issue affecting VS 2015 code)
// while preserving as much of GH-2654 as possible. When we can break ABI, we should:
// * Remove this preprocessor directive - it should be unnecessary after <ppltasks.h> was changed on 2018-01-12.
// * In <ppltasks.h>, reconsider whether _Task_impl_base::_Wait() should throw invalid_operation;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this would be a great call.

@StephanTLavavej
Copy link
Member Author

@AraHaan

Also could I perhaps have a copy of the fixed dlls to see if I also run into this problem silently as well?

I can't publicly distribute unsigned binaries, sorry.

The symptom of the bug being fixed here is that the program immediately crashes when future::get() is called, when the VS 2022 17.4 VCRedist is being used. This bug wasn't present with the VS 2022 17.2 VCRedist or earlier.

// * Remove this preprocessor directive - it should be unnecessary after <ppltasks.h> was changed on 2018-01-12.
// * In <ppltasks.h>, reconsider whether _Task_impl_base::_Wait() should throw invalid_operation;
// it's questionable whether that's conforming, and if users want to block their UI threads, we should let them.
// * Investigate whether we can avoid the ppltasks dependency entirely, making all of these issues irrelevant.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the way

@StephanTLavavej
Copy link
Member Author

Mirrored as internal MSVC-PR-439114.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
affects redist Results in changes to separately compiled bits bug Something isn't working high priority Important!
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants