Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AppVerifier: VERIFIER_STOP during sentry_shutdown() #488

Closed
5 of 7 tasks
k0ushal opened this issue Feb 26, 2021 · 18 comments
Closed
5 of 7 tasks

AppVerifier: VERIFIER_STOP during sentry_shutdown() #488

k0ushal opened this issue Feb 26, 2021 · 18 comments

Comments

@k0ushal
Copy link

k0ushal commented Feb 26, 2021

Description
We are running the most simple test application on Windows 10 that reports an event to Sentry containing a simple text message.
We're performing tests with AppVerifier. The message is reported successfully to Sentry.
However we get a VERIFIER_STOP assertion when sentry_shutdown() is executed.
VERIFIER STOP 0000000000000202: pid 0x4B8: Freeing heap block containing an active critical section.

When does the problem happen
In the call to sentry_shutdown() when executing with AppVerifier turned on.

  • During build
  • During run-time
  • When capturing a hard crash

Environment
Sentry test server deployed within our organization

  • OS: Windows 10, 64-bit
  • Compiler: Visual Studio 2019
  • Build config: Release with Optimization (/O2)

Steps To Reproduce

1. Enable AppVerifier for test app.

2. AppVerifier settings:

  • Basics
  • Miscellaneous
    • DangerousAPIs
    • DirtyStacks

3. Run application with Windbg attached.

4. Code

void ReportErrorEvent(const std::string& message)
{
    auto event = sentry_value_new_event();
    sentry_value_set_by_key(event, "level", sentry_value_new_string("error"));
    sentry_value_set_by_key(event, "message", sentry_value_new_string(message.c_str()));
    sentry_capture_event(event);
}

int main()
{
    sentry_options_t* options = sentry_options_new();
    if (!options) {
        std::cout << "Unable to instantiate sentry options";
        return 0;
    }

    auto releaseTag = project + "_" + version;
    //sentry_options_set_debug(options, true);
    sentry_options_set_dsn(options, dsn.c_str());
    sentry_options_set_release(options, releaseTag.c_str());
    sentry_options_set_environment(options, "production");

    auto ret{ sentry_init(options) };
    if (0 != ret) {
        std::cout << "sentry_init failed";
        sentry_options_free(options);
        return 0;
    }

    std::cout << "sending event" << std::endl;
    ReportErrorEvent("issue with cleanup");

    sentry_shutdown();     //   <--  AppVerifier asserts here
    return 0;
}

Log output

[sentry] INFO using database path "C:\Users\user.name\source\repos\SentryCustomTransport\SentryCustomTransport.sentry-native"
[sentry] DEBUG starting transport
[sentry] DEBUG starting background worker thread
[sentry] DEBUG background worker thread started
sending event
[sentry] DEBUG merging scope into event
[sentry] DEBUG adding attachments to envelope
[sentry] DEBUG sending envelope
[sentry] DEBUG submitting task to background worker thread
[sentry] DEBUG executing task on worker thread
[sentry] DEBUG sending envelope
[sentry] DEBUG submitting task to background worker thread
[sentry] DEBUG shutting down transport
[sentry] DEBUG shutting down background worker thread
[sentry] DEBUG submitting task to background worker thread
[sentry] DEBUG sending request using winhttp to "https://sentry.company.net:443/api/47/envelope/":
x-sentry-auth:Sentry sentry_key=***********************, sentry_version=7, sentry_client=sentry.native/0.4.3
content-type:application/x-sentry-envelope
content-length:5566

[sentry] DEBUG received response:
HTTP/1.1 200 OK
Connection: keep-alive
Date: Fri, 26 Feb 2021 12:09:38 GMT
Content-Length: 41
Content-Type: application/json
Server: nginx
Vary: Origin
access-control-expose-headers: retry-after, x-sentry-error, x-sentry-rate-limits
Strict-Transport-Security: max-age=31536000

[sentry] DEBUG request handled in 179ms
[sentry] DEBUG executing task on worker thread
[sentry] DEBUG sending request using winhttp to "https://sentry.company.net:443/api/47/envelope/":
x-sentry-auth:Sentry sentry_key=***********************, sentry_version=7, sentry_client=sentry.native/0.4.3
content-type:application/x-sentry-envelope
content-length:308

[sentry] DEBUG received response:
HTTP/1.1 200 OK
Connection: keep-alive
Date: Fri, 26 Feb 2021 12:09:38 GMT
Content-Length: 2
Content-Type: application/json
Server: nginx
Vary: Origin
access-control-expose-headers: x-sentry-rate-limits, retry-after, x-sentry-error
Strict-Transport-Security: max-age=31536000

[sentry] DEBUG request handled in 48ms
[sentry] DEBUG executing task on worker thread
[sentry] DEBUG background worker thread shut down

AppVerifier output

=======================================
VERIFIER STOP 0000000000000202: pid 0x4E8: Freeing heap block containing an active critical section.

000001E920C1BFB0 : Critical section address. Run !cs -s <address> to get more information.
000001E919FA4390 : Critical section initialization stack trace. Run dps <address> to dump the stack trace.
000001E920C1BF90 : Heap block address.
0000000000000070 : Heap block size.

=======================================
This verifier stop is continuable.
After debugging it use `go' to continue.

=======================================

(4e8.88c): Break instruction exception - code 80000003 (first chance)
vrfcore!VerifierStopMessageEx+0x806:
00007ffe`e5b326e6 cc int 3
0:000> Kb 100

RetAddr : Args to Child : Call Site
00 00007ffedf2f51b3 : 0000000001000002 000001e91b700000 000001e91b701000 00007ffedf3e2902 : vrfcore!VerifierStopMessageEx+0x806 01 00007ffedf2f3ae4 : 0000000000000070 0000000000000000 0000000000000000 000001e91b701000 : vfbasics+0x51b3
02 00007ffedf3061f8 : 000001e920c1bf90 0000000d6334f6d9 0000000000000070 000001e91b700000 : vfbasics+0x3ae4 03 00007ff7aeddb628 : 000001e91b700000 000001e900000000 000001e920c1bf90 00007ffef41f5570 : vfbasics+0x161f8
04 00007ff7aedbc722 : 000001e91b700000 0000000000000000 000001e920ec6ff0 00007ffef41f5570 : SentryTest!_free_base+0x1c [minkernel\crts\ucrt\src\appcrt\heap\free_base.cpp @ 105] 05 00007ff7aedbb90c : 000001e920c1dfc0 0000000000000000 000001e920c31ff0 00007ffef41f5570 : SentryTest!sentry__bgworker_decref+0x92 [\vcpkg\buildtrees\sentry-native\src\y-native-0-a09cff82c7.clean\src\sentry_sync.c @ 111]
06 00007ff7aedb532d : 000000006038e41c 0000000000000000 0000a7ded781cd0c 0000000000000011 : SentryTest!sentry_transport_free+0x1c [\vcpkg\buildtrees\sentry-native\src\y-native-0-a09cff82c7.clean\src\sentry_transport.c @ 129] 07 00007ff7aedb5278 : 000001e920bb3f60 0000000000000000 00000010701749b2 0000000d6334f6d9 : SentryTest!sentry_options_free+0x7d [\vcpkg\buildtrees\sentry-native\src\y-native-0-a09cff82c7.clean\src\sentry_options.c @ 82]
08 00007ff7aedb21ea : 000001e921a37ff0 0000000d00000000 0000000000000011 000001e920bb3f60 : SentryTest!sentry_shutdown+0xd8 [\vcpkg\buildtrees\sentry-native\src\y-native-0-a09cff82c7.clean\src\sentry_core.c @ 205] 09 00007ff7aedc0074 : 000001e920b0cfd0 0000000000000000 000001e920b12ea0 0000000000000000 : SentryTest!main+0x31a [\source\repos\SentryCustomTransport\SentryCustomTransport\SentryCustomTransport.cpp @ 41]
0a (Inline Function) : ---------------- ---------------- ---------------- ---------------- : SentryTest!invoke_main+0x22 [d:\agent_work\63\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl @ 78]
0b 00007ffef41f7034 : 0000000000000000 0000000000000000 0000000000000000 0000000000000000 : SentryTest!__scrt_common_main_seh+0x10c [d:\agent\_work\63\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl @ 288] 0c 00007ffef433d241 : 0000000000000000 0000000000000000 0000000000000000 0000000000000000 : KERNEL32!BaseThreadInitThunk+0x14
0d 0000000000000000 : 0000000000000000 0000000000000000 0000000000000000 00000000`00000000 : ntdll!RtlUserThreadStart+0x21

@Swatinem
Copy link
Member

I notice two things here:

  1. It seems you are running sentry.native/0.4.3 which is a few versions behind. Specifically 0.4.4 includes this fix here: fix: Explicitly free lock objects #416 This should actually be exactly what you report here. Can you double check with the latest version and close this issue if its solved please?
  2. You don’t need an explicit sentry_options_free(options); when sentry_init fails, as it will clean up after itself in the failure path. Essentially you would be double-free-ing in that case.

@Sofapriester
Copy link

Sofapriester commented Feb 27, 2021

Hi @k0ushal and @Swatinem ,

I re-tested this issue with version 0.4.7 and with AppVerifier flag "Leak" active, I still get this error.

VERIFIER STOP 0000000000000202: pid 0x4CC: Freeing heap block containing an active critical section.

From debugging I see, that this code is the one opening the critical section:

File: sentry-native\src\sentry_sync.c

sentry_bgworker_t *
sentry__bgworker_new(void *state, void (*free_state)(void *state))
{
    sentry_bgworker_t *bgw = SENTRY_MAKE(sentry_bgworker_t);
    if (!bgw) {
        if (free_state) {
            free_state(state);
        }
        return NULL;
    }
    memset(bgw, 0, sizeof(sentry_bgworker_t));
    sentry__mutex_init(&bgw->task_lock);        // <= This seems to be the init of the critical section that is later not cleaned
    sentry__cond_init(&bgw->submit_signal);
    sentry__cond_init(&bgw->done_signal);
    bgw->state = state;
    bgw->free_state = free_state;
    bgw->refcount = 1;
    return bgw;
}

Best regards,
Stefan

@Sofapriester
Copy link

@Swatinem I've updated my comment

@Swatinem
Copy link
Member

Swatinem commented Mar 2, 2021

The task_lock is the exact thing being freed in the linked patch:

https://github.com/getsentry/sentry-native/pull/416/files#diff-05d284250cb83958aeae2f79dc691448f3c65d2d85b5d37e28ac31f26796b496R110

Can you double check this?

@Sofapriester
Copy link

Hello @Swatinem

I've double checked. I was using the correct version 0.4.7.

In fact I am installing sentry-native via vcpkg package manager. I checked the build logs and files. vcpkg was building the version 0.4.7 and I linked it in my project correctly.

For reproduction it is important, that you specify the AppVerifier setting "Leak".

This are the steps to reproduce I would suggests:

  1. Build a little test app with @k0ushal code snippet and link sentry 0.4.7
  2. Download the Windwos Sdk installer from here
  3. Install AppVerifier with winsdksetup.exe /ceip off /features OptionId.AvrfExternal /quiet
  4. Configure AppVerifier with C:\Windows\System32\appverif.exe -enable DangerousAPIs DirtyStacks Exceptions Handles Heaps Leak Locks SRWLock Memory TLS Threadpool -for <TEST_APP.exe>
  5. Run the app with debugger attached

@Swatinem
Copy link
Member

Swatinem commented Mar 3, 2021

I was playing around with this just now, but still can’t reproduce your failure.

However, I was able to reproduce a different verified failure thats fixed here: #492

Sentry is using quite a few static mutexes, and the only non-static one is the task_lock. That being said, the abstraction is built around static mutexes which are initialized using a InitOnceExecuteOnce at lock time. (which my PR fixes, because in the error path, the lock is never locked)

@k0ushal
Copy link
Author

k0ushal commented Mar 8, 2021

Hi Swatinem,

Thank you for taking the time to investigate this issue.
You were right about being unable to repro the failure with the test app.
I've been trying for the whole last week but I've had no luck myself, and yet our product have been reporting these failures consistently.

Turns out that the verifier stop is reported only if Sentry is being used from an explicitly loaded DLL.
AppVerifier reports the error when we're trying to unload this DLL by a call to FreeLibrary(), because it finds that the lock is not released yet.

Please find attached an archive containing the following files:

  1. SentryAppVerifierTest (source code)
    a. SentryTestApp - Test application
    b. SentryDummyDll - Sentry DLL (links against sentry.lib, exports functions to Initialize- and Deinitialize- Sentry, Report messages to Sentry)
  2. AppVerifier_Settings.PNG
  3. Windbg.PNG (Sample output from Windbg when the VERIFIER_STOP is reported)

I've built the code using VS 2019 for the Release x64 build configuration.

Here are the steps I've followed to repro the issue:

  1. Enable AppVerifier settings as shown in the picture (AppVerifier_Settings.PNG)
  2. Set symbol path in Windbg for SentryTestApp and SentryDummyDll
  3. Start SentryTestApp from Windbg-x64 (File -> Open Executable)
  4. Let the app run and wait for Windbg to report the issue
  5. Enter the debugger commands as suggested by Windbg (refer to Windbg.PNG) to inspect the call stack

Please let us know if you were able to repro the issue.

sentry_appverif_repro.zip

@Swatinem
Copy link
Member

Swatinem commented Mar 9, 2021

Turns out that the verifier stop is reported only if Sentry is being used from an explicitly loaded DLL.
AppVerifier reports the error when we're trying to unload this DLL by a call to FreeLibrary(), because it finds that the lock is not released yet.

That might be the explanation. Sentry uses some static locks which are being initialized lazily. And not being properly freed because of that. I’m not yet aware how to properly manage the lifecycle of static mutexes on Windows, will have to find that out first.

@k0ushal
Copy link
Author

k0ushal commented Mar 18, 2021

Hi @Swatinem,

Do you have any progress on this ticket? Can you please share an update?

@Swatinem
Copy link
Member

Do you have any progress on this ticket? Can you please share an update?

I have looked at this a bit, but didn’t start actually solving this yet.
Getting rid of all our global static mutexes is not an easy task. Also, I think #498 is a bit related, as that is also one limitation of using static mutexes.

@andrei-mu
Copy link
Contributor

Hello @Swatinem,

one of the things we've noticed, which might impact this issue, is the fact that sentry__transport_shutdown method is not properly performing cleanup.
As you can see here,

            if (sentry__transport_shutdown(
                    options->transport, SENTRY_DEFAULT_SHUTDOWN_TIMEOUT)

the wait time is 2 seconds, which, if there's an active envelope to be sent, is not enough. We routinely see in our tests that the messages to be sent take 2.5 to 4 seconds.

What happens is that we're calling sentry_shutdown() while a request is active, after 2 seconds, the wait times out and some inconsistent/incorrect cleanup takes place, then the flow leaves sentry_shutdown(), and we're calling FreeLibrary().
After a small window, the WinHttpSendRequest finishes the execution because the thread/task is not closed due to timeout exception, the instruction pointer is restored to next line in sentry-native code, but the code is not in memory anymore because the DLL was unloaded.

As a fix, I would suggest increasing the SENTRY_DEFAULT_SHUTDOWN_TIMEOUT to a very permissive value of 1 minute or even more. This change should NOT influence the normal flow shutdown, because is based on task wait, and in case of an abnormal timeout, the issue becomes more visible and can be reported by the users.
(Btw, the docu link https://docs.sentry.io/error-reporting/configuration/?platform=native#shutdown-timeout seems to be orphaned).

@Swatinem
Copy link
Member

Swatinem commented Jul 1, 2021

Hm, this ties in with #568 as well.
There was also another customer request to have an explicit flush function.
I think to improve this, we would need:

  • the linked PR
  • a configurable shutdown timeout, as is also port of the unified API spec
  • an explicit flush function that you can call with a custom timeout

Thanks for investigating! This has helped a lot understanding where this was coming from. I was not able to reproduce the verifier failure before.

@andrei-mu
Copy link
Contributor

Should we prepare a PR, or is it something you plan to do?

@Swatinem
Copy link
Member

Swatinem commented Jul 7, 2021

I think you can try to work on a PR to make the shutdown timeout configurable, it should be easy enough. I think creating an explicit flush hook on the transport might be a bit more complicated.
That being said, we don’t intend to actively work on this in the near future ourselves, but appreciate any PR in that direction.

@andrei-mu
Copy link
Contributor

I would like to do this, but there's no "Create Branch" option available for me. Do I need to have some permissions?

@Swatinem
Copy link
Member

Not sure what you are trying to do? Usually you just fork and create a PR from there?

@Mixaill
Copy link
Contributor

Mixaill commented Aug 8, 2022

@Swatinem Looks like was fixed by #570

@Swatinem
Copy link
Member

Swatinem commented Aug 9, 2022

thats good to hear

@Swatinem Swatinem closed this as completed Aug 9, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants