Finish and capture transaction/span bound to the Scope in case of a fatal crash #2306

kahest · 2022-10-19T12:40:03Z

Description

For now the transaction gets lost in case of a fatal crash. We should finish it with an appropriate status before capturing the exception, if possible.

See getsentry/team-mobile#62

philipphofmann · 2022-10-20T07:43:45Z

At the moment, I can think of two ways to achieve this.

Option A

In case of a fatal crash, we could call into ObjC to finish the ongoing transactions after SentryCrash has written the crash report to disk. We already use this approach for screenshots and view hierarchy, but it wouldn't work for all types of crashes. This would be the best effort approach.

sentry-cocoa/Sources/SentryCrash/Recording/SentryCrashC.c

Lines 111 to 117 in 58f558d

    
           // Report is saved to disk, now we try to take screenshots 
        
           // and view hierarchies. 
        
           // Depending on the state of the crash this may not work 
        
           // because we gonna call into non async-signal safe code 
        
           // but since the app is already in a crash state we don't 
        
           // mind if this approach crashes. 
        
           if (g_saveScreenShot || g_saveViewHierarchy) {

Option B

To make this work for all types of crashes, we would need to sync the transaction from ObjC to C memory and store it alongside a crash report. This approach is similar to syncing the scope data to SentryCrash.

sentry-cocoa/Sources/Sentry/include/SentryCrashScopeObserver.h

Lines 6 to 19 in 58f558d

    
           /** 
        
            * This class performs a fine-grained sync of the Scope to C memory, as when SentryCrash writes a 
        
            * crash report, we can't call Objective-C methods; see SentryCrash.onCrash. For every change to the 
        
            * Scope, this class serializes only the changed property to JSON and stores it in C memory. When a 
        
            * crash happens, the SentryCrashReport picks up the JSON of all properties and adds it to the crash 
        
            * report. 
        
            * 
        
            * Previously, the SDK used SentryCrash.setUserInfo, which required the serialization of the whole 
        
            * Scope on every modification of it. When having much data in the Scope this slowed down the caller 
        
            * of the scope change. Therefore, we had to move the Scope sync to a background thread. This has 
        
            * the downside of the scope not being 100% up to date when a crash happens and, of course, lots of 
        
            * CPU overhead. 
        
            */ 
        
           @interface SentryCrashScopeObserver : NSObject <SentryScopeObserver>

This approach would add some overhead, as every change to a span or transaction would trigger a sync to c memory. To keep this performant a fine-grained sync is a must, which only syncs the current change of a span or transaction to SentryCrash, instead of the whole transaction.

brustolin · 2022-10-20T09:40:48Z

I believe this is an edge case that don't worth doing option B.

github-actions · 2022-11-11T00:09:35Z

This issue has gone three weeks without activity. In another week, I will close it.

But! If you comment or otherwise update it, I will reset the clock, and if you label it Status: Backlog or Status: In Progress, I will leave it alone ... forever!

"A weed is but an unloved flower." ― Ella Wheeler Wilcox 🥀

aellett · 2024-09-25T23:21:16Z

Has anyone revisited this lately? Was surprised this morning when I dug into this functionality and saw that it was different between our iOS and Android (sentry-java) apps.

philipphofmann · 2024-09-27T07:06:08Z

@aellett, we haven't revisited this lately. It's still on our backlog.

This is different because you can still call Java code when your application terminates due to a crash on Java. On Cocoa, strictly speaking, we must only call C code once the app crashes, but we sometimes have to partially bend the rules of what is acceptable. For more information check our develop docs for signal handlers. We already do that to add screenshots and view the hierarchy to crashes. Therefore, I vote for option A. We will revisit the priority for this, but I can't give you an ETA.

aellett · 2024-10-01T17:38:01Z

@aellett, we haven't revisited this lately. It's still on our backlog.

This is different because you can still call Java code when your application terminates due to a crash on Java. On Cocoa, strictly speaking, we must only call C code once the app crashes, but we sometimes have to partially bend the rules of what is acceptable. For more information check our develop docs for signal handlers. We already do that to add screenshots and view the hierarchy to crashes. Therefore, I vote for option A. We will revisit the priority for this, but I can't give you an ETA.

Yeah, option A sounds reasonable to me. I saw mention above that it wouldn't work for some types of crashes. Can you tell me at a high level what types of crashes it wouldn't cover?

philipphofmann · 2024-10-02T13:01:18Z

@aellett, can you please describe your use case for this feature a bit? Could it be that fixing #4375 solves your underlying problem? As this feature would bend the rules of signal handlers a bit, we're reluctant to add it even behind a feature flag.

I saw mention above that it wouldn't work for some types of crashes. Can you tell me at a high level what types of crashes it wouldn't cover?

I can't recall precisely which types of crashes. You aren't allowed to call the ObjC code when you're crashing and receiving a signal from your signal handler. In some cases, it might work, and in others not. When we tested this for view hierarchy and screenshots, it usually worked. The worst thing that can happen is losing the transaction, and signal handlers registered after ours won't run correctly. We ensure that we only call the sync safe code until the SDK has written the crash report to disk so we don't lose that important information.

aellett · 2024-10-02T17:06:11Z

@aellett, can you please describe your use case for this feature a bit? Could it be that fixing #4375 solves your underlying problem? As this feature would bend the rules of signal handlers a bit, we're reluctant to add it even behind a feature flag.

What we're looking for is that we want to measure the availability (for lack of a better word) of a given transaction. How many times did the flow within the transaction fail relative to how many times it ran. For our purposes, I think we'd count a failure as either a fatal crash or a call to SentrySDK.capture(error). We can handle the second case in a couple different ways, but I don't see a way (currently) to include fatal crashes in that calculation.

Regarding #4375, I don't think it would help us to calculate this, because I think the proposed solution was to not set the transaction field for fatal crashes. If that field were set to the name of the transaction that was running during the crash, we could try to count those errors, but then I think we'd have to do some extra math to get the availability of that transaction and I don't think we could do that within a dashboard or something. Even if we did have that association, I might still be a little concerned because we sample our transactions and errors at different rates, so I'd be worried that our availability would be off.

I can't recall precisely which types of crashes. You aren't allowed to call the ObjC code when you're crashing and receiving a signal from your signal handler. In some cases, it might work, and in others not. When we tested this for view hierarchy and screenshots, it usually worked. The worst thing that can happen is losing the transaction, and signal handlers registered after ours won't run correctly. We ensure that we only call the sync safe code until the SDK has written the crash report to disk so we don't lose that important information.

Thanks for the extra detail.

philipphofmann · 2024-10-03T12:04:07Z

Ah, now I think I get it. So you want the failure_rate in Sentry to include crashes? Or do you measure the availability of your transactions?

That's a valid use case, and it's worth the effort. I will discuss this with the team and get back to you.

You could achieve something like this by always storing the current transaction data in the scope. The SDK stores the scope with the crash report. Then, on the next app launch, you could check in before if the event is a crash and then manually recreate parts of the transaction and mark it as errored by grabbing that information from the crash event. But you also might want to remove this extra scope data for all the other events. That's quite hacky, but it could work until we get to this. I can explain this in more detail if that sounds like something you want to do.

philipphofmann · 2024-10-03T12:05:54Z

Another idea: we could go with option B but only serialize parts of the transaction. Maybe only the root span, including some metadata, could be enough.

aellett · 2024-10-07T22:12:03Z

Or do you measure the availability of your transactions?

This. We want to wrap a journey through a feature in a transaction, and then measure the availability of that transaction. Whenever the journey has an error (doesn't matter if it's handled or unhandled), that should count against the availability. Sort of tangential to this: as part of this effort, we're trying to figure out the best way to set the status on a root transaction when a handled error happens. I was thinking that it (the root transaction) would have a status of something other than .ok after I called SentrySDK.capture(error), but that didn't seem to be the case while I was testing it.

Maybe only the root span [...], could be enough.

I think that this would be enough for what we're trying to do. When there is a crash I mostly care that 1) I can see the error in Sentry and 2) There was a transaction with the correct name that has a status of something besides .ok. The first one is obviously already happening, so any movement towards the second one would be brilliant.

philipphofmann · 2024-10-09T12:36:50Z

We plan to start with option A behind a feature flag. If it doesn't work correctly, we can reconsider option B.

kahest added Platform: Cocoa Effort: Large labels Oct 19, 2022

adinauer mentioned this issue Oct 20, 2022

Finish and capture transaction/span bound to the Scope in case of a fatal crash getsentry/team-mobile#62

Open

github-actions bot added the Status: Stale label Nov 11, 2022

brustolin added Status: Backlog and removed Status: Stale labels Nov 14, 2022

kahest removed Impact: Medium labels Jul 11, 2023

hubertdeng123 removed the Status: Backlog label Jul 25, 2023

getsantry bot added the Waiting for: Product Owner label Sep 25, 2024

aellett mentioned this issue Sep 26, 2024

Error is associated with incorrect transaction after crash #4375

Closed

getsantry bot removed the Waiting for: Product Owner label Sep 27, 2024

getsantry bot added the Waiting for: Product Owner label Oct 1, 2024

getsantry bot removed the Waiting for: Product Owner label Oct 2, 2024

philipphofmann added the Waiting for: Community label Oct 2, 2024

getsantry bot added Waiting for: Product Owner and removed Waiting for: Community labels Oct 2, 2024

getsantry bot removed the Waiting for: Product Owner label Oct 3, 2024

getsantry bot added the Waiting for: Product Owner label Oct 7, 2024

philipphofmann removed the Waiting for: Product Owner label Oct 8, 2024

philipphofmann self-assigned this Oct 15, 2024

philipphofmann linked a pull request Nov 5, 2024 that will close this issue

feat: Transactions for crashes #4504

Open

7 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Finish and capture transaction/span bound to the Scope in case of a fatal crash #2306

Finish and capture transaction/span bound to the Scope in case of a fatal crash #2306

kahest commented Oct 19, 2022

philipphofmann commented Oct 20, 2022 •

edited

Loading

brustolin commented Oct 20, 2022

github-actions bot commented Nov 11, 2022

aellett commented Sep 25, 2024

philipphofmann commented Sep 27, 2024

aellett commented Oct 1, 2024

philipphofmann commented Oct 2, 2024 •

edited

Loading

aellett commented Oct 2, 2024

philipphofmann commented Oct 3, 2024

philipphofmann commented Oct 3, 2024

aellett commented Oct 7, 2024

philipphofmann commented Oct 9, 2024

Finish and capture transaction/span bound to the Scope in case of a fatal crash #2306

Finish and capture transaction/span bound to the Scope in case of a fatal crash #2306

Comments

kahest commented Oct 19, 2022

Description

philipphofmann commented Oct 20, 2022 • edited Loading

Option A

Option B

brustolin commented Oct 20, 2022

github-actions bot commented Nov 11, 2022

aellett commented Sep 25, 2024

philipphofmann commented Sep 27, 2024

aellett commented Oct 1, 2024

philipphofmann commented Oct 2, 2024 • edited Loading

aellett commented Oct 2, 2024

philipphofmann commented Oct 3, 2024

philipphofmann commented Oct 3, 2024

aellett commented Oct 7, 2024

philipphofmann commented Oct 9, 2024

philipphofmann commented Oct 20, 2022 •

edited

Loading

philipphofmann commented Oct 2, 2024 •

edited

Loading