Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Finish and capture transaction/span bound to the Scope in case of a fatal crash #2306

Open
Tracked by #62
kahest opened this issue Oct 19, 2022 · 12 comments · May be fixed by #4504
Open
Tracked by #62

Finish and capture transaction/span bound to the Scope in case of a fatal crash #2306

kahest opened this issue Oct 19, 2022 · 12 comments · May be fixed by #4504
Assignees

Comments

@kahest
Copy link
Member

kahest commented Oct 19, 2022

Description

For now the transaction gets lost in case of a fatal crash. We should finish it with an appropriate status before capturing the exception, if possible.

See getsentry/team-mobile#62

@philipphofmann
Copy link
Member

philipphofmann commented Oct 20, 2022

At the moment, I can think of two ways to achieve this.

Option A

In case of a fatal crash, we could call into ObjC to finish the ongoing transactions after SentryCrash has written the crash report to disk. We already use this approach for screenshots and view hierarchy, but it wouldn't work for all types of crashes. This would be the best effort approach.

// Report is saved to disk, now we try to take screenshots
// and view hierarchies.
// Depending on the state of the crash this may not work
// because we gonna call into non async-signal safe code
// but since the app is already in a crash state we don't
// mind if this approach crashes.
if (g_saveScreenShot || g_saveViewHierarchy) {

Option B

To make this work for all types of crashes, we would need to sync the transaction from ObjC to C memory and store it alongside a crash report. This approach is similar to syncing the scope data to SentryCrash.

/**
* This class performs a fine-grained sync of the Scope to C memory, as when SentryCrash writes a
* crash report, we can't call Objective-C methods; see SentryCrash.onCrash. For every change to the
* Scope, this class serializes only the changed property to JSON and stores it in C memory. When a
* crash happens, the SentryCrashReport picks up the JSON of all properties and adds it to the crash
* report.
*
* Previously, the SDK used SentryCrash.setUserInfo, which required the serialization of the whole
* Scope on every modification of it. When having much data in the Scope this slowed down the caller
* of the scope change. Therefore, we had to move the Scope sync to a background thread. This has
* the downside of the scope not being 100% up to date when a crash happens and, of course, lots of
* CPU overhead.
*/
@interface SentryCrashScopeObserver : NSObject <SentryScopeObserver>

This approach would add some overhead, as every change to a span or transaction would trigger a sync to c memory. To keep this performant a fine-grained sync is a must, which only syncs the current change of a span or transaction to SentryCrash, instead of the whole transaction.

@brustolin
Copy link
Contributor

I believe this is an edge case that don't worth doing option B.

@github-actions
Copy link

This issue has gone three weeks without activity. In another week, I will close it.

But! If you comment or otherwise update it, I will reset the clock, and if you label it Status: Backlog or Status: In Progress, I will leave it alone ... forever!


"A weed is but an unloved flower." ― Ella Wheeler Wilcox 🥀

@aellett
Copy link

aellett commented Sep 25, 2024

Has anyone revisited this lately? Was surprised this morning when I dug into this functionality and saw that it was different between our iOS and Android (sentry-java) apps.

@philipphofmann
Copy link
Member

@aellett, we haven't revisited this lately. It's still on our backlog.

This is different because you can still call Java code when your application terminates due to a crash on Java. On Cocoa, strictly speaking, we must only call C code once the app crashes, but we sometimes have to partially bend the rules of what is acceptable. For more information check our develop docs for signal handlers. We already do that to add screenshots and view the hierarchy to crashes. Therefore, I vote for option A. We will revisit the priority for this, but I can't give you an ETA.

@aellett
Copy link

aellett commented Oct 1, 2024

@aellett, we haven't revisited this lately. It's still on our backlog.

This is different because you can still call Java code when your application terminates due to a crash on Java. On Cocoa, strictly speaking, we must only call C code once the app crashes, but we sometimes have to partially bend the rules of what is acceptable. For more information check our develop docs for signal handlers. We already do that to add screenshots and view the hierarchy to crashes. Therefore, I vote for option A. We will revisit the priority for this, but I can't give you an ETA.

Yeah, option A sounds reasonable to me. I saw mention above that it wouldn't work for some types of crashes. Can you tell me at a high level what types of crashes it wouldn't cover?

@philipphofmann
Copy link
Member

philipphofmann commented Oct 2, 2024

@aellett, can you please describe your use case for this feature a bit? Could it be that fixing #4375 solves your underlying problem? As this feature would bend the rules of signal handlers a bit, we're reluctant to add it even behind a feature flag.

I saw mention above that it wouldn't work for some types of crashes. Can you tell me at a high level what types of crashes it wouldn't cover?

I can't recall precisely which types of crashes. You aren't allowed to call the ObjC code when you're crashing and receiving a signal from your signal handler. In some cases, it might work, and in others not. When we tested this for view hierarchy and screenshots, it usually worked. The worst thing that can happen is losing the transaction, and signal handlers registered after ours won't run correctly. We ensure that we only call the sync safe code until the SDK has written the crash report to disk so we don't lose that important information.

@aellett
Copy link

aellett commented Oct 2, 2024

@aellett, can you please describe your use case for this feature a bit? Could it be that fixing #4375 solves your underlying problem? As this feature would bend the rules of signal handlers a bit, we're reluctant to add it even behind a feature flag.

What we're looking for is that we want to measure the availability (for lack of a better word) of a given transaction. How many times did the flow within the transaction fail relative to how many times it ran. For our purposes, I think we'd count a failure as either a fatal crash or a call to SentrySDK.capture(error). We can handle the second case in a couple different ways, but I don't see a way (currently) to include fatal crashes in that calculation.

Regarding #4375, I don't think it would help us to calculate this, because I think the proposed solution was to not set the transaction field for fatal crashes. If that field were set to the name of the transaction that was running during the crash, we could try to count those errors, but then I think we'd have to do some extra math to get the availability of that transaction and I don't think we could do that within a dashboard or something. Even if we did have that association, I might still be a little concerned because we sample our transactions and errors at different rates, so I'd be worried that our availability would be off.

I can't recall precisely which types of crashes. You aren't allowed to call the ObjC code when you're crashing and receiving a signal from your signal handler. In some cases, it might work, and in others not. When we tested this for view hierarchy and screenshots, it usually worked. The worst thing that can happen is losing the transaction, and signal handlers registered after ours won't run correctly. We ensure that we only call the sync safe code until the SDK has written the crash report to disk so we don't lose that important information.

Thanks for the extra detail.

@philipphofmann
Copy link
Member

Ah, now I think I get it. So you want the failure_rate in Sentry to include crashes? Or do you measure the availability of your transactions?

That's a valid use case, and it's worth the effort. I will discuss this with the team and get back to you.

You could achieve something like this by always storing the current transaction data in the scope. The SDK stores the scope with the crash report. Then, on the next app launch, you could check in before if the event is a crash and then manually recreate parts of the transaction and mark it as errored by grabbing that information from the crash event. But you also might want to remove this extra scope data for all the other events. That's quite hacky, but it could work until we get to this. I can explain this in more detail if that sounds like something you want to do.

@philipphofmann
Copy link
Member

Another idea: we could go with option B but only serialize parts of the transaction. Maybe only the root span, including some metadata, could be enough.

@aellett
Copy link

aellett commented Oct 7, 2024

Or do you measure the availability of your transactions?

This. We want to wrap a journey through a feature in a transaction, and then measure the availability of that transaction. Whenever the journey has an error (doesn't matter if it's handled or unhandled), that should count against the availability. Sort of tangential to this: as part of this effort, we're trying to figure out the best way to set the status on a root transaction when a handled error happens. I was thinking that it (the root transaction) would have a status of something other than .ok after I called SentrySDK.capture(error), but that didn't seem to be the case while I was testing it.

Maybe only the root span [...], could be enough.

I think that this would be enough for what we're trying to do. When there is a crash I mostly care that 1) I can see the error in Sentry and 2) There was a transaction with the correct name that has a status of something besides .ok. The first one is obviously already happening, so any movement towards the second one would be brilliant.

@philipphofmann
Copy link
Member

We plan to start with option A behind a feature flag. If it doesn't work correctly, we can reconsider option B.

@philipphofmann philipphofmann self-assigned this Oct 15, 2024
@philipphofmann philipphofmann linked a pull request Nov 5, 2024 that will close this issue
7 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: No status
Status: In Progress
Development

Successfully merging a pull request may close this issue.

5 participants