
Linked exceptions and which is used for grouping? #59679

Closed
Tracked by #3579
Parakoos opened this issue Nov 9, 2023 · 20 comments · Fixed by getsentry/sentry-javascript#10850

@Parakoos

Parakoos commented Nov 9, 2023

Environment

SaaS (https://sentry.io/)

What are you trying to accomplish?

When playing a sound, I wrap the code in a try-catch, and if I get a NotAllowedError, I catch it and wrap it in my own kind of error with a more meaningful error message: throw new SgtError('Permission to play sounds not given.', caughtError) (the constructor sets the cause property to the given error).
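The wrap-and-rethrow pattern described above can be sketched in TypeScript roughly as follows. This is an assumed reconstruction: the body of SgtError and the helper wrapPermissionError are hypothetical, and the audio object is typed structurally (anything with a play() method returning a promise) rather than as a DOM element.

```typescript
// Sketch only: SgtError is the reporter's own class; this body is an assumed
// reconstruction of "the constructor sets the cause property".
class SgtError extends Error {
  // The original error that triggered this one, e.g. a NotAllowedError.
  // Initialized here and set in the constructor so it works with any
  // class-fields compiler setting.
  cause: unknown = undefined;

  constructor(message: string, cause?: unknown) {
    super(message);
    this.name = "SgtError";
    this.cause = cause;
    // Keeps `instanceof SgtError` working when compiled to an ES5 target.
    Object.setPrototypeOf(this, SgtError.prototype);
  }
}

// Hypothetical helper: turns a browser permission failure into an SgtError
// with a clearer message and passes any other error through untouched.
function wrapPermissionError(caughtError: unknown, message: string): unknown {
  if (caughtError instanceof Error && caughtError.name === "NotAllowedError") {
    return new SgtError(message, caughtError);
  }
  return caughtError;
}

// Usage in the sound code path: play() rejects with NotAllowedError when the
// browser blocks playback.
async function playSound(audio: { play(): Promise<void> }): Promise<void> {
  try {
    await audio.play();
  } catch (caughtError) {
    throw wrapPermissionError(caughtError, "Permission to play sounds not given.");
  }
}
```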

In Sentry, I can see the two errors:

Related Exceptions
    SgtError: Permission to play sounds not given.
        NotAllowedError: Permission was denied

But here is my problem: the grouping, the fingerprinting, and the title and error message of the Sentry issue are all based on the underlying NotAllowedError.

This becomes a problem when, in a different part of the code, I try to acquire a Wake Lock and that operation also throws a NotAllowedError. I wrap that in a different custom error with a different message, but the sound-related issues still get mixed up with the wake-lock issues, because the underlying error is the same and the 'parent' error is ignored.

How are you getting stuck?

I would like to be able to signal to Sentry to focus on the main error instead of the linked error. Is there a way to do this? I am already using beforeSend to set custom fingerprints for these errors, but it seems clumsy, and I can't find a way to fix other things like the title and description in the UI.
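For reference, a beforeSend hook along the lines described above might look like the sketch below. The SketchEvent and SketchHint interfaces are simplified stand-ins for the SDK's own event and hint types, and the fingerprint values are an arbitrary choice. This only affects grouping; it does not change the issue title, which in this situation is still derived from the underlying NotAllowedError.

```typescript
// Minimal stand-ins for the SDK's event and hint types (assumption: only
// the fields this sketch touches are modelled).
interface SketchEvent {
  fingerprint?: string[];
}
interface SketchHint {
  originalException?: unknown;
}

// Gives each wrapped SgtError its own group based on the wrapper's message,
// regardless of which underlying error it carries as `cause`.
function beforeSend(event: SketchEvent, hint: SketchHint): SketchEvent {
  const err = hint.originalException;
  if (err instanceof Error && err.name === "SgtError") {
    event.fingerprint = ["SgtError", err.message];
  }
  return event;
}

// In the app this would be registered via Sentry.init({ dsn, beforeSend }).
```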

Where in the product are you?

Unknown

Link

No response

DSN

No response

Version

No response

┆Issue is synchronized with this Jira Improvement by Unito

@getsantry
Contributor

getsantry bot commented Nov 9, 2023

Assigning to @getsentry/support for routing ⏲️

@getsantry
Contributor

getsantry bot commented Nov 9, 2023

Routing to @getsentry/product-owners-unknown for triage ⏲️

@getsantry
Contributor

getsantry bot commented Nov 14, 2023

Routing to @getsentry/product-owners-issues for triage ⏲️

@lobsterkatie
Member

lobsterkatie commented Nov 16, 2023

@Parakoos - You're right, that does seem like incorrect behavior! I'm going to mark this as a bug and add it to our backlog.

For posterity, here's what I found when I was digging into this:

  • The overall problem is that main_exception_id is getting set to the wrong exception. Here's the JSON from an event in our FE project showing this behavior. main_exception_id is at the top level.

  • main_exception_id was introduced in feat(grouping): Exception Groups #48653, and is set here. As you can see, it's related to the top-level exceptions computed here, and it's in that latter spot where the problem lies. In this case, the error we want is labeled as an exception group, and that function therefore ignores it and picks the next one down the chain.

  • That logic, and the grouping and titling behavior, jibe with what's described in the RFC here and here. What they describe makes sense in situations where the first error in the chain isn't a meaningful error in and of itself, but a way to collect errors. Though it's (relatively) new, JS has this, too, in the form of AggregateError, but that's not what we're dealing with here.

So, it seems to me that the solution is either for the SDK to tag the outermost error as something other than an exception group, or for the logic I linked above to detect cases where the root error is meaningful rather than just a collector, and handle them differently.
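To make the failure mode concrete, here is a simplified TypeScript model of the selection described above. It is a sketch under assumptions, not Sentry's actual backend code: it walks down from the root exception, skips anything flagged as an exception group, and takes the first remaining exception as the main one. The interface and function names are hypothetical; the mechanism fields (exception_id, parent_id, is_exception_group) are the ones discussed in this thread.

```typescript
// Simplified model of the mechanism fields discussed in this thread
// (assumption: the real backend logic covers more cases than this).
interface Mechanism {
  exception_id: number;
  parent_id?: number;
  is_exception_group?: boolean;
}
interface ExceptionValue {
  type: string;
  value: string;
  mechanism: Mechanism;
}

// Starting from the root, an exception group is treated as a container:
// it is skipped and its children are examined instead.
function topLevelExceptions(values: ExceptionValue[]): ExceptionValue[] {
  const result: ExceptionValue[] = [];
  const visit = (ex: ExceptionValue): void => {
    if (ex.mechanism.is_exception_group) {
      values
        .filter((v) => v.mechanism.parent_id === ex.mechanism.exception_id)
        .forEach(visit);
    } else {
      result.push(ex);
    }
  };
  const root = values.find((v) => v.mechanism.parent_id === undefined);
  if (root) visit(root);
  return result;
}

// In this sketch, the first top-level exception becomes the "main" one.
function mainExceptionId(values: ExceptionValue[]): number | undefined {
  return topLevelExceptions(values)[0]?.mechanism.exception_id;
}

// The chain from the original report, listed innermost-first, with the
// wrapper flagged as an exception group as the SDK did before the fix.
const exampleChain: ExceptionValue[] = [
  {
    type: "NotAllowedError",
    value: "Permission was denied",
    mechanism: { exception_id: 1, parent_id: 0 },
  },
  {
    type: "SgtError",
    value: "Permission to play sounds not given.",
    mechanism: { exception_id: 0, is_exception_group: true },
  },
];
```

With exampleChain as given, mainExceptionId returns 1, i.e. the NotAllowedError; setting is_exception_group to false on the SgtError entry makes it return 0.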

@AbhiPrasad - As someone who both approved the RFC and knows the SDK, which of the above do you think is a better solution? [UPDATE: I've talked myself out of the SDK option, I think - see my comment below. Still curious if you have thoughts, but I'm going to throw this on our grouping project board for now.]

UPDATE:

Here's a screenshot of the JSON I linked above, showing the problem. The error we want to group on is marked as an exception group, and therefore it's not chosen as the "main" error.

[Screenshot of the event JSON]

@Parakoos
Author

Oh wow, well, I'm glad I reported it then! 😃

While you are working on this, is there some kind of workaround to manually set the main_exception_id, either by doing something in the beforeSend callback or in the way we capture and rethrow an error? Perhaps if I threw an AggregateError?

@lobsterkatie
Member

lobsterkatie commented Nov 17, 2023

You could try wrapping your real error in an AggregateError, though it would probably lead to a certain amount of noise/unhelpful data on your issue detail page. I would say that you could use beforeSend to make the change I was suggesting the SDK make by default, but I'm guessing that might have other consequences in terms of how the error is displayed. You could certainly try it. It would mean changing the is_exception_group value.
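A beforeSend-based version of that workaround could look like the following untested sketch. It clears is_exception_group on every exception except a genuine AggregateError. The interfaces are simplified stand-ins for the event schema and the function name is hypothetical; as cautioned above, this may have display side effects that are not accounted for here.

```typescript
// Simplified stand-ins for the relevant slice of the event schema
// (assumption: only these fields are modelled).
interface SketchMechanism {
  exception_id?: number;
  parent_id?: number;
  is_exception_group?: boolean;
}
interface SketchException {
  type?: string;
  value?: string;
  mechanism?: SketchMechanism;
}
interface SketchEvent {
  exception?: { values?: SketchException[] };
}

// Hypothetical hook body: un-flag wrapper errors so the grouping logic no
// longer treats the outermost error as a mere container.
function unmarkLinkedErrorGroups(event: SketchEvent): SketchEvent {
  for (const ex of event.exception?.values ?? []) {
    // Leave genuine AggregateErrors alone: they really are containers.
    if (ex.mechanism?.is_exception_group && ex.type !== "AggregateError") {
      ex.mechanism.is_exception_group = false;
    }
  }
  return event;
}

// Hypothetical registration:
//   Sentry.init({ dsn, beforeSend: (event) => unmarkLinkedErrorGroups(event) });
```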

If we did it for real in the SDK, we might have to make changes in our backend, too, to compensate. (The more I talk about this, the more I think that the SDK route probably isn't the way for us to fix this, for exactly that reason. Probably better/easier to just throw a check into our logic for setting main_exception_id.)

All of that said, really the thing is for us to fix this, rather than for you to resort to workarounds. No guarantee given the holidays, competing priorities, etc., but if this is an easy fix, I'll try to advocate for it getting pulled from the backlog.

@AbhiPrasad
Member

I'd avoid making changes in the SDK if possible, and changing how we set main_exception_id seems reasonable enough as a solution for this. It's been a while since I looked at this at the SDK side though, so only basing this off a cursory glance.

@lobsterkatie
Member

lobsterkatie commented Nov 17, 2023

Yup, I agree with you.

UPDATE: I've talked myself out of the SDK option, I think

If nothing else, I kind of don't feel like tracking down all of the possible side effects of setting a different value for is_exception_group, let alone writing code to compensate for them. We're going to give the changing-the-main_exception_id-logic approach a try.

Thanks for taking a look!

@lobsterkatie
Member

lobsterkatie commented Nov 18, 2023

@johndoherty, continuing the conversation from #60194 - You say the title used to be x and now it's y. How recently did you notice a change? And would it be possible for you to send me the ids of before and after events, along with the number that appears after /issues/ in the URL, so I can take a look? You can get the event id by clicking on the little copy icon which appears when you hover over the short id on the issue details page.

@johndoherty

Thanks for following up @lobsterkatie! This started for us when we updated the sentry js library from version 7.46.0 to 7.74.1 about a month ago.

Here are the event and issue IDs:
Before
Event: 0016bd9bbe664364b0504c9d2feeeb1e
Issue: 3971207946

After
Event: 9adb810289404117b060005406413498
Issue: 4569597030

This is not the only "after" issue, as the original issue has been split into several new issues because of the grouping change.

Let me know if there's more info I can provide!

@lobsterkatie
Member

lobsterkatie commented Nov 20, 2023

Thanks for those - I checked them out, and that did in fact make at least the proximate problem's cause clear. Between 7.46 and 7.74, the SDK changed how it reports linked errors, presumably in order to support exception groups. You can see the difference in the first screenshot below (7.46 on the left, and 7.74 on the right).

As mentioned above, our main_exception_id logic handles the newer data schema incorrectly, so this explains the change you've seen. While that's not immediately helpful, I recognize, it's good news in that it confirms that you're also suffering from the problem we've identified, and not some other unknown issue. (In the second screenshot, you can see the lack of a main_exception_id being set in the old event, and it being incorrectly set to 1 rather than 0 in the new event.)

This is definitely on our list to fix. I'll update here once I know anything more concrete than that.

irodrigues-git added the "Sync: Jira" label (apply to auto-create a Jira shadow ticket) on Nov 22, 2023
@NicklasKull

I also experienced this problem, same kind of use cases as @Parakoos. A common use case is also when we get an AxiosError which could happen for many different reasons (bad network, 404, bad request, etc) and all for different functionalities and endpoints. Therefore we also throw a custom error (with original as cause) to group them according to logic and make it tidy.

My temporary solution is to use an earlier version of the SDK until this is solved.

@mattleonowicz

mattleonowicz commented Jan 9, 2024

I'm so glad I found this and that it is indeed a bug and not a change in approach. Using Errors with cause is such a useful mechanic, but only when the cause is informational/secondary in all the processing. I used to have it nicely grouped; now all I see is one issue for the underlying AxiosError 400, another for AxiosError 500, and that's all. It's a very bad overview, and quite dangerous when it comes to alerts for new issues (or rather the lack of them).

@mattleonowicz

Any updates on this?
I think this is a pretty serious bug. Linked errors are really powerful with the common practice of repackaging Errors to give them more context.
The temporary solution we have is to erase cause before passing the Error to Sentry and manually put the serialized "cause error" in the extra field instead. This has the huge downside that its stack trace is not easy to explore, since the usual source maps are not applied to it.
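The workaround described above can be sketched as a small helper that detaches cause before capture and returns a serialized copy to attach as extra data. The helper and the SerializedCause shape are hypothetical; the commented call site assumes captureException's second argument accepts an extra context. The stack string stays unsymbolicated, which is the downside mentioned above.

```typescript
// Plain, JSON-friendly copy of a cause (assumption: name, message and stack
// are enough for triage).
interface SerializedCause {
  name: string;
  message: string;
  stack?: string;
}

// Hypothetical helper: removes `cause` from the error so the SDK no longer
// sees a linked error, and returns a serialized copy for extra data.
function detachCause(err: Error & { cause?: unknown }): SerializedCause | undefined {
  if (err.cause === undefined) return undefined;
  const cause = err.cause;
  delete err.cause;
  if (cause instanceof Error) {
    return { name: cause.name, message: cause.message, stack: cause.stack };
  }
  // Non-Error causes (strings, plain objects) are stringified as-is.
  return { name: "NonErrorCause", message: String(cause) };
}

// Hypothetical call site:
//   const cause = detachCause(error);
//   Sentry.captureException(error, { extra: { cause } });
```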

If you have any idea for a workaround that would let us use cause again while forcing issue grouping by the main error, please share 🙏
Thank you

@the21st

the21st commented Jan 30, 2024

+1, we are paying Sentry a hefty sum each month and this bug is completely unacceptable – it makes search unusable for us for any issues with linked errors.

@brianthi

Appreciate y'all sharing your use cases with us. We'll revisit this issue today and share an update when we have a plan for a fix or a fix itself.

@lobsterkatie
Member

After more investigation, @AbhiPrasad and I agreed that the change actually should come in the SDK.

It's currently marking linked errors as exception groups, when that designation should be reserved for instances of AggregateError (as confirmed in our event schema here). When grouping events, we ignore the top error of an exception group because it's just a container, and since the top error in a chain of errors is getting designated incorrectly, we're therefore incorrectly ignoring it.
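The SDK-side rule can be illustrated with a hypothetical sketch that walks an error's cause chain, assigns exception_id and parent_id, and sets is_exception_group only for an AggregateError. It is not the SDK's actual code: the function name and the maxDepth cap are assumptions, and it ignores the child errors an AggregateError carries in its errors array.

```typescript
interface SketchMechanism {
  exception_id: number;
  parent_id?: number;
  is_exception_group?: boolean;
}
interface SketchException {
  type: string;
  value: string;
  mechanism: SketchMechanism;
}

// Hypothetical sketch: walk `cause` links from the outermost error inward,
// giving each error an id and pointing it at its parent.
function flattenCauseChain(root: Error, maxDepth = 5): SketchException[] {
  const out: SketchException[] = [];
  let current: unknown = root;
  let id = 0;
  while (current instanceof Error && id < maxDepth) {
    const err: Error = current;
    const mechanism: SketchMechanism = { exception_id: id };
    if (id > 0) mechanism.parent_id = id - 1;
    // The rule described above: only an AggregateError is a container.
    // (A name check keeps this sketch independent of the ES2021 lib.)
    if (err.name === "AggregateError") mechanism.is_exception_group = true;
    out.push({ type: err.name, value: err.message, mechanism });
    current = (err as Error & { cause?: unknown }).cause;
    id++;
  }
  // Event payloads list chained exceptions oldest (innermost) first.
  return out.reverse();
}
```

Under this rule the SgtError from the original report keeps exception_id 0 and is no longer flagged as a group, so it is the one eligible to be chosen as the main exception.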

We'll post an update when we have news on that fix.

@andrewmclagan

@lobsterkatie I'll second how crippling this issue is. Considering this is a costly paid service, this issue renders it almost pointless. I would expect it to receive a very high priority; it's been open for months.

@AbhiPrasad
Member

Hey folks - we've been busy working on our new major version for the JS SDK, which has been our primary focus.

Apologies for the delay, we'll bump this up in priority

@AbhiPrasad
Member

We've released a fix for this as part of JS SDK release https://github.com/getsentry/sentry-javascript/releases/tag/7.104.0.

github-actions bot locked and limited conversation to collaborators Mar 24, 2024