-
-
Notifications
You must be signed in to change notification settings - Fork 318
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: synchronize and copy global profiling data structures #3018
Conversation
Performance metrics 🚀
|
Revision | Plain | With Sentry | Diff |
---|---|---|---|
ddc9b9a | 1201.71 ms | 1226.70 ms | 24.99 ms |
03d9eb5 | 1227.35 ms | 1231.24 ms | 3.90 ms |
0001a09 | 1223.90 ms | 1249.56 ms | 25.66 ms |
2cf1b84 | 6265.25 ms | 6277.98 ms | 12.73 ms |
9d56232 | 1192.09 ms | 1228.86 ms | 36.77 ms |
dc0fe58 | 1198.56 ms | 1220.98 ms | 22.42 ms |
4d68229 | 1233.50 ms | 1262.92 ms | 29.42 ms |
8f397a7 | 1230.10 ms | 1253.88 ms | 23.77 ms |
7192d9e | 1221.86 ms | 1248.28 ms | 26.42 ms |
b2f82fa | 1236.94 ms | 1262.86 ms | 25.92 ms |
App size
Revision | Plain | With Sentry | Diff |
---|---|---|---|
ddc9b9a | 20.76 KiB | 420.40 KiB | 399.65 KiB |
03d9eb5 | 20.76 KiB | 436.33 KiB | 415.57 KiB |
0001a09 | 20.76 KiB | 433.19 KiB | 412.43 KiB |
2cf1b84 | 20.76 KiB | 431.91 KiB | 411.15 KiB |
9d56232 | 20.76 KiB | 425.80 KiB | 405.04 KiB |
dc0fe58 | 20.76 KiB | 436.50 KiB | 415.74 KiB |
4d68229 | 20.76 KiB | 432.34 KiB | 411.58 KiB |
8f397a7 | 20.76 KiB | 420.55 KiB | 399.79 KiB |
7192d9e | 20.76 KiB | 431.71 KiB | 410.95 KiB |
b2f82fa | 20.76 KiB | 419.62 KiB | 398.86 KiB |
Previous results on branch: armcknight/fix/crashes
Startup times
Revision | Plain | With Sentry | Diff |
---|---|---|---|
a753c1a | 1208.27 ms | 1225.14 ms | 16.87 ms |
619be64 | 1207.74 ms | 1227.04 ms | 19.30 ms |
699940a | 1205.20 ms | 1229.72 ms | 24.52 ms |
2ff3f84 | 1250.53 ms | 1263.92 ms | 13.39 ms |
6bc2bbe | 1238.92 ms | 1262.94 ms | 24.02 ms |
1e2582a | 1233.71 ms | 1248.94 ms | 15.23 ms |
13d5b25 | 1241.92 ms | 1257.10 ms | 15.18 ms |
c8e8975 | 1236.02 ms | 1249.16 ms | 13.14 ms |
App size
Revision | Plain | With Sentry | Diff |
---|---|---|---|
a753c1a | 20.76 KiB | 437.02 KiB | 416.26 KiB |
619be64 | 20.76 KiB | 437.02 KiB | 416.26 KiB |
699940a | 20.76 KiB | 434.80 KiB | 414.04 KiB |
2ff3f84 | 20.76 KiB | 437.02 KiB | 416.26 KiB |
6bc2bbe | 20.76 KiB | 437.16 KiB | 416.40 KiB |
1e2582a | 20.76 KiB | 437.02 KiB | 416.26 KiB |
13d5b25 | 20.76 KiB | 437.02 KiB | 416.26 KiB |
c8e8975 | 20.76 KiB | 434.80 KiB | 414.04 KiB |
5c92a4e
to
3557f57
Compare
Codecov Report
Additional details and impacted files@@ Coverage Diff @@
## main #3018 +/- ##
=============================================
- Coverage 88.790% 88.627% -0.164%
=============================================
Files 492 492
Lines 52893 53013 +120
Branches 18962 18951 -11
=============================================
+ Hits 46964 46984 +20
- Misses 4966 5109 +143
+ Partials 963 920 -43
... and 51 files with indirect coverage changes Continue to review full report in Codecov by Sentry.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still need to have a look a the tests. That's my first pass.
+ (NSDictionary<NSString *, id> *)serializeForTransaction:(SentryTransaction *)transaction | ||
profileID:(SentryId *)profileID | ||
{ | ||
std::lock_guard<std::mutex> l(_gProfilerLock); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
m
: If I'm not mistaken, this lock will be active until we finish serializing the profiling data below. I am unsure if that could cause some slowdowns in the code. I think we could unlock after copying the profiling data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't actually new logic, it just didn't diff well. It is the same behavior as is found here in the "removed" side of the diff, that came out of the refactor: https://github.com/getsentry/sentry-cocoa/pull/3018/files#diff-9d7d24f33a2554811f36d8236cdd5b9b2c7c2786b12be31e4a05a3934193441cL356
You might be on to something here, but I'm hesitant to change how this lock works in this PR. It forces exclusivity between querying whether the profiler is running, starting and stopping it (not only normal stops, but also due to elapsed timeout timer and app going to background), and querying for profiler info like is happening in this function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor comments. Almost LGTM.
Bumps [github/codeql-action](https://github.com/github/codeql-action) from 2.3.2 to 2.3.3. - [Release notes](https://github.com/github/codeql-action/releases) - [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md) - [Commits](github/codeql-action@f3feb00...29b1f65) --- updated-dependencies: - dependency-name: github/codeql-action dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
this has the additional benefit that it only does work if the if branch is taken; small benefit since it's debug only but ¯\_(ツ)_/¯
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From my side LGTM. I leave this up to Indragie, as he requested changes.
@@ -155,7 +143,30 @@ | |||
} | |||
|
|||
{ | |||
std::lock_guard<std::mutex> l(_gSamplesArrayLock); | |||
// critical section: here is where all mutable data structures are updated, which must be |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
stacks
is still being mutated outside of the critical section on line 154
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, for some reason I thought I wouldn't need to put the modification of stacks
or frames
here, but now I can't remember why, because we do put the modification for samples
here and it's essentially the same structure. I'll add them.
Instructions and example for changelogPlease add an entry to Example: ## Unreleased
- synchronize and copy global profiling data structures ([#3018](https://github.com/getsentry/sentry-cocoa/pull/3018)) If none of the above apply, you can opt out of this check by adding |
In #3018, there was a change to add locking where `processBacktrace` mutated data structures that could be accessed concurrently. It attempted to separate the code that did not access the shared data structures, and moved all mutating code to a section at the end of the function. However, it missed a few reads from those data structures that also needed to be synchronized (e.g. `frames.count` and `stacks.count`). Generally, separating the critical section from the rest of the logic in that function is not worth the pain because the taking the lock for the entire duration of the function is not expensive. This refactor adds a new type, `SentryProfilingState`, that centralizes all logic for reading/writing the shared data structures. This has the additional benefit that you no longer need a `processBacktrace` function that takes many parameters where the lifetime of those parameters is unclear (this will be helpful for fixing the memory leaks in a subsequent PR). --------- Co-authored-by: Sentry Github Bot <[email protected]>
📜 Description
We've had several customer reports of crashes involving an enumeration happening in json validation, which loops through a globally-modifiable data structure we should be copying before placing in the profiling payload.
Here's the representative stack trace reported by one customer:
This stack trace also appears in the report uploaded by a customer in #2993, although it was not the blamed frame/thread, but it's possible that Crashlytics just isn't analyzing the crash correctly and this is also the root cause of that crash.
💡 Motivation and Context
💚 How did you test it?
📝 Checklist
You have to check all boxes before merging:
sendDefaultPII
is enabled.- [ ] I updated the docs if needed.- [ ] Review from the native team if needed.🔮 Next steps