[SDK] Avoid missing conditional variable update and simplify atomic bool #2553
Addresses two issues:
1. Fix the use of a condition variable where a wait on the variable might not yet be in progress when notify is called, so the wake-up is missed. This is fixed by ensuring that the associated lock is acquired before calling notify.
2. Instead of relying on a lock and a boolean, use a single atomic boolean.
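A minimal C++ sketch of the two changes, assuming a simplified background worker. `BackgroundWorker`, `Run`, `Stop`, and `RunAndStopWorker` are hypothetical names for illustration only; they are not the opentelemetry-cpp classes touched by this PR.

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical worker (not the opentelemetry-cpp API) showing both changes:
// one std::atomic<bool> instead of a mutex-protected bool, and the mutex is
// held while notifying so the wake-up cannot be lost.
class BackgroundWorker {
 public:
  // Blocks until Stop() has been called.
  void Run() {
    std::unique_lock<std::mutex> lock(mutex_);
    // The predicate is checked while holding the lock, and wait() releases
    // the lock atomically when it blocks. Spurious wake-ups just re-check it.
    cv_.wait(lock, [this] { return stop_requested_.load(); });
  }

  // Returns true only for the first caller; later calls are no-ops.
  bool Stop() {
    // exchange() tests and sets the flag in one atomic step, where a plain
    // bool would need a mutex around the check-and-set.
    if (stop_requested_.exchange(true)) {
      return false;
    }
    // Taking the mutex before notifying means the waiter is either not yet at
    // its predicate check (and will see true) or already blocked in wait()
    // (and will receive this notification). Without the lock, the notify
    // could fire between the waiter's check and its block and be missed.
    std::lock_guard<std::mutex> guard(mutex_);
    cv_.notify_all();
    return true;
  }

 private:
  std::mutex mutex_;
  std::condition_variable cv_;
  std::atomic<bool> stop_requested_{false};
};

// Usage: start a worker thread, stop it twice, and join. Because the notify
// happens under the lock, join() cannot hang on a lost wake-up.
bool RunAndStopWorker() {
  BackgroundWorker worker;
  std::thread thread([&worker] { worker.Run(); });
  bool first = worker.Stop();
  bool second = worker.Stop();
  thread.join();
  return first && !second;
}
```

Holding the mutex while calling `notify_all()` costs a brief lock acquisition, but it closes the window between the waiter's predicate check and its block. Notifying right after releasing the lock would also be correct, as long as the lock is acquired after the flag is set.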
Thanks for the PR. Not a full review, but this looks good based on a first read. Please sign the EasyCLA to pass the CI checks.
Is it possible to add a test case to reproduce the issue and verify the fix?
Good point, but this problem is hard to reproduce. I don't yet know how to add a test case for it. Maybe someone else can help?
LGTM, thanks for the fix.
It is very unlikely that a test case can be written for this. In other projects that use mutexes and condition variables, testing this requires serious code injection in debug builds to control exactly how threads execute, for example to block a thread at a given point. opentelemetry-cpp does not have the tooling for this.
LGTM and approved, waiting for @owent to confirm since he had comments earlier.
Sorry, I'm on holiday these days. #2584 may also fix the problems this PR tries to solve, and it also solves other deadlock problems in traces and metrics. It also conflicts with this PR. Do you think we can just use #2584 and drop this one?
I have one last question about the deadlock problem above. The rest of the code looks good to me.
I have not had time to look into the deadlock problem, but anecdotally, we haven't had any deadlocks, and we have eliminated crashes due to OTEL in our deployments. I believe this would be beneficial to all, even if short-lived and superseded by #2584.
@owent Just checking if there is any update on this. Thanks.
@marcalff Could you please check the lock problem again when you have time, or could #2584 be merged now?
Codecov Report
All modified and coverable lines are covered by tests ✅
@@            Coverage Diff             @@
##             main    #2553      +/-   ##
==========================================
+ Coverage   87.12%   87.57%   +0.46%
==========================================
  Files         200      188      -12
  Lines        6109     5848     -261
==========================================
- Hits         5322     5121     -201
+ Misses        787      727      -60
I looked again in detail at the last remaining concern from @owent about a possible deadlock. After code analysis, I don't think the deadlock is possible, as the code uses a well-known pattern of locking a mutex before signalling a condition variable, and the condition
Now merging. Thanks for the fix, and also for your patience; sorry this review took such a long time.
Changes
For significant contributions please make sure you have completed the following items:
CHANGELOG.md updated for non-trivial changes