
fix(otel): reconnect async traces (e.g. LROs) #13147

Merged: dbolduc merged 6 commits into googleapis:main from otel-attach-context-async-backoff on Nov 21, 2023


dbolduc (Member) commented Nov 16, 2023

Fixes #13141

See the linked issue for the problem write-up. The changes in this PR produce the second (seemingly correct) screenshot.

The ABI update was necessary because we no longer define some templated future<StatusOr<std::chrono::...>> type, which triggered a false positive in check-api.

Something is buggy with (at least) opentelemetry-cpp + GCC 7.3.1, so this PR patches the Dockerfile used in that build to work around it.
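A minimal sketch of the general idea behind "reconnecting" async traces, under stated assumptions: the active span lives in per-thread context, so a continuation that runs on a different thread (for example, a completion queue thread when a backoff timer or an LRO poll completes) starts with no active span unless the caller's context is captured and re-attached. All names below (`ToyScope`, `CurrentSpan`, `RunAsync`) are hypothetical; this models the technique, not the code in this PR or the opentelemetry-cpp API.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <thread>
#include <vector>

// Toy model: the "context" is a per-thread stack of active span ids.
// 0 means "no active span".
thread_local std::vector<std::uint64_t> active_spans;

std::uint64_t CurrentSpan() {
  return active_spans.empty() ? 0 : active_spans.back();
}

// RAII guard that makes `id` the active span on the current thread.
class ToyScope {
 public:
  explicit ToyScope(std::uint64_t id) { active_spans.push_back(id); }
  ~ToyScope() {
    assert(!active_spans.empty());
    active_spans.pop_back();
  }
  ToyScope(ToyScope const&) = delete;
  ToyScope& operator=(ToyScope const&) = delete;
};

// Hypothetical stand-in for a completion queue: runs `continuation` on a
// different thread and returns the span id it observed as active.
std::uint64_t RunAsync(std::function<std::uint64_t()> continuation) {
  std::uint64_t observed = 0;
  std::thread t([&] { observed = continuation(); });
  t.join();
  return observed;
}

// Without propagation, the continuation sees the other thread's empty context.
std::uint64_t WithoutReattach() {
  ToyScope scope(42);
  return RunAsync([] { return CurrentSpan(); });
}

// With propagation, capture the context when the async call starts and
// re-attach it inside the continuation before doing any traced work.
std::uint64_t WithReattach() {
  ToyScope scope(42);
  auto captured = CurrentSpan();
  return RunAsync([captured] {
    ToyScope reattached(captured);
    return CurrentSpan();
  });
}
```

`WithoutReattach()` yields 0 (a disconnected trace), while `WithReattach()` yields 42, so any span started in the continuation would be parented to the caller's span.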



codecov bot commented Nov 16, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (ecb9dcc) 93.00% compared to head (1a577cb) 92.99%.

Additional details and impacted files
```
@@            Coverage Diff             @@
##             main   #13147      +/-   ##
==========================================
- Coverage   93.00%   92.99%   -0.01%     
==========================================
  Files        2137     2137              
  Lines      185877   185898      +21     
==========================================
+ Hits       172868   172884      +16     
- Misses      13009    13014       +5     
```


dbolduc (Member, Author) commented Nov 17, 2023

Ugh, the gcc-7-3 error is out of our control. I need to think about how we can avoid it in our repo.

The following code reproduces the same error. It does not use any google-cloud-cpp library helpers, only the opentelemetry-cpp API plus our test matchers. Some bug in the compiler causes this basic opentelemetry-cpp test to fail.

```cpp
TEST(OpenTelemetry, DarrenOTelOnly) {
  // Collect the spans produced by this test.
  auto span_catcher = InstallSpanCatcher();

  auto provider = opentelemetry::trace::Provider::GetTracerProvider();
  auto tracer = provider->GetTracer("gcc-7-3 test");

  // Make `parent` the active span for the rest of this block.
  auto parent = tracer->StartSpan("parent");
  opentelemetry::trace::Scope parent_scope(parent);

  // `child` starts while `parent` is active, so it should be parented to it.
  auto child = tracer->StartSpan("child");
  child->End();

  parent->End();

  EXPECT_THAT(
      span_catcher->GetSpans(),
      ElementsAre(AllOf(SpanNamed("child"),
                        SpanWithParentSpanId(parent->GetContext().span_id())),
                  SpanNamed("parent")));
}
```
```
[ RUN      ] OpenTelemetry.DarrenOTelOnly
../google/cloud/internal/grpc_opentelemetry_test.cc:279: Failure
Value of: span_catcher->GetSpans()
Expected: has 2 elements where
element #0 (has name: child) and (has parent span id: 5d48e34bfafc4edb),
element #1 has name: parent
  Actual: { (ptr = 0x1f422a0, value = Span {name=child, kind=INTERNAL, instrumentation_scope {gcc-7-3 test, },
			parent_span_id=0000000000000000
			attributes=[],
			events=[],
			links=[]}), (ptr = 0x1f42c20, value = Span {name=parent, kind=INTERNAL, instrumentation_scope {gcc-7-3 test, },
			parent_span_id=0000000000000000
			attributes=[],
			events=[],
			links=[]}) }, whose element #0 doesn't match, has parent span id: 0000000000000000

[  FAILED  ] OpenTelemetry.DarrenOTelOnly (0 ms)
```
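To make the expected behavior concrete, here is a toy model of what the test relies on: a scope makes a span active on the current thread, and starting a span uses the active span as its parent. `ToySpan`, `ToyScope`, `StartToySpan`, and the thread-local stack are hypothetical and are not the opentelemetry-cpp implementation. In the failing GCC 7.3.1 run above, the child reports `parent_span_id=0000000000000000`, i.e. it behaves as if no span were active when it started.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy model: a span is an id plus the id of its parent, where 0 means
// "no parent" (printed as parent_span_id=0000000000000000 in the log).
struct ToySpan {
  std::uint64_t id;
  std::uint64_t parent_id;
};

// Hypothetical per-thread stack of active span ids, standing in for the
// runtime context that opentelemetry::trace::Scope updates.
thread_local std::vector<std::uint64_t> active_spans;

// RAII guard: `span` is the active span until the guard goes out of scope.
class ToyScope {
 public:
  explicit ToyScope(ToySpan const& span) { active_spans.push_back(span.id); }
  ~ToyScope() {
    assert(!active_spans.empty());
    active_spans.pop_back();
  }
  ToyScope(ToyScope const&) = delete;
  ToyScope& operator=(ToyScope const&) = delete;
};

// Starting a span records the currently active span (if any) as its parent.
ToySpan StartToySpan(std::uint64_t id) {
  std::uint64_t parent = active_spans.empty() ? 0 : active_spans.back();
  return ToySpan{id, parent};
}

// Mirrors DarrenOTelOnly: the child starts inside the parent's scope, so its
// parent id should be the parent's id (1), not 0.
ToySpan ChildStartedInParentScope() {
  ToySpan parent = StartToySpan(1);
  ToyScope parent_scope(parent);
  return StartToySpan(2);
}
```

In this model, the GCC 7.3.1 failure corresponds to `StartToySpan(2)` seeing an empty stack even though `parent_scope` is alive.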

dbolduc force-pushed the otel-attach-context-async-backoff branch from 3db9441 to 8f99a85 on November 17, 2023 07:07
dbolduc marked this pull request as ready for review on November 17, 2023 08:01
dbolduc requested a review from a team as a code owner on November 17, 2023 08:01
```
@@ -208,6 +210,52 @@ TEST(OpenTelemetry, TracedAsyncBackoffDisabled) {
  EXPECT_THAT(spans, IsEmpty());
}

TEST(OpenTelemetry, TracedAsyncBackoffPreservesContext) {
  if (CompilerId() == "GNU" && CompilerVersion() == "7.3.1") {
```
Contributor:

Does OTel claim to support GCC 7.x? Is there a bug in OTel we can reference? Can we detect this with an OTel version?

dbolduc (Member, Author), Nov 17, 2023:

> Does OTel claim to support GCC 7.x?

My reading is they support compilers that implement the standard correctly.

> Is there a bug in OTel we can reference?

Good call. Yes: open-telemetry/opentelemetry-cpp#1014

dbolduc (Member, Author):

There is a patch linked in the issue. I will try to apply it to our centos-7 Dockerfile.

Contributor:

According to some comments on that bug, GCC 8.x also has trouble. You may need to apply that patch to several other builds.

Contributor:

> My reading is they support compilers that implement the standard correctly.

There are no compilers without defects. That implies they support no compilers at all.

dbolduc (Member, Author):

Patched.

> According to some comments on that bug, GCC 8.x also has trouble. You may need to apply that patch to several other builds.

Thanks for pointing that out. I forgot that the demo-install builds do not run these unit tests. I assumed they all worked.

Opened #13159 because I am not sure whether we should apply the patch by default or just call out the issue in a comment.

> There are no compilers without defects. That implies they support no compilers at all.

I realize that; my comment was tongue-in-cheek. Here is their actual text: https://github.com/open-telemetry/opentelemetry-cpp/blob/cb603ad97f33e52340e627e1cb43ba73bb1d7ef0/README.md?plain=1#L48-L49

```cpp
auto span = MakeSpan("Async Backoff");
OTelScope scope(span);
auto timer = cq.MakeRelativeTimer(duration);
return EndSpan(span, std::move(timer));
```
Contributor:

Use std::move(span) here; span is not used after this call.

dbolduc (Member, Author):

Done
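The suggestion matters because, assuming `EndSpan` takes the span by value, passing the local `span` copies a reference-counted pointer (opentelemetry-cpp hands out spans through `opentelemetry::nostd::shared_ptr`), which may cost an extra atomic increment and decrement. The sketch below uses `std::shared_ptr` with hypothetical names (`Span`, `EndSpanUseCount`) to show the difference. It ignores the `OTelScope` in the reviewed code, which may hold its own reference.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a span handle.
struct Span {
  std::string name;
};

// Hypothetical sink that takes ownership of the span by value, like the
// EndSpan(span, timer) call under review, and reports how many owners the
// span has while inside the call.
long EndSpanUseCount(std::shared_ptr<Span> span) { return span.use_count(); }

// Copying: the caller still holds a reference, so the count is 2 inside the
// call (an extra increment on entry and decrement on return).
long PassByCopy() {
  auto span = std::make_shared<Span>(Span{"Async Backoff"});
  return EndSpanUseCount(span);
}

// Moving: ownership transfers, the count stays 1, and the caller's `span` is
// left empty. Safe only because `span` is not used again afterwards.
long PassByMove() {
  auto span = std::make_shared<Span>(Span{"Async Backoff"});
  return EndSpanUseCount(std::move(span));
}
```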

dbolduc (Member, Author) commented Nov 21, 2023

Ping.

(I am not blocked on this PR. It would just be nice to have the fix in.)

dbolduc merged commit f41f8a6 into googleapis:main on Nov 21, 2023 (59 checks passed)
dbolduc deleted the otel-attach-context-async-backoff branch on November 21, 2023 22:52
Development: successfully merging this pull request may close these issues: Reconnect async traces with backoffs (#13141)
3 participants