Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor infinite tracing for improved clarity and simplicity, add su… #180

Merged
merged 4 commits into from
Jan 20, 2021

Conversation

jack-berg
Copy link
Contributor

@jack-berg jack-berg commented Jan 5, 2021

Refactor infinite tracing for improved clarity and simplified. Refactor implies that the code has been reorganized but is functionally equivalent. This is mostly true for this PR. The changes in behavior include:

  • Backoff sequence added when FAILED_PRECONDITIONS status is returned.
  • When a timeout error occurs due to 15 seconds of inactivity, reconnect immediately instead of backing off. This should have been the behavior but I think it slipped through the cracks when the change was made to detect timeout errors.
  • Log messages have been adjusted slightly and standardized.
  • Added support for _flakyCode config, which allows for debugging error conditions with _flakyPercentage.
  • Removed busy waiting while backoff is occurring in favor of a proper synchronizer mechanism.

Closes #164.
Closes #160.

}

@VisibleForTesting
static V1.Span convert(SpanEvent spanEvent) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer the span conversion process to remain in a separate class. I know we discussed it is a straight forward and simple conversion. I think it is separate enough of a responsibility to organize on its own.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure I can do that, but before I do, let me just explain my reasoning for why I didn't do it initially.

IMO, decomposition is about breaking large problems into smaller problems of singular concern, and a separate java class files isn't always the best way to accomplish that. They can reduce readability (i.e. having to look at 5 files to figure out what's happening is harder to undertstand than 5 functions in a single file). And they can encourage consumption outside of code outside its intended use (i.e. a private function in a class file sends a clear signal to maintainers that only the owning class should consume it while a separate class might more readily reused). They can help reduce clutter, but at 138 lines I don't think that this class is suffering from that.

Anyway, that's my point of view in general, but in this case I don't think breaking it out is such a bad thing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Appreciate the insight and this will help me in the future! At times, I've gone through what seems a bit much of code indirection and agree with your point it is hard to read.

Copy link
Contributor

@XiXiaPdx XiXiaPdx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might be a bit too close to these changes as @jack-berg and have discussed them at length. I think these changes do improve the code quality, readability, and reduces complexity.

If you plan to manual test for proper behavior, here are some notes that might be helpful.

  1. Use the flaky_code agent config setting for testing. This header metadata is only observed by the staging Trace Observer, so it will not work with a production Trace Observer.

  2. Testing FAILED_PRECONDITION will be just verifying we disconnect from the Trace Observer and reconnect to the same one. If you want to verify we do reconnect with a different IP address, the ∞T team has offered to help with that test because it is easy for them to do.

Copy link
Contributor

@jasonjkeller jasonjkeller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't have much exposure to this part of the codebase before the refactoring so I can't say that I have much of a feel for why it was refactored. That said, the refactored code looks good to me. Overall it looks to have reduced the number of classes a good amount without creating "god" classes, which is nice.

Has this been manually tested and verified to work (old and new functionality alike)? And have the AITs been run against this branch? If so, then let's go ahead and merge this.

status = statusRuntimeException.getStatus();
metricAggregator.incrementCounter("Supportability/InfiniteTracing/Span/gRPC/" + status.getCode());
if (status.getCode() == Status.Code.UNIMPLEMENTED) {
// See: https://source.datanerd.us/agents/agent-specs/blob/master/Infinite-Tracing.md#unimplemented
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ugg I wish these specs were public

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Has this been manually tested and verified to work (old and new functionality alike)? And have the AITs been run against this branch?

Yes - lots of manual testing, and the AITs have passed.

@@ -19,6 +19,7 @@
public static final String TRACE_OBSERVER = "trace_observer";
public static final String SPAN_EVENTS = "span_events";
public static final String FLAKY_PERCENTAGE = "_flakyPercentage";
public static final String FLAKY_CODE = "_flakyCode";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the _flakyCode config need to be publicly documented for customer use?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question! No, the headers included by _flakyCode and _flakyPercentage only impact interactions with the staging environment.

@jack-berg jack-berg merged commit 34a24dd into main Jan 20, 2021
@jack-berg jack-berg deleted the refactor-infinite-tracing branch January 20, 2021 15:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants