Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

core: use new OpenCensus stats/tagging API. #3647

Merged
merged 9 commits into from
Nov 7, 2017

Conversation

sebright
Copy link
Contributor

This commit updates gRPC core to use io.opencensus:opencensus-api and
io.opencensus:opencensus-contrib-grpc-metrics instead of
com.google.instrumentation:instrumentation-api for stats and tagging. The gRPC
Monitoring Service continues to use instrumentation-api.

The main changes affecting gRPC:

  • The StatsContextFactory is replaced by three objects, StatsRecorder, Tagger,
    and TagContextBinarySerializer.

  • The StatsRecorder, Tagger, and TagContextBinarySerializer are never null,
    but the objects are no-ops when the OpenCensus implementation is not
    available.

This commit includes changes written by @songy23 and @sebright.


I added two TODOs for parts that I didn't know how to update. This PR is still using the snapshot of OpenCensus.

/cc @zhangkun83

@grpc-kokoro
Copy link

Thanks for your pull request. The automated tests will run as soon as one of the admins verifies this change is ok for us to run on our infrastructure.

@thelinuxfoundation
Copy link

Thank you for your pull request. Before we can look at your contribution, we need to ensure all contributors are covered by a Contributor License Agreement.

After the following items are addressed, please respond with a new comment here, and the automated system will re-verify.

Regards,
The Linux Foundation CLA GitHub bot

@zhangkun83 zhangkun83 self-assigned this Oct 31, 2017
@sebright
Copy link
Contributor Author

testing Linux Foundation CLA

1 similar comment
@sebright
Copy link
Contributor Author

testing Linux Foundation CLA

Copy link
Contributor

@zhangkun83 zhangkun83 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some small-lish comments. The Census API looks good.

Technically this is a breaking behavioral change, because gRPC now handles a different type of Context key. However, it's unlikely that any 3rd-party has done any work on the old Census API. So this should fine.

StatsRecorder statsRecorder =
this.statsRecorder != null ? this.statsRecorder : Stats.getStatsRecorder();
// // TODO: How do we check whether stats is enabled, now that the StatsRecorder is always
// // non-null? Uncommenting this line causes test failures.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most tests pass in the fake implementations from StatsTestUtils, but Stats.getState() returns DISABLED. If you don't initialize CensusStatsModule, these tests will fail.

A possibly better approach, is to make CensusStatsModule public, and change the three-argument Census impl setter of the builder to overrideCensusStatsModule(CensusStatsModule). Tests will create CensusStatsModule out of the impls from StatsTestUtils, and here you can do:

if (statsEnabled) {
  CensusStatsModule censusStats = this.censusStatsOverride;
  if (censusStats == null && Stats.getStats() == StatsCollectionState.ENABLED) {
    CensusStatsModule censusStats =
        new CensusStatsModule(...);
  }
  if (censusStats != null) {
    tracerFactories.add(censusStats.getServerTracerFactory());
  }
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zhangkun83 I refactored the code for overriding the OpenCensus implementation. There are two parts that I'm not sure about.

  • CensusStatsModule no longer caches the StatsClientInterceptor and ServerTracerFactory. It creates a new object whenever getServerTracerFactory or getClientInterceptor is called. Is that correct?
  • I'm not completely sure I used the right values for recordStats, especially in CensusModulesTest.

new CensusStatsModule(statsFactory, GrpcUtil.STOPWATCH_SUPPLIER, true, recordStats);
tracerFactories.add(censusStats.getServerTracerFactory());
}
Tagger tagger = this.tagger != null ? this.tagger : Tags.getTagger();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious, are the three components coupled? For example, does one Tagger implementation only works with a certain StatsRecorder, or can they mix and match?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Tagger and TagContextBinarySerializer objects are coupled. The StatsRecorder is independent, except that it also uses the gRPC Context key that is used by the tagging classes.

@songy23
Copy link
Contributor

songy23 commented Nov 1, 2017

/cc @dinooliva @bogdandrutu

Copy link
Contributor

@bogdandrutu bogdandrutu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please rebase.

build.gradle Outdated
opencensus_impl: 'io.opencensus:opencensus-impl:0.7.0',
opencensus_api: 'io.opencensus:opencensus-api:0.8.0',
opencensus_contrib_grpc_metrics: 'io.opencensus:opencensus-contrib-grpc-metrics:0.8.0',
opencensus_impl: 'io.opencensus:opencensus-impl:0.8.0',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you depend on the implementation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like it is only needed for the interop tests:

runtime libraries.opencensus_impl

@sebright
Copy link
Contributor Author

sebright commented Nov 2, 2017

I rebased. I didn't squash any commits, so there is still a commit to upgrade to OpenCensus 0.8.0, even though that change was made on master.

@ejona86
Copy link
Member

ejona86 commented Nov 2, 2017

"io.opencensus.contrib.grpc.metrics.RpcMeasureConstants" ☹️ Normally "contrib" implies unsupported. It's sort of like: "Here's a bag of things others found useful, but we don't maintain them and their quality may vary."

@ejona86
Copy link
Member

ejona86 commented Nov 2, 2017

Okay to test.

@ejona86 ejona86 added the kokoro:run Add this label to a PR to tell Kokoro the code is safe and tests can be run label Nov 2, 2017
@kokoro-team kokoro-team removed the kokoro:run Add this label to a PR to tell Kokoro the code is safe and tests can be run label Nov 2, 2017
@ejona86 ejona86 added this to the 1.8 milestone Nov 2, 2017
@ejona86 ejona86 added the TODO:release blocker Issue/PR is important enough to delay the release. Removed after release issues resolved label Nov 2, 2017
@bogdandrutu
Copy link
Contributor

@ejona86 sorry for the wrong "contrib" name. We used it for "other" artifacts that are not in the core library. We will work in the future to maybe change that name. Thanks for the feedback, we are committed to keep backwards compatibility for "contrib" as well.

@ejona86
Copy link
Member

ejona86 commented Nov 2, 2017

Jenkins, retest this please

Copy link
Contributor

@zhangkun83 zhangkun83 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM. Some minor comments.

@@ -61,15 +67,56 @@
}
};

private static final StatsContextFactory DUMMY_STATS_FACTORY =
new StatsContextFactory() {
private static final Tagger DUMMY_TAGGER =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It probably won't hurt if this test uses the fake implementations from StatsTestUtils. Then you can save a lot of lines of the dummy implementations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

private static final StatsContextFactory DUMMY_STATS_FACTORY =
new StatsContextFactory() {

private static final Tagger DUMMY_TAGGER =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

public static void setStatsContextFactory(
AbstractManagedChannelImplBuilder<?> builder, StatsContextFactory factory) {
builder.statsContextFactory(factory);
public static void setStatsImplementation(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two methods should take the form of the corresponding methods in the builders.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -93,39 +107,29 @@ public long getMetricAsLongOrFail(MeasurementDescriptor metricName) {
}

/**
* This tag will be propagated by {@link FakeStatsContextFactory} on the wire.
* This tag will be propagated by {@link FakeTagger} on the wire.
*/
public static final TagKey EXTRA_TAG = TagKey.create("/rpc/test/extratag");

private static final String EXTRA_TAG_HEADER_VALUE_PREFIX = "extratag:";
private static final String NO_EXTRA_TAG_HEADER_VALUE_PREFIX = "noextratag";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this and its related logic can be removed.
It was needed because the old StatsContext didn't provide a way to tell if it's empty, thus the metadata was always sent. Now that you have implemented the optimization to skip sending metdata if TagContext is empty, we should no longer need the metadata form that represents "no tag".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand. I didn't try to change any of the logic for serializing and deserializing tags.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant if you remove this field and the if-branches that references this field, tests should not fail. Because of the following if you added to CensusStatsModule, those branches will never be executed now.

        if (!module.tagger.empty().equals(parentCtx)) {
          headers.put(module.statsHeader, parentCtx);
        }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the optimization was added before this PR? This PR just changes the comparison to use tagger.empty() instead of statsCtxFactory.getDefault().

I removed NO_EXTRA_TAG_HEADER_VALUE_PREFIX and the branches that used it, and the tests passed. It is the last commit. Shouldn't serialize still handle the case where the TagContext isn't empty, but it doesn't contain EXTRA_TAG?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right ... the optimization was added before.

To keep it simple, FakeTagContextBinarySerializer is supposed to only handle EXTRA_TAG, and tests are supposed to only add EXTRA_TAG. Now fromByteArray() will fail if the metadata contains anything other than EXTRA_TAG. toByteArray() will fail too, by throwing NullPointerException -- it's probably better to make it throw UnsupportedOperationException if extraTagValue==null.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made toByteArray throw an UnsupportedOperationException.

@zhangkun83
Copy link
Contributor

@zhangkun83 I refactored the code for overriding the OpenCensus implementation. There are two parts that I'm not sure about.

  • CensusStatsModule no longer caches the StatsClientInterceptor and ServerTracerFactory. It creates a new object whenever getServerTracerFactory or getClientInterceptor is called. Is that correct?

Yes it's correct. The caching wasn't necessary because it's only called per channel/server.

  • I'm not completely sure I used the right values for recordStats, especially in CensusModulesTest.

That sounds right. recordStats should be true in most of the tests.

This commit updates gRPC core to use io.opencensus:opencensus-api and
io.opencensus:opencensus-contrib-grpc-metrics instead of
com.google.instrumentation:instrumentation-api for stats and tagging. The gRPC
Monitoring Service continues to use instrumentation-api.

The main changes affecting gRPC:

- The StatsContextFactory is replaced by three objects, StatsRecorder, Tagger,
  and TagContextBinarySerializer.

- The StatsRecorder, Tagger, and TagContextBinarySerializer are never null,
  but the objects are no-ops when the OpenCensus implementation is not
  available.

This commit includes changes written by @songy23 and @sebright.
gRPC shouldn't use the state to determine whether to initialize the
CensusStatsModule, because the state can be changed at runtime.
This commit adds a method to set the CensusStatsModule on
AbstractServerImplBuilder and AbstractManagedChannelImplBuilder.  It is simpler
to set the whole CensusStatsModule than pass in all necessary OpenCensus
implementation objects.
@zhangkun83
Copy link
Contributor

LGTM. Please merge master onto your branch and resolve the conflict.

@sebright
Copy link
Contributor Author

sebright commented Nov 7, 2017

Thanks. I merged my branch.

@ejona86
Copy link
Member

ejona86 commented Nov 7, 2017

okay to test

@ejona86 ejona86 added kokoro:run Add this label to a PR to tell Kokoro the code is safe and tests can be run kokoro:force-run Add this label to a PR to tell Kokoro to re-run all tests. Not generally necessary labels Nov 7, 2017
@ejona86 ejona86 removed kokoro:force-run Add this label to a PR to tell Kokoro to re-run all tests. Not generally necessary kokoro:run Add this label to a PR to tell Kokoro the code is safe and tests can be run labels Nov 7, 2017
@zhangkun83 zhangkun83 merged commit ef2ec94 into grpc:master Nov 7, 2017
@sebright sebright deleted the update-opencensus branch November 7, 2017 20:25
@sebright
Copy link
Contributor Author

sebright commented Nov 7, 2017

Thanks!

@ejona86 ejona86 added the TODO:backport PR needs to be backported. Removed after backport complete label Nov 7, 2017
@dinooliva
Copy link

Excellent!

zhangkun83 pushed a commit to zhangkun83/grpc-java that referenced this pull request Nov 9, 2017
This commit updates gRPC core to use io.opencensus:opencensus-api and
io.opencensus:opencensus-contrib-grpc-metrics instead of
com.google.instrumentation:instrumentation-api for stats and tagging. The gRPC
Monitoring Service continues to use instrumentation-api.

The main changes affecting gRPC:

- The StatsContextFactory is replaced by three objects, StatsRecorder, Tagger,
  and TagContextBinarySerializer.

- The StatsRecorder, Tagger, and TagContextBinarySerializer are never null,
  but the objects are no-ops when the OpenCensus implementation is not
  available.

This commit includes changes written by @songy23 and @sebright.
@zhangkun83 zhangkun83 removed TODO:release blocker Issue/PR is important enough to delay the release. Removed after release issues resolved TODO:backport PR needs to be backported. Removed after backport complete labels Nov 10, 2017
@lock lock bot locked as resolved and limited conversation to collaborators Jan 20, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants