Skip to content
This repository has been archived by the owner on Jul 31, 2023. It is now read-only.

Update gRPC measures and views from spec #689

Merged

Conversation

semistrict
Copy link
Contributor

@semistrict semistrict commented Apr 8, 2018

Fixes: #687

@semistrict semistrict force-pushed the update-grpc-stats branch 4 times, most recently from c531a42 to d07bc02 Compare April 9, 2018 08:40
@semistrict semistrict requested review from rakyll and odeke-em and removed request for rakyll April 9, 2018 08:51
ClientRequestCount = stats.Int64("grpc.io/client/request_count", "Number of client RPC request messages", stats.UnitNone)
ClientResponseCount = stats.Int64("grpc.io/client/response_count", "Number of client RPC response messages", stats.UnitNone)
ClientRoundTripLatency = stats.Float64("grpc.io/client/roundtrip_latency", "RPC roundtrip latency in msecs", stats.UnitMilliseconds)
ClientSentMessagesPerRPC = stats.Int64("grpc.io/client/sent_messages_per_rpc", "Number of messages sent in the RPC (always 1 for non-streaming RPCs).", stats.UnitNone)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This unit is wrong, please update it to "1" as the I believe the spec says in https://github.com/census-instrumentation/opencensus-specs/blob/master/stats/gRPC.md#units
screen shot 2018-04-09 at 10 53 50 am

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I guess I hadn't seen the definition of "stats.UnitNone" which is a dimensionless unit, but it was confusing initially, "UnitNone" to me would be "" instead of "1" whereas perhaps we could use "stats.UnitDimensionless"

ClientRoundTripLatency = stats.Float64("grpc.io/client/roundtrip_latency", "RPC roundtrip latency in msecs", stats.UnitMilliseconds)
ClientSentMessagesPerRPC = stats.Int64("grpc.io/client/sent_messages_per_rpc", "Number of messages sent in the RPC (always 1 for non-streaming RPCs).", stats.UnitNone)
ClientSentBytesPerRPC = stats.Int64("grpc.io/client/sent_bytes_per_rpc", "Total bytes sent across all request messages per RPC.", stats.UnitBytes)
ClientReceivedMessagesPerRPC = stats.Int64("grpc.io/client/received_messages_per_rpc", "Number of response messages received per RPC (always 1 for non-streaming RPCs).", stats.UnitNone)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto about using "1" as the spec requires.

if got, want := m.Description(), defn.desc; got != want {
t.Errorf("%q: Description = %q; want %q", defn.name, got, want)
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great spec enforcement test!

t.Fatalf("len(gotMeasures) = %d; want %d", got, want)
}

for i, v := range views {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great, me likey!

case codes.OK:
return "OK"
case codes.Canceled:
return "CANCELLED"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aha, it's funny we have "Canceled" as the enumeration but "CANCELLED" as the string as defined in https://godoc.org/google.golang.org/grpc/codes vs https://github.com/grpc/grpc/blob/master/doc/statuscodes.md

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The enum type stringifies them differently btw in gRPC Go: https://github.com/grpc/grpc-go/blob/master/codes/code_string.go#L23.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Baby Jesus, perhaps let's follow what grpc-go is doing? Thanks for posting that @rakyll.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Go codes look like just the names of the vars, highly unlikely to be consistent across languages. The examples in the OpenCensus spec are consistent with the gRPC doc (all upper case, CANCELLED with two "l", UPPER_SNAKE_CASE) which is the most official source I could find. Also explicitly added to the spec to avoid future confusion: census-instrumentation/opencensus-specs#97

stats/record.go Outdated
func init() {
debug = os.Getenv("OC_STATS_DEBUG") != ""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's document this perhaps?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't think it should be a documented feature of the library so I removed it instead.

}
for _, tag := range row.Tags {
if tag.Key == ocgrpc.KeyClientStatus && tag.Value != "OK" {
s.ErrorsTotal += int(count)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although we removed the measure and view for viewing the total number of gRPC errors.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, so this is why I had to compute it from the CompletedRPCs measures

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a naive question: how are we then going to view and track the number of errors?

@odeke-em
Copy link
Member

odeke-em commented Apr 9, 2018

Thank you for this PR @Ramonza, it's great work, I like the rigor with the spec adherence. It is quite a large PR though, so I'll need to do a couple of takes on reviewing it.

@@ -37,8 +37,8 @@ func main() {
// the collected data.
view.RegisterExporter(&exporter.PrintExporter{})

// Register the view to collect client request count.
if err := view.Register(ocgrpc.ClientErrorCountView); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a naive question, how are now going to be able to track the number of errors in either the client or server?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We now break down the requests by status, so you can get the total errors by adding up all the non-success statuses like we do in rpcz.

Ramon Nogueira added 3 commits April 10, 2018 10:19
TestCumulativenessFromHistograms failed on AppVeyor. Looks likely
to be timing related. Without rewriting the test to avoid timing
issues, I am just increasing the wait times in the hopes that this
will resolve the issue.
@semistrict
Copy link
Contributor Author

@odeke-em @rakyll PTAL


func statusCodeToString(s *status.Status) string {
// see https://github.com/grpc/grpc/blob/master/doc/statuscodes.md
switch c := s.Code(); c {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Usually switch statements are very slow. Probably better to define a static map<Code, String>

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious what the intuition behind that is? Did a benchmark and turns out that the switch is actually significantly faster, at least on my laptop:

name                                  time/op
StatusCodeToString_OK-8               2.66ns ± 0%
StatusCodeToString_Unauthenticated-8  2.88ns ± 0%
MapAlternativeImpl_OK-8               7.56ns ± 1%

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then keep the switch statement. the map search seems to me very slow, maybe is a Go thing with maps. The way how the switch statements are now implemented is using a 2-level jump table (https://www.codeproject.com/Articles/100473/Something-You-May-Not-Know-About-the-Switch-Statem) so it should be one extra jump, but probably the way how the map is implemented in Go is also 2-level jump (does not store the values directly in the map, instead it uses pointers).

@@ -0,0 +1,52 @@
package ocgrpc
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing license header.

func BenchmarkMapAlternativeImpl_OK(b *testing.B) {
st := status.New(codes.OK, "OK")
for i := 0; i < b.N; i++ {
s := codeToStringMap[st.Code()]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

codeToStringMap[st.Code()]

@semistrict semistrict merged commit 85464ad into census-instrumentation:master Apr 20, 2018
gopherbot pushed a commit to googleapis/google-cloud-go that referenced this pull request Apr 24, 2018
OpenCensus gRPC views are being updated to reflect the latest spec.
Remove a reference to an old view that will not longer exist after
the update, and instead subscribe to the default set of client-side
views.

OpenCensus PR: census-instrumentation/opencensus-go#689

Change-Id: I3308ec22b71c8548e3c7fd39cfad597cab65d241
Reviewed-on: https://code-review.googlesource.com/26370
Reviewed-by: Jonathan Amsterdam <[email protected]>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants