-
Notifications
You must be signed in to change notification settings - Fork 327
Update gRPC measures and views from spec #689
Update gRPC measures and views from spec #689
Conversation
c531a42
to
d07bc02
Compare
plugin/ocgrpc/client_metrics.go
Outdated
ClientRequestCount = stats.Int64("grpc.io/client/request_count", "Number of client RPC request messages", stats.UnitNone) | ||
ClientResponseCount = stats.Int64("grpc.io/client/response_count", "Number of client RPC response messages", stats.UnitNone) | ||
ClientRoundTripLatency = stats.Float64("grpc.io/client/roundtrip_latency", "RPC roundtrip latency in msecs", stats.UnitMilliseconds) | ||
ClientSentMessagesPerRPC = stats.Int64("grpc.io/client/sent_messages_per_rpc", "Number of messages sent in the RPC (always 1 for non-streaming RPCs).", stats.UnitNone) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This unit is wrong, please update it to "1" as the I believe the spec says in https://github.com/census-instrumentation/opencensus-specs/blob/master/stats/gRPC.md#units
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I guess I hadn't seen the definition of "stats.UnitNone" which is a dimensionless unit, but it was confusing initially, "UnitNone" to me would be "" instead of "1" whereas perhaps we could use "stats.UnitDimensionless"
plugin/ocgrpc/client_metrics.go
Outdated
ClientRoundTripLatency = stats.Float64("grpc.io/client/roundtrip_latency", "RPC roundtrip latency in msecs", stats.UnitMilliseconds) | ||
ClientSentMessagesPerRPC = stats.Int64("grpc.io/client/sent_messages_per_rpc", "Number of messages sent in the RPC (always 1 for non-streaming RPCs).", stats.UnitNone) | ||
ClientSentBytesPerRPC = stats.Int64("grpc.io/client/sent_bytes_per_rpc", "Total bytes sent across all request messages per RPC.", stats.UnitBytes) | ||
ClientReceivedMessagesPerRPC = stats.Int64("grpc.io/client/received_messages_per_rpc", "Number of response messages received per RPC (always 1 for non-streaming RPCs).", stats.UnitNone) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto about using "1" as the spec requires.
if got, want := m.Description(), defn.desc; got != want { | ||
t.Errorf("%q: Description = %q; want %q", defn.name, got, want) | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great spec enforcement test!
t.Fatalf("len(gotMeasures) = %d; want %d", got, want) | ||
} | ||
|
||
for i, v := range views { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great, me likey!
case codes.OK: | ||
return "OK" | ||
case codes.Canceled: | ||
return "CANCELLED" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aha, it's funny we have "Canceled" as the enumeration but "CANCELLED" as the string as defined in https://godoc.org/google.golang.org/grpc/codes vs https://github.com/grpc/grpc/blob/master/doc/statuscodes.md
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The enum type stringifies them differently btw in gRPC Go: https://github.com/grpc/grpc-go/blob/master/codes/code_string.go#L23.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Baby Jesus, perhaps let's follow what grpc-go is doing? Thanks for posting that @rakyll.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Go codes look like just the names of the vars, highly unlikely to be consistent across languages. The examples in the OpenCensus spec are consistent with the gRPC doc (all upper case, CANCELLED with two "l", UPPER_SNAKE_CASE) which is the most official source I could find. Also explicitly added to the spec to avoid future confusion: census-instrumentation/opencensus-specs#97
stats/record.go
Outdated
func init() { | ||
debug = os.Getenv("OC_STATS_DEBUG") != "" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's document this perhaps?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't think it should be a documented feature of the library so I removed it instead.
} | ||
for _, tag := range row.Tags { | ||
if tag.Key == ocgrpc.KeyClientStatus && tag.Value != "OK" { | ||
s.ErrorsTotal += int(count) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although we removed the measure and view for viewing the total number of gRPC errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, so this is why I had to compute it from the CompletedRPCs measures
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a naive question: how are we then going to view and track the number of errors?
Thank you for this PR @Ramonza, it's great work, I like the rigor with the spec adherence. It is quite a large PR though, so I'll need to do a couple of takes on reviewing it. |
6a0e34b
to
50c3735
Compare
@@ -37,8 +37,8 @@ func main() { | |||
// the collected data. | |||
view.RegisterExporter(&exporter.PrintExporter{}) | |||
|
|||
// Register the view to collect client request count. | |||
if err := view.Register(ocgrpc.ClientErrorCountView); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a naive question, how are now going to be able to track the number of errors in either the client or server?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We now break down the requests by status, so you can get the total errors by adding up all the non-success statuses like we do in rpcz.
TestCumulativenessFromHistograms failed on AppVeyor. Looks likely to be timing related. Without rewriting the test to avoid timing issues, I am just increasing the wait times in the hopes that this will resolve the issue.
50c3735
to
b66f142
Compare
|
||
func statusCodeToString(s *status.Status) string { | ||
// see https://github.com/grpc/grpc/blob/master/doc/statuscodes.md | ||
switch c := s.Code(); c { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Usually switch statements are very slow. Probably better to define a static map<Code, String>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious what the intuition behind that is? Did a benchmark and turns out that the switch is actually significantly faster, at least on my laptop:
name time/op
StatusCodeToString_OK-8 2.66ns ± 0%
StatusCodeToString_Unauthenticated-8 2.88ns ± 0%
MapAlternativeImpl_OK-8 7.56ns ± 1%
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then keep the switch statement. the map search seems to me very slow, maybe is a Go thing with maps. The way how the switch statements are now implemented is using a 2-level jump table (https://www.codeproject.com/Articles/100473/Something-You-May-Not-Know-About-the-Switch-Statem) so it should be one extra jump, but probably the way how the map is implemented in Go is also 2-level jump (does not store the values directly in the map, instead it uses pointers).
93f04e2
to
a0fc081
Compare
@@ -0,0 +1,52 @@ | |||
package ocgrpc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing license header.
plugin/ocgrpc/benchmark_test.go
Outdated
func BenchmarkMapAlternativeImpl_OK(b *testing.B) { | ||
st := status.New(codes.OK, "OK") | ||
for i := 0; i < b.N; i++ { | ||
s := codeToStringMap[st.Code()] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
codeToStringMap[st.Code()]
f0341f8
to
b90813a
Compare
OpenCensus gRPC views are being updated to reflect the latest spec. Remove a reference to an old view that will not longer exist after the update, and instead subscribe to the default set of client-side views. OpenCensus PR: census-instrumentation/opencensus-go#689 Change-Id: I3308ec22b71c8548e3c7fd39cfad597cab65d241 Reviewed-on: https://code-review.googlesource.com/26370 Reviewed-by: Jonathan Amsterdam <[email protected]>
Fixes: #687