Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Service logging #105

Merged
merged 4 commits into from
Jun 26, 2024
Merged

Service logging #105

merged 4 commits into from
Jun 26, 2024

Conversation

wscalf
Copy link
Contributor

@wscalf wscalf commented Jun 18, 2024

PR Template:

Describe your changes

  • Moves service logging to the application and data tiers and (where possible) away from biz
  • Uses error wrapping to build something like a stack trace with contextual data
  • Uses the Kratos request logging middleware to handle writing error and trace logs
    • Note: it currently returns detailed error messages to the client, which is not desirable. And considers validation errors to be errors, which may not be desirable.
    • Note: this logs request objects, however, Kratos defines a Redacter interface (that's basically a renamed Stringer) that's only called when logging, and we could implement that for request bodies that contain sensitive material.

Sample log output when a request completes normally:

INFO ts=2024-06-18T21:05:45-04:00 caller=logging/logging.go:40 service.id=wscalf-thinkpadp16vgen1.rmtussc.csb service.name= service.version=d47de56 trace.id= span.id= kind=server component=grpc operation=/kessel.relations.v0.KesselCheckService/Check args=resource:{type:{name:"group"} id:"omni"} relation:"member" subject:{subject:{type:{name:"user"} id:"foo"}} code=0 reason= stack= latency=0.111345474

Sample log output on a runtime error (connection refused by SpiceDB):

ERROR ts=2024-06-18T21:06:12-04:00 caller=logging/logging.go:40 service.id=wscalf-thinkpadp16vgen1.rmtussc.csb service.name= service.version=d47de56 trace.id= span.id= kind=server component=grpc operation=/kessel.relations.v0.KesselCheckService/Check args=resource:{type:{name:"group"} id:"omni"} relation:"member" subject:{subject:{type:{name:"user"} id:"foo"}} code=503 reason= stack=failed to perform check: error invoking CheckPermission in SpiceDB: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:50051: connect: connection refused" latency=0.000420355

Sample log output on a runtime error in a gRPC request that returns a stream of responses (connection refused by SpiceDB):

ERROR ts=2024-06-26T11:22:24-04:00 caller=server/logging.go:34 service.id=wscalf-thinkpadp16vgen1.rmtussc.csb service.name= service.version=845939c trace.id= span.id= kind=server component=server operation=/kessel.relations.v0.KesselLookupService/LookupSubjects args=resource:{type:{name:"group"} id:"omni"} relation:"member" subject_type:{name:"group"} subject_relation:"member" code=503 reason= stack=error retrieving subjects: error invoking LookupSubjects in SpiceDB: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:50051: connect: connection refused" latency=0.0001817

Though note that this does not (yet) obscure the error that gets sent back to the client, which looks like:

ERROR:
  Code: Unavailable
  Message: failed to perform check: error invoking CheckPermission in SpiceDB: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:50051: connect: connection refused"

Ticket reference (if applicable)

Fixes RH33199

Checklist

  • Are the agreed upon acceptance criteria fulfilled?

  • Was the 4-eye-principle applied? (async PR review, pairing, ensembling)

  • Do your changes have passing automated tests and sufficient observability?

  • Are the work steps you introduced repeatable by others, either through automation or documentation?

    • If automation is possible but not done due to other constraints, a ticket to the tech debt sprint is added
    • An SOP (Standard Operating Procedure) was created
  • The Changes were automatically built, tested, and - if needed, behind a feature flag - deployed to our production environment. (Please check this when the new deployment is done and you could verify it.)

  • Are the agreed upon coding/architectural practices applied?

  • Are security needs fullfilled? (e.g. no internal URL)

  • Is the corresponding Ticket in the right state? (should be on "review" now, put to done when this change made it to production)

  • For changes to the public API / code dependencies: Was the whole team (or a sufficient amount of ppl) able to review?

@app-sre-bot
Copy link
Collaborator

Can one of the admins verify this patch?

@wscalf wscalf marked this pull request as ready for review June 26, 2024 15:21
@Rajagopalan-Ranganathan
Copy link
Contributor

/lgtm

@Rajagopalan-Ranganathan Rajagopalan-Ranganathan merged commit 2e19f50 into main Jun 26, 2024
6 checks passed
@alechenninger
Copy link
Member

@wscalf Any thoughts about contributing something to kratos re: streaming logging support?

@wscalf
Copy link
Contributor Author

wscalf commented Jul 1, 2024

Some. They responded to my ticket that it wasn't really implemented yet because there wasn't a good abstraction in place. Working through this, it does look like an abstraction that makes streaming first class and treats unary endpoints as a special case with 1 incoming and 1 outgoing message could be viable, but I haven't proposed it or anything.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants