Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SDK Starfish v0 requirements #60

Closed
17 tasks done
AbhiPrasad opened this issue Apr 24, 2023 · 6 comments
Closed
17 tasks done

SDK Starfish v0 requirements #60

AbhiPrasad opened this issue Apr 24, 2023 · 6 comments
Assignees
Labels
META Tracking progress over multiple repositories

Comments

@AbhiPrasad
Copy link
Member

AbhiPrasad commented Apr 24, 2023

Add cache hit/miss rate

Any call to cache that doesn't return data (null) is treated as a miss. We define a span that has a cache hit as having the span data field cache.hit as true. Might require SDK to patch cache abstractions of framework (django).

To be done in Python SDK for any frameworks that support this - django is required.

Cache hit/miss rate

  1. AbhiPrasad
  2. Enhancement Integration: Django Status: Backlog
    antonpirker

Add db platform data to to span data

Add this information to span data field as db.system, matching OpenTelemetry's well known conventions.

For example, db.system of postgresql would indicate that it's a postgres database.

Add cache item size

Add a field to cache spans that defines how big the item that is being get/set: cache.item_size. This is an integer and should be in bytes. Blocked by cache hit/miss rate work.

Cache Item size

Add information used for action field

The action field on the extracted span metrics is either the operation for DB spans (SELECT/DELETE etc.), and http method for HTTP spans (GET/PUT etc.).

For operation, we'll be using and setting db.operation on span data. For http method, we'll be using http.method on span data.

If db.operation OR http.method is not set on span data, Relay will have to parse it out from the span description.

@wmak
Copy link
Member

wmak commented Apr 25, 2023

From the Slack thread we'll also need:

  • Action
    • This would be the operation for DB spans (SELECT/DELETE etc.), and http method for HTTP spans (GET/PUT etc.)
    • There was discussion where we're not sure if this should be done in the SDK or in ingestion
  • Span Status
    • The main motivation of this is so we can get client timeouts for http spans

@AbhiPrasad
Copy link
Member Author

AbhiPrasad commented Apr 26, 2023

We can add all of the db call-level attributes if possible: https://opentelemetry.io/docs/reference/specification/trace/semantic_conventions/database/#call-level-attributes: db.operation.

db.operation in this case would be the field that would eventually be extracted to be action. Note that we can only provide db.operation if the underlying framework gives us this information, otherwise it will have to be extracted from the query - which has to be done server-side.

I'm still confused about span status - we attach http status codes under span.status already on the SDK. Can we not check for a 504 or a 408 and label timeout appropriately? Why does this require SDK changes?

@mitsuhiko
Copy link
Member

What is a client timeout here? Presumably if something times out, you never get a response and thus no status code. What do SDKs report in that case? I wouldn't be surprised if a timeout usually translates into some sort of socket error which requires custom handling and might either result in a sentry error (and failed transaction), or is manually handled and the output is undefined.

@AbhiPrasad
Copy link
Member Author

Based on a slack thread - here's a snippet from @gggritso about behaviour that he would like to see for timeouts.

import urllib
import urllib.request

try:
    response = urllib.request.urlopen('http://python.org/', timeout=0.0001)
    html = response.read()
except urllib.error.URLError as e:
    print("URL Error", e.reason)
    if (e.reason == 'timed out'):
        span['status'] = 'time out'
        raise e

@AbhiPrasad
Copy link
Member Author

Given we've released the Python SDK with cache hit information, I think we can say we have enough data for v0 of starfish.

action (db operation, http method) is going to be parsed by Relay for the most part (but both Node/Python should be sending them regardless)

span status is still an open concern, but I think we can defer that will v1. cc @alexjillard

@smeubank
Copy link
Member

the SDK should be sending http.method where possible under span data. In both node/python it was determined that none of the underlying instrumentation could parse and determine db.operation so this was something relay needed to do.

So Abhi doesn't have to sign into GH while OOO

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
META Tracking progress over multiple repositories
Projects
None yet
Development

No branches or pull requests

4 participants