chore: refactor types, error handling, clean up endpoints #37
Conversation
LGTM. Just left a few questions.
```go
server.Router.HandleFunc("/protocol-stats", server.createProtocolStats).Methods("POST")
server.Router.HandleFunc("/received-messages", server.createReceivedMessages).Methods("POST")
server.Router.HandleFunc("/waku-metrics", server.createWakuTelemetry).Methods("POST")
server.Router.HandleFunc("/received-envelope", server.createReceivedEnvelope).Methods("POST")
server.Router.HandleFunc("/sent-envelope", server.createSentEnvelope).Methods("POST")
server.Router.HandleFunc("/update-envelope", server.updateEnvelope).Methods("POST")
```
Cleaning up these endpoints means this is a breaking change, since clients out there will no longer be able to push metrics. Do we want to allow backwards compatibility for telemetry?
Yeah, good question. I was not sure whether they are still actively used. Happy to bring them back if you think removing them would break things.
Not sure, tbh. I think it would depend on whether these changes will be available in Status release 2.30 or not, and if not, whether we care about the telemetry data from 2.30 users.
```go
type PeerCount struct {
	ID            int    `json:"id"`
	CreatedAt     int64  `json:"createdAt"`
	Timestamp     int64  `json:"timestamp"`
	NodeName      string `json:"nodeName"`
	NodeKeyUid    string `json:"nodeKeyUid"`
	PeerID        string `json:"peerId"`
	PeerCount     int    `json:"peerCount"`
	StatusVersion string `json:"statusVersion"`
```

```go
	data types.PeerCount
```
Instead of creating a `data` attribute, wdyt about using:

```go
type PeerCount struct {
	types.PeerCount
```

And the same for `PeerConnFailure`:

```go
type PeerConnFailure struct {
	types.PeerCount
```

I might be wrong, but I think this would let you access the attributes directly, i.e. `myPeerConnFailure.Timestamp` instead of `myPeerConnFailure.data.Timestamp`.
Sadly, not the case. I'd have to implement getters for the values: calling methods on composed types works, but to access the attributes you need to do something like

```go
type PeerCount struct {
	types.PeerCount
}

func (r *PeerCount) process(db *sql.DB, errs *MetricErrors, data *types.TelemetryRequest) error {
	...
	lastInsertId := 0
	err = stmt.QueryRow(
		r.PeerCount.Timestamp,
		...
```

So it is not really solving much :( The only benefit is that I can use `json.Unmarshal(*data.TelemetryData, &r)` instead of `&r.data`.
:(
```go
if err := json.Unmarshal(*data.TelemetryData, &stats); err != nil {
	errorDetails = append(errorDetails, map[string]interface{}{"Id": data.Id, "Error": fmt.Sprintf("Error decoding protocol stats: %v", err)})
err := stats.process(s.DB, &errorDetails, &data)
if err != nil {
```
Should we log the err? Same question for lines 81, 87, 93, 99, 105, and 111, or is the log on line 120 enough?
I could also add logging of the error to the `errors.Append` method, to make sure every new error is logged?
Tried it; this would result in telemetry logs like the following, and we could remove the log on line 120 to avoid duplication:

```
2024-08-08T10:28:43.752+0200 ERROR telemetry/errors.go:28 Error saving peer count: pq: duplicate key value violates unique constraint "peercount_unique"
github.com/status-im/dev-telemetry/telemetry.(*MetricErrors).Append
	/home/vpavlin/devel/github.com/vpavlin/telemetry/telemetry/errors.go:28
github.com/status-im/dev-telemetry/telemetry.(*PeerCount).process
	/home/vpavlin/devel/github.com/vpavlin/telemetry/telemetry/peer_count.go:41
github.com/status-im/dev-telemetry/telemetry.(*Server).createTelemetryData
	/home/vpavlin/devel/github.com/vpavlin/telemetry/telemetry/server.go:99
net/http.HandlerFunc.ServeHTTP
	/usr/lib/go/src/net/http/server.go:2136
github.com/rs/cors.(*Cors).Handler-fm.(*Cors).Handler.func1
	/home/vpavlin/devel/pkg/mod/github.com/rs/[email protected]/cors.go:289
net/http.HandlerFunc.ServeHTTP
	/usr/lib/go/src/net/http/server.go:2136
github.com/status-im/dev-telemetry/telemetry.(*Server).rateLimit-fm.(*Server).rateLimit.func1
	/home/vpavlin/devel/github.com/vpavlin/telemetry/telemetry/server.go:222
net/http.HandlerFunc.ServeHTTP
	/usr/lib/go/src/net/http/server.go:2136
github.com/gorilla/mux.(*Router).ServeHTTP
	/home/vpavlin/devel/pkg/mod/github.com/gorilla/[email protected]/mux.go:210
net/http.serverHandler.ServeHTTP
	/usr/lib/go/src/net/http/server.go:2938
net/http.(*conn).serve
	/usr/lib/go/src/net/http/server.go:2009
```
```go
if err != nil {
	errs.Append(data.Id, fmt.Sprintf("Error saving peer count: %v", err))
```
I wonder if we should return the details of the error to the user, or if instead we should log the `err` and just return `error saving peer count`.
I guess this depends on where it is better for us to see the errors: clients running status-go, the telemetry server, or both? I think we're most likely to look at the status-go logs from running status-desktop.
Yeah, it is a tough question. Returning the error leaks implementation details to clients: a) they usually don't need to know, as they cannot do anything about it; b) it might be dangerous from a security perspective.

That said, we'd otherwise need to make sure we have good access to the telemetry server logs, to be able to spot the errors and their reasons there.

Also, I would prefer not to change it in this PR. How about a new issue, "Review error handling and reporting in telemetry server and client"?
> That said, we'd need to make sure we have good access to Telemetry server logs otherwise to be able to spot the errors and reasons there.
> I would prefer not change it in this PR - how about a new issue "Review error handling and reporting in telemetry server and client"?

Sure, can be done in a separate PR. Will open an issue so we don't forget.
```go
peerIDHash := sha256.Sum256([]byte(protocolStats.data.PeerID))
protocolStats.data.PeerID = hex.EncodeToString(peerIDHash[:])
```
Remembering past discussions, maybe we should just store the peerID without hashing it? wdyt?
I did not want to change it here to avoid bringing unrelated changes, but can do if it is preferred :)
Can be done either on a diff PR or in a diff commit! no preferences here either 🚀
Force-pushed from d8b1e11 to 7444689.

Force-pushed from 7444689 to 9d80407.
```go
@@ -99,7 +99,8 @@ func (rl *RateLimiter) cleanup(ctx context.Context, cleanupEvery time.Duration)
		return
	case now := <-t.C:
		numCleaned := 0
		for ip, limiter := range rl.values() {
		limiters := rl.values()
```
@richard-ramos unrelated to the refactoring, but the CI failed, so I added a fix (I hope this fixes it?)
https://github.com/status-im/telemetry/actions/runs/10353387310/job/28656165988?pr=37
- `receivedenvelope` and the test
- moved `json.Unmarshal` to `process` methods to de-bloat `server.go`
- TODO: make sure `ErrorDetail` is used everywhere