
chore: refactor types, error handling, clean up endpoints #37

Merged · 4 commits into status-im:master · Aug 13, 2024

Conversation

@vpavlin (Member) commented Aug 7, 2024:

  • extracted the telemetry types into a separate package
  • added a simple struct to unify the errors (a sketch follows after the TODO list)
  • fixed the file naming for receivedenvelope and its test
  • added new "wrapping" types in the telemetry package and moved json.Unmarshal into process methods to de-bloat server.go
  • removed endpoints that did not appear to be used anywhere

TODO:

  • make sure ErrorDetail is used everywhere
  • confirm the removed endpoints were actually unused :)
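
For orientation, a sketch of what the unifying error struct might look like; the exact fields are assumptions, inferred from the {"Id": ..., "Error": ...} maps visible in the review below:

// ErrorDetail pairs an error message with the Id of the telemetry
// record that produced it (assumed fields, not the PR's exact code).
type ErrorDetail struct {
	Id    int    `json:"Id"`
	Error string `json:"Error"`
}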

@richard-ramos (Member) left a comment:

LGTM. Just left a few questions.

Comment on lines 43 to 44
server.Router.HandleFunc("/protocol-stats", server.createProtocolStats).Methods("POST")
server.Router.HandleFunc("/received-messages", server.createReceivedMessages).Methods("POST")
server.Router.HandleFunc("/waku-metrics", server.createWakuTelemetry).Methods("POST")
server.Router.HandleFunc("/received-envelope", server.createReceivedEnvelope).Methods("POST")
server.Router.HandleFunc("/sent-envelope", server.createSentEnvelope).Methods("POST")
server.Router.HandleFunc("/update-envelope", server.updateEnvelope).Methods("POST")
Member:

Cleaning up these endpoints means this is a breaking change, since clients out there will no longer be able to push metrics. Do we want to allow backwards compatibility for telemetry?

Member Author:

Yeah, good question. I was not sure whether they are still actively used. Happy to bring them back if you think removing them would break things.

Member:

Not sure, tbh. I think it would depend on whether these changes make it into status release 2.30 or not, and if not, whether we care about the telemetry data from 2.30 users.
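
For reference, one possible shape of a backwards-compatibility window would be to keep the legacy routes registered and delegate to the unified handler (createTelemetryData appears in the stack trace later in this thread; legacyAdapter is hypothetical, not something this PR implements):

// Hypothetical sketch: legacy routes keep accepting pushes from older
// clients (e.g. status 2.30) and delegate to the new unified handler.
server.Router.HandleFunc("/received-envelope", server.legacyAdapter(server.createTelemetryData)).Methods("POST")
server.Router.HandleFunc("/sent-envelope", server.legacyAdapter(server.createTelemetryData)).Methods("POST")
server.Router.HandleFunc("/update-envelope", server.legacyAdapter(server.createTelemetryData)).Methods("POST")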

Comment on lines 12 to +13
 type PeerCount struct {
-	ID            int    `json:"id"`
-	CreatedAt     int64  `json:"createdAt"`
-	Timestamp     int64  `json:"timestamp"`
-	NodeName      string `json:"nodeName"`
-	NodeKeyUid    string `json:"nodeKeyUid"`
-	PeerID        string `json:"peerId"`
-	PeerCount     int    `json:"peerCount"`
-	StatusVersion string `json:"statusVersion"`
+	data types.PeerCount
Member:

Instead of creating a data attribute, wdyt about using

type PeerCount struct {
	types.PeerCount
}

And the same for PeerConnFailure:

type PeerConnFailure struct {
	types.PeerConnFailure
}

I might be wrong, but I think this would let you access the attributes directly, i.e. myPeerConnFailure.Timestamp instead of myPeerConnFailure.data.Timestamp.

@vpavlin (Member Author) commented Aug 8, 2024:

Sadly, not the case. I'd have to implement getters for the values; calling methods on composed types works, but to access the attributes you need to do something like

type PeerCount struct {
	types.PeerCount
}

func (r *PeerCount) process(db *sql.DB, errs *MetricErrors, data *types.TelemetryRequest) error {
...
	lastInsertId := 0
	err = stmt.QueryRow(
		r.PeerCount.Timestamp,
...

So it does not really solve much :( The only benefit is that I could use json.Unmarshal(*data.TelemetryData, &r) instead of &r.data.
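
For reference, a minimal runnable analog of the embedding under discussion. TelemetryPeerCount is a stand-in name for types.PeerCount, which lives in another package; the snippet shows both the field promotion suggested above and the name shadowing that forces the qualified access shown in the reply:

package main

import "fmt"

// Stand-in for types.PeerCount. Note the type has a field whose name
// matches the type's own name ("PeerCount"), which matters below.
type TelemetryPeerCount struct {
	Timestamp int64 `json:"timestamp"`
	PeerCount int   `json:"peerCount"`
}

// Wrapper embedding the type, as suggested above.
type PeerCount struct {
	TelemetryPeerCount
}

func main() {
	p := PeerCount{TelemetryPeerCount{Timestamp: 1723100000, PeerCount: 12}}

	// Fields whose names don't collide are promoted and directly accessible:
	fmt.Println(p.Timestamp) // 1723100000

	// In the PR the embedded field would be named "PeerCount" (after its
	// type), shadowing the inner "PeerCount int"; the count then has to be
	// reached through the embedded field, e.g. p.PeerCount.PeerCount.
	fmt.Println(p.TelemetryPeerCount.PeerCount) // 12
}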

Member:

:(

-	if err := json.Unmarshal(*data.TelemetryData, &stats); err != nil {
-		errorDetails = append(errorDetails, map[string]interface{}{"Id": data.Id, "Error": fmt.Sprintf("Error decoding protocol stats: %v", err)})
+	err := stats.process(s.DB, &errorDetails, &data)
+	if err != nil {
Member:

Should we log the err? Same question for lines 81, 87, 93, 99, 105, and 111; or is the log on line 120 enough?

Member Author:

I could also add logging to the errors.Append method to make sure every new error is logged?

Member Author:

I tried it; it results in telemetry logs like the following, and we could then remove the log on line 120 to avoid duplication:

2024-08-08T10:28:43.752+0200	ERROR	telemetry/errors.go:28	Error saving peer count: pq: duplicate key value violates unique constraint "peercount_unique"
github.com/status-im/dev-telemetry/telemetry.(*MetricErrors).Append
	/home/vpavlin/devel/github.com/vpavlin/telemetry/telemetry/errors.go:28
github.com/status-im/dev-telemetry/telemetry.(*PeerCount).process
	/home/vpavlin/devel/github.com/vpavlin/telemetry/telemetry/peer_count.go:41
github.com/status-im/dev-telemetry/telemetry.(*Server).createTelemetryData
	/home/vpavlin/devel/github.com/vpavlin/telemetry/telemetry/server.go:99
net/http.HandlerFunc.ServeHTTP
	/usr/lib/go/src/net/http/server.go:2136
github.com/rs/cors.(*Cors).Handler-fm.(*Cors).Handler.func1
	/home/vpavlin/devel/pkg/mod/github.com/rs/[email protected]/cors.go:289
net/http.HandlerFunc.ServeHTTP
	/usr/lib/go/src/net/http/server.go:2136
github.com/status-im/dev-telemetry/telemetry.(*Server).rateLimit-fm.(*Server).rateLimit.func1
	/home/vpavlin/devel/github.com/vpavlin/telemetry/telemetry/server.go:222
net/http.HandlerFunc.ServeHTTP
	/usr/lib/go/src/net/http/server.go:2136
github.com/gorilla/mux.(*Router).ServeHTTP
	/home/vpavlin/devel/pkg/mod/github.com/gorilla/[email protected]/mux.go:210
net/http.serverHandler.ServeHTTP
	/usr/lib/go/src/net/http/server.go:2938
net/http.(*conn).serve
	/usr/lib/go/src/net/http/server.go:2009
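
For context, a sketch of what an Append that logs every new error might look like; the exact shape is an assumption (the stack trace above only shows that errors.go logs through a zap-style logger), reusing the ErrorDetail sketch from the description:

// Assumed shape: Append records the error for the HTTP response and
// logs it immediately, so call sites need no separate log statement.
func (m *MetricErrors) Append(id int, message string) {
	m.details = append(m.details, ErrorDetail{Id: id, Error: message})
	// zap attaches the stacktrace seen above when logging at Error level.
	m.logger.Error(message)
}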

if err != nil {
	errs.Append(data.Id, fmt.Sprintf("Error saving peer count: %v", err))
Member:

I wonder if we should return the details of the error to the user, or instead log the err and just return "error saving peer count".

Collaborator:

I guess this depends on where it is better for us to see the errors: clients running status-go, the telemetry server, or both? I think we're most likely to look at the status-go logs from running status-desktop.

Member Author:

Yeah, it is a tough question. Returning the error leaks implementation details to clients: a) they usually don't need to know, since they cannot do anything about it, and b) it might be dangerous from a security perspective.

That said, we'd then need to make sure we have good access to the telemetry server logs, so we can still spot the errors and their causes there.

Also, I would prefer not to change it in this PR; how about a new issue, "Review error handling and reporting in telemetry server and client"?

Member:

> I would prefer not to change it in this PR; how about a new issue, "Review error handling and reporting in telemetry server and client"?

Sure. Can be done in a separate PR. Will open an issue so we don't forget.
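
A sketch of the split being discussed, keeping the detailed error in the server log while returning only a generic message to the client; the surrounding handler shape and the logger field are assumptions, not the PR's code:

if err != nil {
	// Full detail stays server-side, where log access is controlled...
	s.logger.Error("error saving peer count", zap.Error(err))
	// ...while the client gets a generic message that leaks no internals.
	http.Error(w, "error saving peer count", http.StatusInternalServerError)
	return
}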

Comment on lines +192 to +189
peerIDHash := sha256.Sum256([]byte(protocolStats.data.PeerID))
protocolStats.data.PeerID = hex.EncodeToString(peerIDHash[:])
Member:

Remembering past discussions, maybe we should just store the peerID without hashing it? wdyt?

Member Author:

I did not want to change it here to avoid bringing in unrelated changes, but can do if that is preferred :)

Member:

Can be done either in a different PR or in a different commit! No preference here either 🚀

@@ -99,7 +99,8 @@ func (rl *RateLimiter) cleanup(ctx context.Context, cleanupEvery time.Duration)
 	return
 case now := <-t.C:
 	numCleaned := 0
-	for ip, limiter := range rl.values() {
+	limiters := rl.values()
@vpavlin (Member Author) commented Aug 12, 2024:

@richard-ramos unrelated to the refactoring, but the CI failed, so I added a fix (I hope this resolves it?)

https://github.com/status-im/telemetry/actions/runs/10353387310/job/28656165988?pr=37
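
A guess at the pattern behind the fix: values() returns a copy of the limiter map taken under a lock, so the cleanup loop ranges over a stable snapshot while request handlers keep mutating the live map. Field names here are assumptions:

// Assumed shape: copy the map under the read lock so iteration in
// cleanup() cannot race with writers adding new per-IP limiters.
func (rl *RateLimiter) values() map[string]*rate.Limiter {
	rl.mu.RLock()
	defer rl.mu.RUnlock()
	copied := make(map[string]*rate.Limiter, len(rl.limiters))
	for ip, limiter := range rl.limiters {
		copied[ip] = limiter
	}
	return copied
}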

@vpavlin merged commit 63f5395 into status-im:master on Aug 13, 2024. 1 check passed.