Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: improved poller algorithm #2622

Merged
merged 10 commits into from
Jun 1, 2023
Merged

Conversation

mathnogueira
Copy link
Member

@mathnogueira mathnogueira commented May 31, 2023

This PR introduces another algorithm parameter for our trace poller. When we don't detect changes in the trace anymore, we try matching the selectors with spans. If all selectors get at least one span, we consider the polling successful, otherwise, try again in the next polling cycle. If we have no changes in the trace and we cannot match the selectors with spans from the trace 3 times in a row, we end the polling and mark it as complete (to prevent issues when writing TDD)

There are more things that could be done and I'll do in the future if needed:

  • Make this configurable via a polling profile option (either another strategy, or just an option inside the periodic object)

Example

# Run Action
# 01 Got initial trace
# 02 Got more spans, continue
# 03 Got more spans, continue
# 04 No more spans, but not all selectors are working
# 05 No more spans, but not all selectors are working
# 06 No more spans, but not all selectors are working. But we tried 3 times already, so finish polling

## Checklist

- [x] tested locally
- [ ] added new dependencies
- [ ] updated the docs
- [x] added a test

Comment on lines -75 to -79
ctx context.Context
test model.Test
run model.Run
count int
hadRequeue bool
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea behind changing this object is to make it extendable via headers (just like network protocols work by attaching headers to the packet). Thus, I removed the context which can be serialized via context propagation and the hadRequeue.

This way, we can have the counter for the selector based polling executor without changing this object and impacting other strategies..

Copy link
Collaborator

@schoren schoren left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great and I think it's going to have a great impact on the more flaky tests. Left a few suggestions, nothing to block a merge

)

const (
selectorBasedPollerExecutorRetryHeader = "SelectorBasedPollerExecutor::retryCount"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I feel like it would be more consistent to use . instead of :: to split the "namespace". Did you choose this for any particular reason?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when I see namespace I remember C++ and used it's ugly convention 😆

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question, do we have/want event logs for the new polling strategy? so we can see the failed selectors from the client side

added some logs to it! Good catch


currentNumberTries := pe.getNumberTries(request)
if currentNumberTries >= selectorBasedPollerExecutorMaxTries {
return true, "not all selectors matched, but trace haven't changed in a while", run, err
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would the end user, at some point, have a clear undestanding of what in a while means? Like, is the amount of retries related to this error at some point? If not, maybe add the amount of retries/time in this message?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually, oscar got a point: this message should be an event. This string is only shown if the polling is not complete (first parameter is false) so it's useless in this context)

return true, "all selectors have matched one or more spans", run, err
}

request.SetHeader(selectorBasedPollerExecutorRetryHeader, fmt.Sprintf("%d", currentNumberTries+1))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this file already has the strconv package imported, might look more consistent use it here

Suggested change
request.SetHeader(selectorBasedPollerExecutorRetryHeader, fmt.Sprintf("%d", currentNumberTries+1))
request.SetHeader(selectorBasedPollerExecutorRetryHeader, strconv.Itoa(currentNumberTries+1))

Another approach could be to add a few request.SetHeaderInt(key string, val int), request.GetHeaderInt(key string) int kind of methods to the request, to reduce the boilerplate on the consumer side, and further abstract them from the internal implementation

}

func (pr PollingRequest) IsFirstRequest() bool {
return !pr.hadRequeue
return pr.Header("requeued") != "true"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

another point where a return !pr.GetBool("requeued") would look cool

@@ -189,7 +194,7 @@ func (pe DefaultPollerExecutor) donePollingTraces(job *PollingRequest, traceDB t
if !traceDB.ShouldRetry() {
return true, "TraceDB is not retryable"
}
pp := *pe.ppGetter.GetDefault(job.ctx).Periodic
pp := *pe.ppGetter.GetDefault(job.Context()).Periodic
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since you're here, can you add a nil check for Peridiodic? It should never be nil, but it's better to be safe than panic

@xoscar
Copy link
Collaborator

xoscar commented May 31, 2023

Question, do we have/want event logs for the new polling strategy? so we can see the failed selectors from the client side

@mathnogueira mathnogueira force-pushed the feat/improved-poller-algorithm branch from d7547fe to e621a3a Compare June 1, 2023 16:48
@mathnogueira mathnogueira merged commit 163fe30 into main Jun 1, 2023
@mathnogueira mathnogueira deleted the feat/improved-poller-algorithm branch June 1, 2023 17:30
schoren pushed a commit that referenced this pull request Jun 5, 2023
* simplify polling request object and use mapcarrier to propagate ctx

* add another condition to check for selectors after all spans are ready

* fix test

* add option to configure max retries in selector based trace poller

* move selector based poller initialization to app.go

* PR suggestions

* fix tests and move polling success event to trace poller component

* add events to new selector poller executor

* fix pooling profile test

* fix tests
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants