-
Notifications
You must be signed in to change notification settings - Fork 72
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: improved poller algorithm #2622
Conversation
ctx context.Context | ||
test model.Test | ||
run model.Run | ||
count int | ||
hadRequeue bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea behind changing this object is to make it extendable via headers (just like network protocols work by attaching headers to the packet). Thus, I removed the context
which can be serialized via context propagation and the hadRequeue
.
This way, we can have the counter for the selector based polling executor without changing this object and impacting other strategies..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great and I think it's going to have a great impact on the more flaky tests. Left a few suggestions, nothing to block a merge
) | ||
|
||
const ( | ||
selectorBasedPollerExecutorRetryHeader = "SelectorBasedPollerExecutor::retryCount" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I feel like it would be more consistent to use .
instead of ::
to split the "namespace". Did you choose this for any particular reason?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when I see namespace
I remember C++ and used it's ugly convention 😆
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question, do we have/want event logs for the new polling strategy? so we can see the failed selectors from the client side
added some logs to it! Good catch
|
||
currentNumberTries := pe.getNumberTries(request) | ||
if currentNumberTries >= selectorBasedPollerExecutorMaxTries { | ||
return true, "not all selectors matched, but trace haven't changed in a while", run, err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would the end user, at some point, have a clear undestanding of what in a while
means? Like, is the amount of retries related to this error at some point? If not, maybe add the amount of retries/time in this message?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually, oscar got a point: this message should be an event. This string is only shown if the polling is not complete (first parameter is false) so it's useless in this context)
return true, "all selectors have matched one or more spans", run, err | ||
} | ||
|
||
request.SetHeader(selectorBasedPollerExecutorRetryHeader, fmt.Sprintf("%d", currentNumberTries+1)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this file already has the strconv
package imported, might look more consistent use it here
request.SetHeader(selectorBasedPollerExecutorRetryHeader, fmt.Sprintf("%d", currentNumberTries+1)) | |
request.SetHeader(selectorBasedPollerExecutorRetryHeader, strconv.Itoa(currentNumberTries+1)) |
Another approach could be to add a few request.SetHeaderInt(key string, val int)
, request.GetHeaderInt(key string) int
kind of methods to the request, to reduce the boilerplate on the consumer side, and further abstract them from the internal implementation
server/executor/trace_poller.go
Outdated
} | ||
|
||
func (pr PollingRequest) IsFirstRequest() bool { | ||
return !pr.hadRequeue | ||
return pr.Header("requeued") != "true" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
another point where a return !pr.GetBool("requeued")
would look cool
@@ -189,7 +194,7 @@ func (pe DefaultPollerExecutor) donePollingTraces(job *PollingRequest, traceDB t | |||
if !traceDB.ShouldRetry() { | |||
return true, "TraceDB is not retryable" | |||
} | |||
pp := *pe.ppGetter.GetDefault(job.ctx).Periodic | |||
pp := *pe.ppGetter.GetDefault(job.Context()).Periodic |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since you're here, can you add a nil
check for Peridiodic
? It should never be nil, but it's better to be safe than panic
Question, do we have/want event logs for the new polling strategy? so we can see the failed selectors from the client side |
d7547fe
to
e621a3a
Compare
* simplify polling request object and use mapcarrier to propagate ctx * add another condition to check for selectors after all spans are ready * fix test * add option to configure max retries in selector based trace poller * move selector based poller initialization to app.go * PR suggestions * fix tests and move polling success event to trace poller component * add events to new selector poller executor * fix pooling profile test * fix tests
This PR introduces another algorithm parameter for our trace poller. When we don't detect changes in the trace anymore, we try matching the selectors with spans. If all selectors get at least one span, we consider the polling successful, otherwise, try again in the next polling cycle. If we have no changes in the trace and we cannot match the selectors with spans from the trace 3 times in a row, we end the polling and mark it as complete (to prevent issues when writing TDD)
There are more things that could be done and I'll do in the future if needed:
periodic
object)Example