Scrape queue-proxy metrics in autoscaler #3149
Conversation
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: mdemirhan, yanweiguo. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
Regarding the test: since we do sampling and for the reason mentioned in this comment, we could get metrics much lower than the pod-average level. This results in a lower observed average revision concurrency and causes the test failure, so I increased the traffic-sending window from 30 seconds to 35 seconds to reduce flakiness.
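For illustration only, a rough sketch of the kind of sample-based estimate being discussed; this is not the actual autoscaler code, and `Stat`, `estimateAverageRevConcurrency`, and `readyPodCount` as used here are simplified stand-ins:

```go
package main

import "fmt"

// Stat is a simplified stand-in for a scraped per-pod metric sample.
type Stat struct {
	PodName            string
	AverageConcurrency float64
}

// estimateAverageRevConcurrency extrapolates revision-level concurrency from a
// sample of scraped pods. If the sampled pods happen to be less loaded than the
// average pod, the estimate comes out low, which is the source of the test
// flakiness described above.
func estimateAverageRevConcurrency(samples []Stat, readyPodCount int) float64 {
	if len(samples) == 0 || readyPodCount == 0 {
		return 0
	}
	var total float64
	for _, s := range samples {
		total += s.AverageConcurrency
	}
	perPodAverage := total / float64(len(samples))
	return perPodAverage * float64(readyPodCount)
}

func main() {
	samples := []Stat{
		{PodName: "pod-a", AverageConcurrency: 2.0},
		{PodName: "pod-b", AverageConcurrency: 3.0},
	}
	// 2.5 per sampled pod × 4 ready pods = 10 estimated revision concurrency.
	fmt.Println(estimateAverageRevConcurrency(samples, 4))
}
```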
Merged #3289. Thanks to #3289, the changes to … @k4leung4 @markusthoemmes PTAL.
/test pull-knative-serving-unit-tests
/test pull-knative-serving-integration-tests
Let's discuss if we really need the activator alternative path.
Do we need to think about the backwards-compatibility implications here? With the changes I proposed above, we could see double reporting (1. through scraping, 2. through metric sending) before apps are redeployed. Do we need to actively prevent that (for example by only accepting sent metrics from the activator), or do we rely on all applications being redeployed so they no longer send stats?
@markusthoemmes We had some discussion about that above (#3149 (comment), if that link works) - we have a mechanism that redeploys users' apps as part of 0.3, so now that 0.4 is cut, a user has an intermediary version where we supported both systems.
@greghaynes right, this isn't necessarily about supporting both, though, but about having both of them interfere with each other.
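A sketch of one way the double counting could be prevented, assuming the autoscaler can tell activator stats apart by pod name; the `activator-` prefix and the `acceptPushedStat` helper are hypothetical and not something this PR adds:

```go
package main

import (
	"fmt"
	"strings"
)

// StatMessage is a simplified stand-in for the autoscaler's stat envelope.
type StatMessage struct {
	PodName            string
	AverageConcurrency float64
}

// activatorPrefix is an assumed naming convention; the real system may
// identify activator traffic differently.
const activatorPrefix = "activator-"

// acceptPushedStat keeps activator-originated stats (which cannot be scraped
// via the revision's service) and drops queue-proxy pushes, which would
// otherwise be counted twice once scraping is in place.
func acceptPushedStat(m StatMessage) bool {
	return strings.HasPrefix(m.PodName, activatorPrefix)
}

func main() {
	fmt.Println(acceptPushedStat(StatMessage{PodName: "activator-abc"}))                // true
	fmt.Println(acceptPushedStat(StatMessage{PodName: "myapp-00001-deployment-xyz"})) // false
}
```

Whether filtering like this is preferable to simply relying on all applications being redeployed is exactly the question raised above.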
We're getting there. I like the minimized changes to the autoscaler itself a lot, thanks for doing that!
I've got a few comments throughout; I'm happy to help you get this in ASAP. 🎉
```diff
 	}, nil
 }
 
 // Scrape call the destination service then send it
 // to the given stats chanel
-func (s *ServiceScraper) Scrape(statsCh chan<- *StatMessage) {
+func (s *ServiceScraper) Scrape(ctx context.Context, statsCh chan<- *StatMessage) {
+	logger := logging.FromContext(ctx)
```
I wonder if we should make `Scrape` return an error and log in the method calling it, to prevent creating loggers (and passing context just to create loggers).
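A sketch of what that alternative might look like; the `scrapeService` helper and `url` field here are assumptions, not part of this PR:

```go
package scraper

import "fmt"

// StatMessage and ServiceScraper are trimmed-down stand-ins for the real types.
type StatMessage struct{}

type ServiceScraper struct {
	url string
}

// scrapeService is a placeholder for the actual HTTP scrape of queue-proxy.
func (s *ServiceScraper) scrapeService() (*StatMessage, error) {
	return &StatMessage{}, nil
}

// Scrape returns an error instead of logging, leaving logging (and therefore
// the logger/context plumbing) entirely to the caller.
func (s *ServiceScraper) Scrape(statsCh chan<- *StatMessage) error {
	stat, err := s.scrapeService()
	if err != nil {
		return fmt.Errorf("failed to scrape %s: %v", s.url, err)
	}
	statsCh <- stat
	return nil
}
```

The caller, which already holds a logger, would then do something like `if err := scraper.Scrape(statsCh); err != nil { logger.Errorf("scrape failed: %v", err) }`.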
I made this function follow the same format as the `Scale` function. If we want to log a "no pods" message for debugging, we can't return it as an error.
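If the error-returning shape were still wanted, one alternative (only a sketch, not what this PR does) would be a hypothetical sentinel error such as `ErrNoPods` that the caller logs at debug level:

```go
package scraper

import "errors"

// ErrNoPods is a hypothetical sentinel for the "nothing to scrape" case.
var ErrNoPods = errors.New("no pods to scrape")

// The caller could then treat ErrNoPods as debug-level noise and everything
// else as a real error:
//
//	if err := scraper.Scrape(statsCh); err != nil {
//		if err == ErrNoPods {
//			logger.Debug("no pods to scrape yet")
//		} else {
//			logger.Errorf("scrape failed: %v", err)
//		}
//	}
```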
We can have the logger as a `ServiceScraper` member, FWIW.
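A sketch of that suggestion; the constructor shape and field names are assumptions:

```go
package scraper

import "go.uber.org/zap"

type StatMessage struct{}

// ServiceScraper keeps its logger as a member instead of pulling it from a
// context on every Scrape call.
type ServiceScraper struct {
	url    string
	logger *zap.SugaredLogger
}

// NewServiceScraper stores the logger once at construction time.
func NewServiceScraper(url string, logger *zap.SugaredLogger) *ServiceScraper {
	return &ServiceScraper{url: url, logger: logger}
}

// Scrape no longer needs a context parameter just to fetch a logger.
func (s *ServiceScraper) Scrape(statsCh chan<- *StatMessage) {
	s.logger.Debugf("scraping %s", s.url)
	// ... scrape and send the stat on statsCh ...
}
```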
/test pull-knative-serving-integration-tests
The following is the coverage report on pkg/.
/gltm
meh,
Fixes #2203.
Fixes #1927.
Proposed Changes
- Add `ServiceScraper` to scrape metrics from queue-proxy when a `UniScaler` is created.
- Use `ServiceScraper` to get the ready pods count and estimate the average revision concurrency. Store this value in `Stat` as a new field, `AverageRevConcurrency`.
- Remove `PodWeight` from the autoscaling algorithm, as proposed in "Bucketize autoscaling metrics by timeframe not by pod name" (#2977). Use the same weight for all sample data.
- Remove `observedPods`. This information is useless when the autoscaling algorithm is based on sample data. We already removed the hard dependency in "Use actual pods from K8S Informer for scaling rate" (#3055).

Release Note