-
Notifications
You must be signed in to change notification settings - Fork 5.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Take into account queue length in autoscaling #5684
Conversation
@@ -357,6 +360,7 @@ def run(self): | |||
try: | |||
self._run() | |||
except Exception: | |||
logger.exception("Error in monitor loop") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you don't do this, the error message is lost forever trying to terminate nodes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this mean nodes probably weren't cleaned up? If so, would be good to print that in the error message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's done below actually
3422e4a
to
512d70c
Compare
Test PASSed. |
Test PASSed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@@ -142,45 +142,56 @@ def terminate_node(self, node_id): | |||
class LoadMetricsTest(unittest.TestCase): | |||
def testUpdate(self): | |||
lm = LoadMetrics() | |||
lm.update("1.1.1.1", {"CPU": 2}, {"CPU": 1}) | |||
lm.update("1.1.1.1", {"CPU": 2}, {"CPU": 1}, {}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add some test cases for the new metric
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah there are a couple entries below in testLoadMessages
@@ -357,6 +360,7 @@ def run(self): | |||
try: | |||
self._run() | |||
except Exception: | |||
logger.exception("Error in monitor loop") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this mean nodes probably weren't cleaned up? If so, would be good to print that in the error message.
@@ -357,6 +360,7 @@ def run(self): | |||
try: | |||
self._run() | |||
except Exception: | |||
logger.exception("Error in monitor loop") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logger.exception("Error in monitor loop") | |
logger.exception("Error in monitor loop.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't believe in periods in log messages
Merging so I can test this for real on compiled wheels -- had some issues trying to set it up before. |
Test FAILed. |
I just tested this on a real cluster and it works as expected. |
Why are these changes needed?
Related issue number
Checks
scripts/format.sh
to lint the changes in this PR.