-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow user configurable agent timeout for crossbar disconnection #337
Conversation
* Delay reconnection check attempts in same manner as twisted reconnection * Define signal handlers on disconnect. This allows interrupting agents that are running with a crossbar-timeout of 0.
I can't reproduce this failing test locally. Will need to investigate a bit, but might not get to it immediately. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This worked well for me, in basic testing (incl with Hostmanager)... couple of discussion points.
As a result, yield also when calling _shutdown().
I think I addressed everything. Also merged in the latest main, I want to see if that failing test persists, then need to work on that if it does before asking for re-review. |
Now that we're properly waiting for Agent shutdown it takes a bit longer. These times were sufficient when running tests locally.
[skip ci]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great! Minor inline comment.
Description
This PR introduces a per Agent configurable agent time. This allows the user to set how long after the Agent loses connection to the crossbar server the Agent waits before cleaning up and shutting down.
A summary of changes:
Usage
This timeout is available as a commandline argument
--crossbar-timeout
, which can also be set in the SCF in an individual instance's 'arguments' list, for example:And example on the command line:
Or in a docker-compose file:
A crossbar-timeout of 0 is special and will disable the timeout, allowing the Agent to run forever through a crossbar outage.
Logging
Upon disconnecting the Agent logs will display (note there are some debug statements active here indicating the top of the acquisition loop in the fake data agent):
Questions/Self Comments
I'm not sure the best place to document this. It's in the argparse help for the site config arguments and in ocs-agent-cli, maybe that's sufficient? Feedback welcome here.
I've kept the default to 10 seconds, since that's what we've had. Do want to keep this or change it?
Motivation and Context
Resolves #331.
Ever since #180 this timeout has been 10 seconds. But we need the ability to have Agents continue to run indefinitely while the crossbar server might be down, this allows agents like the HWP agents in socs to continue to command the hardware, even if the Agent can't communicate with the rest of the network.
How Has This Been Tested?
I've tested this locally with a collection of fake data agents.
Types of changes
Checklist: