-
Notifications
You must be signed in to change notification settings - Fork 356
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: agent config precedence #8656
Conversation
✅ Deploy Preview for determined-ui canceled.
|
ba2f357
to
05af106
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks really good! Thanks for taking this on. Couple things worth bringing up, but seems very close.
agent/internal/options/options.go
Outdated
ConfigFile: "", | ||
Log: logger.Config{ | ||
Level: "trace", | ||
Color: true, | ||
}, | ||
MasterHost: "", | ||
MasterPort: 0, | ||
AgentID: "", | ||
Label: "", | ||
ResourcePool: "", | ||
ContainerMasterHost: "", | ||
ContainerMasterPort: 0, | ||
SlotType: "auto", | ||
VisibleGPUs: VisibleGPUsFromEnvironment(), | ||
Security: SecurityOptions{ | ||
TLS: TLSOptions{ | ||
Enabled: false, | ||
SkipVerify: false, | ||
MasterCert: "", | ||
MasterCertName: "", | ||
}, | ||
}, | ||
Debug: false, | ||
ArtificialSlots: 0, | ||
ImageRoot: "", | ||
TLS: false, | ||
TLSCertFile: "", | ||
TLSKeyFile: "", | ||
APIEnabled: false, | ||
BindIP: "0.0.0.0", | ||
BindPort: 9090, | ||
HTTPProxy: "", | ||
HTTPSProxy: "", | ||
FTPProxy: "", | ||
NoProxy: "", | ||
AgentReconnectAttempts: aproto.AgentReconnectAttempts, | ||
AgentReconnectBackoff: int(aproto.AgentReconnectBackoff / time.Second), | ||
ContainerRuntime: DockerContainerRuntime, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no need to set fields to the "zero value" of their type, like ""
, false
, or 0
. Or is this an exhaustruct
linting thing?
ConfigFile: "", | |
Log: logger.Config{ | |
Level: "trace", | |
Color: true, | |
}, | |
MasterHost: "", | |
MasterPort: 0, | |
AgentID: "", | |
Label: "", | |
ResourcePool: "", | |
ContainerMasterHost: "", | |
ContainerMasterPort: 0, | |
SlotType: "auto", | |
VisibleGPUs: VisibleGPUsFromEnvironment(), | |
Security: SecurityOptions{ | |
TLS: TLSOptions{ | |
Enabled: false, | |
SkipVerify: false, | |
MasterCert: "", | |
MasterCertName: "", | |
}, | |
}, | |
Debug: false, | |
ArtificialSlots: 0, | |
ImageRoot: "", | |
TLS: false, | |
TLSCertFile: "", | |
TLSKeyFile: "", | |
APIEnabled: false, | |
BindIP: "0.0.0.0", | |
BindPort: 9090, | |
HTTPProxy: "", | |
HTTPSProxy: "", | |
FTPProxy: "", | |
NoProxy: "", | |
AgentReconnectAttempts: aproto.AgentReconnectAttempts, | |
AgentReconnectBackoff: int(aproto.AgentReconnectBackoff / time.Second), | |
ContainerRuntime: DockerContainerRuntime, | |
Log: logger.Config{ | |
Level: "trace", | |
Color: true, | |
}, | |
SlotType: "auto", | |
VisibleGPUs: VisibleGPUsFromEnvironment(), | |
BindIP: "0.0.0.0", | |
BindPort: 9090, | |
AgentReconnectAttempts: aproto.AgentReconnectAttempts, | |
AgentReconnectBackoff: int(aproto.AgentReconnectBackoff / time.Second), | |
ContainerRuntime: DockerContainerRuntime, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mostly just hardcoded the default values for clarity, but changed this since it's a lot of redundant code.
agent/internal/options/options.go
Outdated
// ResolveHostname resolves the name (or ID) of the agent. | ||
func (o *Options) ResolveHostname() (*Options, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I personally have a preference for immutability as a default, so when a function/method is going to mutate things, I like to do my best to call that out in the function's name and doc comment.
What do you think about this for this function?
// SetAgentID resolves the name of name (or id) of the agent and saves it to o.
func (o *Options) SetAgentID() error {
then calling code can change to
err = opt.SetAgentID()
if err != nil {
rather than
opt, err = opt.ResolveHostname()
if err != nil {
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this, changed!
f30ad13
to
3c04ae9
Compare
3c04ae9
to
24928e5
Compare
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## main #8656 +/- ##
==========================================
+ Coverage 47.72% 47.78% +0.05%
==========================================
Files 1049 1050 +1
Lines 167292 167353 +61
Branches 2243 2241 -2
==========================================
+ Hits 79841 79964 +123
+ Misses 87293 87231 -62
Partials 158 158
Flags with carried forward coverage won't be shown. Click here to find out more.
|
Description
This PR makes determined-agent initialization and start-up more similar to determined-master by incorporating viper into agent configuration setup to ensure the correct priority of agent configuration options (flag > environment > config > default).
It is also worth noting that the the
determined-agent
flagenable_api
was changed toapi_enabled
.Test Plan
cd
into localdetermined
repo and spin up a cluster with thedevcluster
tool. In the cluster logs in your terminal tab, ensure that theagent configuration
outputs shows"agent_reconnect_attempts":5
(provided that theagent_reconnect_attempts
field is not defined in theagent.config
section of your devcluster YAML, the JSON output should show the default value, which is currently 5).agent_reconnect_attempts: 9
to your devcluster config as follows:agent configuration
outputs shows"agent_reconnect_attempts":9
in your logs.2
in your terminal to kill the agent binary so that you are only runningDB
andMASTER
export DET_AGENT_RECONNECT_ATTEMPTS=10
cd
intodetermined/agent/build
and run./determined-agent --config-file /private/tmp/devcluster/agent.conf
agent configuration
outputs shows"agent_reconnect_attempts":10
(rather than the current default value of 5 or config value of 9)./determined-agent --config-file /private/tmp/devcluster/agent.conf --agent-reconnect-attempts 12
agent configuration
outputs shows"agent_reconnect_attempts":12
rather than the default value 5, config value 9, or the environment variable value 10.Checklist
docs/release-notes/
.See Release Note for details.
Ticket
DET-9800