Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix!: concurrent worker agent processes #160

Merged

Conversation

jusiskin
Copy link
Contributor

@jusiskin jusiskin commented Oct 28, 2024

What was the problem/requirement? (What/Why)

The PosixInstanceWorkerBase class, which forms the base class of PosixInstanceBuildWorker had a bug causing multiple worker agent processes to be running simultaneously.

There was code in one location that was starting the systemd unit for the worker agent:

if config.start_service:
cmds.append("systemctl start deadline-worker")

and code in another place launching the worker agent in an SSM command:

f"nohup runuser --login {self.configuration.agent_user} -c 'AWS_DEFAULT_REGION={self.configuration.region} deadline-worker-agent > /tmp/worker-agent-stdout.txt 2>&1 &'",

What was the solution? (How)

Removed the code that launches the worker agent through an SSM command, since the officially supported and recommended way to run the worker agent is using systemd. We want tests to reflect this setup.

The way configuration is applied to the worker agent had to be changed. We were previously setting environment variables in the worker agent user's bash startup script, but this script is not loaded by systemd services since it does not use a shell. Instead we use a supplementary systemd drop-in config file (see https://wiki.archlinux.org/title/Systemd#Drop-in_files)

I've also modified the default for DeadlineWorkerConfiguration.start_service from falsetrue since most tests relied on the worker being started. Downstream consumers that relied on the worker agent being started by the SSM command will not need to modify their test code and this is the sensible default.

What is the impact of this change?

There will not be issues encountered in tests using this library where there are multiple worker agents running for the same worker ID, causing conflicts.

How was this change tested?

  1. Checked out the mainline branch of https://github.com/aws-deadline/deadline-cloud-worker-agent

  2. Created the hatch environments (e.g. ran hatch run fmt)

  3. Installed the modified version of dealdine-cloud-test-fixtures from this branch using:

    hatch shell
    pip install -e ../deadline-cloud-test-fixtures
    
  4. Ran the E2E tests using the instructions from DEVELOPMENT.md

Was this change documented?

No

Is this a breaking change?

Yes:

  1. The default of DeadlineWorkerConfiguration.start_service was changed from FalseTrue
  2. Downstream consumers of this package may have relied on the worker agent being started through the SSM command.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@jusiskin jusiskin added the bug Something isn't working label Oct 28, 2024
@jusiskin jusiskin force-pushed the fix_concurrent_worker_agent_processes branch from 6a41adb to 44eceae Compare October 28, 2024 16:09
leongdl
leongdl previously approved these changes Oct 29, 2024
@jusiskin jusiskin force-pushed the fix_concurrent_worker_agent_processes branch from 44eceae to 55a5078 Compare October 30, 2024 17:42
@jusiskin jusiskin changed the title fix: concurrent worker agent processes fix!: concurrent worker agent processes Oct 30, 2024
@jusiskin jusiskin marked this pull request as ready for review October 30, 2024 17:56
@jusiskin jusiskin requested a review from a team as a code owner October 30, 2024 17:56
AWS-Samuel
AWS-Samuel previously approved these changes Oct 30, 2024
BREAKING CHANGE: the default worker config now starts the worker agent service

Signed-off-by: Josh Usiskin <[email protected]>
@jusiskin jusiskin force-pushed the fix_concurrent_worker_agent_processes branch from 0e5fa2a to 72bf9c3 Compare October 30, 2024 18:31
Copy link

sonarcloud bot commented Oct 30, 2024

@jusiskin jusiskin merged commit 9f7da7f into aws-deadline:mainline Oct 30, 2024
15 checks passed
@jusiskin jusiskin deleted the fix_concurrent_worker_agent_processes branch October 30, 2024 19:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants