Add ansible config for dev desktops #79

Merged
merged 61 commits into from
May 21, 2022

oli-obk
Contributor

@oli-obk oli-obk commented Oct 14, 2021

@Mark-Simulacrum
Member

It probably makes sense to just inline all the files into simpleinfra rather than checking out a separate repository -- we'd want to move that into rust-lang at least, and it doesn't seem like there's much benefit to keeping it separate rather than integrated into this repository.

@pietroalbini
Member

Hmm, is there a reason why the tool to create the users is a full Rust program depending on reqwest? A simple bash script with jq to parse the JSON will likely be enough.

@oli-obk
Contributor Author

oli-obk commented Oct 26, 2021

> Hmm, is there a reason why the tool to create the users is a full Rust program depending on reqwest? A simple bash script with jq to parse the JSON will likely be enough.

It started out as a tool that ran on each login and took the username as an argument. The first few bash iterations got "hacked" by people I showed them to, just by looking at the script. So I made it a Rust program, where I was able to reason about things. Even after reading articles about making bash scripts secure, I was not certain I could write one.

Later, when the script became a cron job, this became much less important: the only untrusted input is the JSON dump from GitHub, and if that is taken over we have lost anyway.

So, in essence, yes: we could go back to a bash script using jq. It's just not straightforward to understand what is going on in it (to me, even though I'm the author).
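
For comparison, here is a minimal Python sketch of the input-handling concern described above: parse a JSON dump of team members and keep only logins that pass a strict allow-list check, so nothing unexpected ever reaches a system command. The JSON layout (a `members` list whose entries have a `github` field) and the function name `safe_usernames` are assumptions for illustration; this is not the actual tool or its data source.

```python
import json
import re

# Conservative allow-list: alphanumerics separated by single hyphens.
# Anything that does not match is rejected rather than escaped.
_VALID_LOGIN = re.compile(r"[A-Za-z0-9](?:-?[A-Za-z0-9])*")
_MAX_LOGIN_LENGTH = 39  # assumed upper bound for a login


def safe_usernames(team_json: str) -> list:
    """Return the sorted, de-duplicated logins that pass the allow-list check."""
    data = json.loads(team_json)
    logins = set()
    for member in data.get("members", []):
        if not isinstance(member, dict):
            continue
        login = member.get("github", "")
        if (
            isinstance(login, str)
            and len(login) <= _MAX_LOGIN_LENGTH
            and _VALID_LOGIN.fullmatch(login)
        ):
            logins.add(login)
    return sorted(logins)


# Hypothetical sample input, including one hostile entry that gets dropped.
sample = json.dumps({"members": [
    {"github": "oli-obk"},
    {"github": "Mark-Simulacrum"},
    {"github": "evil; rm -rf /"},
]})
print(safe_usernames(sample))
```

If the validated logins are later handed to a command, passing each one as a separate element of an argument list to `subprocess.run` (rather than building a shell string) sidesteps the quoting problems that make the bash version hard to reason about.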

@Mark-Simulacrum
Member

Pushed up a few commits with fixes/improvements while experimenting with the sample machine. One likely blocker I noted is removing the global Rust installation, which conflicts with usage of rustup (or at least rustup complains about it).

@Mark-Simulacrum
Member

TASK [dev-desktop : Build team login cron job] **********************************************************************
fatal: [dev-desktop.infra.rust-lang.org]: FAILED! => {"changed": true, "cmd": "cd /root/team_login && cargo build", "delta": "0:00:00.002152", "end": "2022-03-16 15:04:41.149915", "msg": "non-zero return code", "rc": 127, "start": "2022-03-16 15:04:41.147763", "stderr": "/bin/sh: 1: cargo: not found", "stderr_lines": ["/bin/sh: 1: cargo: not found"], "stdout": "", "stdout_lines": []}

It looks like we're not finding the rustup install -- maybe an explicit PATH is needed or something, not sure.
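
As a rough illustration of why the lookup fails: rustup installs `cargo` under `$HOME/.cargo/bin` by default, and a non-login `/bin/sh` does not necessarily have that directory on its `PATH`. The Python sketch below mimics the lookup with and without that directory appended. The helper name `find_with_cargo_bin` is hypothetical, and the default install location is assumed (no custom `CARGO_HOME`).

```python
import os
import shutil
from typing import Mapping, Optional


def find_with_cargo_bin(tool: str, env: Mapping[str, str]) -> Optional[str]:
    """Look up `tool` on env's PATH with $HOME/.cargo/bin appended."""
    home = env.get("HOME", os.path.expanduser("~"))
    # Assumption: rustup's default location, i.e. CARGO_HOME is not overridden.
    cargo_bin = os.path.join(home, ".cargo", "bin")
    search_path = os.pathsep.join([env.get("PATH", os.defpath), cargo_bin])
    # shutil.which returns the full path of the executable, or None if absent.
    return shutil.which(tool, path=search_path)
```

Calling `find_with_cargo_bin("cargo", os.environ)` returns the path to `cargo` when rustup is installed for the current user, and `None` otherwise.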

@Mark-Simulacrum
Member

That did not seem to help; I'm still seeing the same error after pulling the latest changes.

@Mark-Simulacrum
Member

TASK [dev-desktop : Build team login cron job] ***************************************************************************************************************************************************************
fatal: [dev-desktop.infra.rust-lang.org]: FAILED! => {"changed": true, "cmd": "cd /root/team_login && PATH=$PATH:$HOME/.cargo/bin cargo build", "delta": "0:00:26.167428", "end": "2022-03-16 17:03:56.949407", "msg": "non-zero return code", "rc": 101, "start": "2022-03-16 17:03:30.781979", "stderr": "    Updating crates.io index\n Downloading crates ...\n  Downloaded openssl-sys v0.9.67\n  Downloaded autocfg v1.0.1\n  Downloaded unicode-xid v0.2.2\n  Downloaded proc-macro2 v1.0.29\n  Downloaded pkg-config v0.3.20\n  Downloaded socket2 v0.4.2\n  Downloaded ryu v1.0.5\n  Downloaded openssl-probe v0.1.4\n  Downloaded itoa v0.4.8\n  Downloaded quote v1.0.9\n  Downloaded syn v1.0.77\n  Downloaded curl v0.4.41\n  Downloaded libc v0.2.103\n  Downloaded cc v1.0.70\n  Downloaded libz-sys v1.1.3\n  Downloaded curl-sys v0.4.51+curl-7.80.0\n  Downloaded mini-internal v0.1.16\n  Downloaded miniserde v0.1.16\n   Compiling pkg-config v0.3.20\n   Compiling cc v1.0.70\n   Compiling libc v0.2.103\n   Compiling autocfg v1.0.1\n   Compiling proc-macro2 v1.0.29\n   Compiling unicode-xid v0.2.2\n   Compiling syn v1.0.77\n   Compiling ryu v1.0.5\n   Compiling curl v0.4.41\n   Compiling itoa v0.4.8\n   Compiling openssl-probe v0.1.4\nerror: linker `cc` not found\n  |\n  = note: No such file or directory (os error 2)\n\nerror: could not compile `curl` due to previous error\nwarning: build failed, waiting for other jobs to finish...\nerror: build failed", "stderr_lines": ["    Updating crates.io index", " Downloading crates ...", "  Downloaded openssl-sys v0.9.67", "  Downloaded autocfg v1.0.1", "  Downloaded unicode-xid v0.2.2", "  Downloaded proc-macro2 v1.0.29", "  Downloaded pkg-config v0.3.20", "  Downloaded socket2 v0.4.2", "  Downloaded ryu v1.0.5", "  Downloaded openssl-probe v0.1.4", "  Downloaded itoa v0.4.8", "  Downloaded quote v1.0.9", "  Downloaded syn v1.0.77", "  Downloaded curl v0.4.41", "  Downloaded libc v0.2.103", "  Downloaded cc v1.0.70", "  Downloaded libz-sys v1.1.3", 
"  Downloaded curl-sys v0.4.51+curl-7.80.0", "  Downloaded mini-internal v0.1.16", "  Downloaded miniserde v0.1.16", "   Compiling pkg-config v0.3.20", "   Compiling cc v1.0.70", "   Compiling libc v0.2.103", "   Compiling autocfg v1.0.1", "   Compiling proc-macro2 v1.0.29", "   Compiling unicode-xid v0.2.2", "   Compiling syn v1.0.77", "   Compiling ryu v1.0.5", "   Compiling curl v0.4.41", "   Compiling itoa v0.4.8", "   Compiling openssl-probe v0.1.4", "error: linker `cc` not found", "  |", "  = note: No such file or directory (os error 2)", "", "error: could not compile `curl` due to previous error", "warning: build failed, waiting for other jobs to finish...", "error: build failed"], "stdout": "", "stdout_lines": []}

Looks like this needs a C toolchain installed externally: the build fails with "linker `cc` not found" while compiling the `curl` crate.

@Mark-Simulacrum
Member

A couple of notes:

  • It looks like the message of the day doesn't quite work, or at least logging in myself I'm not seeing it yet.
  • Installing the team login binary through Ansible after the initial deploy can fail with "cannot create regular file" if the binary is currently running. Ideally we can seamlessly redeploy as needed without impacting existing users, so this seems worth fixing (particularly if we e.g. modify the source code of the team login script).
  • The "common : remove old group memberships" task in Ansible will drop allow-ssh from a bunch of users that get created by the cron job. I'm not sure what the right fix is, but presumably this means that deploying Ansible will erase a bunch of permissions, which seems not great. (They will get restored in ~5 minutes, but still.)

@oli-obk
Contributor Author

oli-obk commented Mar 16, 2022

> • It looks like the message of the day doesn't quite work, or at least logging in myself I'm not seeing it yet

Yeah, I was worried about that. The default sshd config disables the motd, and I'm not sure if re-enabling it actually works in this setup. We could change the default sshd config, but that would change it for all servers.

> • Install the team login binary when run through ansible after the initial deploy can fail with "cannot create regular file" if the binary is currently running.

Huh? I thought Linux could overwrite binaries that are running, and that there are no locks on them. I'll look into that. Maybe we just spawn a script that keeps trying to copy the new binary until it succeeds.

> • the "common : remove old group memberships" task in Ansible will drop allow-ssh from a bunch of users that get created by the cron job -

Launch the cron script once manually right after deploy?

@Mark-Simulacrum
Member

> Yea I was worried about that. The default sshd config disables motd and I'm not sure if reenabling it actually works in this setup. We could change the default sshd config, but that would change it for all servers.

I do see some message of the day, I think, but seems like something to fix. I don't think the particular fix is too important -- if we need to enable messages of the day across all servers, that seems OK, for example.

> Huh? I thought linux can overwrite binaries that are running and that there are no locks on them? I'll look into that. Maybe we just spawn a script that keeps trying to copy the new binary until that succeeds.

fatal: [dev-desktop.infra.rust-lang.org]: FAILED! => {"changed": true, "cmd": "cp /root/team_login/target/debug/team_login /etc/cron.team_login", "delta": "0:00:00.002736", "end": "2022-03-16 19:55:11.805673", "msg": "non-zero return code", "rc": 1, "start": "2022-03-16 19:55:11.802937", "stderr": "cp: cannot create regular file '/etc/cron.team_login': Text file busy", "stderr_lines": ["cp: cannot create regular file '/etc/cron.team_login': Text file busy"], "stdout": "", "stdout_lines": []}
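
"Text file busy" is what Linux reports when a running executable is opened for writing, which is what `cp` does to an existing destination. Renaming a new file over the old path does not hit this, because the running process keeps the old inode. A minimal Python sketch of that copy-then-rename approach follows; the function name `atomic_install` is hypothetical, and this is one possible fix rather than what the PR ended up doing.

```python
import os
import shutil
import tempfile


def atomic_install(src: str, dest: str, mode: int = 0o755) -> None:
    """Install src at dest by staging a temp file next to dest and renaming it."""
    dest_dir = os.path.dirname(os.path.abspath(dest))
    # Stage in the destination directory so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, prefix=".install-")
    try:
        with os.fdopen(fd, "wb") as staged, open(src, "rb") as source:
            shutil.copyfileobj(source, staged)
        os.chmod(tmp_path, mode)
        # The rename swaps the directory entry in one step; a process still
        # executing the old binary keeps using the old inode.
        os.replace(tmp_path, dest)
    except BaseException:
        # Do not leave the staging file behind if anything went wrong.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

In shell terms, copying to a temporary name in the same directory and then using `mv` onto the final path has the same effect, and avoids a retry loop.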

> Launch the cron script once manually right after deploy?

I think we probably want to not run the drop groups task for this server, and potentially that means it needs a different common base (or parts of the common base need factoring out); it's our first server with a non-empty set of additional users that aren't just the infra admins and aren't created by Ansible.

(An alternative approach could be to move the user creation out of the server cron job and into Ansible, and then add logic to e.g. rust-lang/sync-team to run Ansible to deploy changes across servers after team repo updates. But that is a larger change, for sure.)
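
To make the membership problem concrete, here is a small Python sketch of a cleanup step that only touches users Ansible itself manages and leaves cron-created users alone. It is a hypothetical illustration of the scoping idea, not the logic of the existing "remove old group memberships" task; the function and parameter names are made up for this example.

```python
def memberships_to_remove(current, desired, managed_users):
    """Return {user: groups to drop}, considering only users in managed_users.

    current and desired map a username to the set of groups it has or should
    have. Users outside managed_users (e.g. created by the cron job) are
    skipped entirely, so their allow-ssh membership is never dropped.
    """
    removals = {}
    for user, groups in current.items():
        if user not in managed_users:
            continue
        extra = set(groups) - set(desired.get(user, ()))
        if extra:
            removals[user] = extra
    return removals
```

With an admin that has one stale group and a cron-created user that only has `allow-ssh`, only the admin's stale group is reported for removal as long as the cron-created user is not listed in `managed_users`.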

@oli-obk
Contributor Author

oli-obk commented Mar 17, 2022

> • It looks like the message of the day doesn't quite work, or at least logging in myself I'm not seeing it yet

Check if it works now. According to the update-motd docs, there is a cron job updating it every 10 minutes.

@Mark-Simulacrum
Member

TASK [dev-desktop : Set up the team login cron job] 
fatal: [dev-desktop.infra.rust-lang.org]: FAILED! => {"msg": "The conditional check 'task_result.rc == 0' failed. The error was: error while evaluating conditional (task_result.rc == 0): 'dict object' has no attribute 'rc'"}

@Mark-Simulacrum
Member

OK, that seemed to work in terms of deploying. I'm not yet seeing the message of the day this is trying to configure though:

Welcome to Ubuntu 20.04.4 LTS (GNU/Linux 5.13.0-1017-aws x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage

  System information as of Thu Mar 17 16:51:48 UTC 2022

  System load:  0.05               Processes:             364
  Usage of /:   2.2% of 193.82GB   Users logged in:       0
  Memory usage: 1%                 IPv4 address for ens5: 10.0.0.85
  Swap usage:   0%

 * Ubuntu Pro delivers the most comprehensive open source security and
   compliance features.

   https://ubuntu.com/aws/pro

21 updates can be applied immediately.
3 of these updates are standard security updates.
To see these additional updates run: apt list --upgradable


*** System restart required ***
Last login: Wed Mar 16 19:54:09 2022 from ...

@Mark-Simulacrum
Member

No obvious effect. FWIW, it seems like a good idea to replace the default MOTD text with the one we are trying to provide, both to make it more prominent and since I think the default isn't particularly useful.

@Mark-Simulacrum
Member

No more motd showing at all. FWIW, you should have ssh access, so you can probably try to make it work on the machine directly and then reproduce whatever steps into Ansible.

@oli-obk
Contributor Author

oli-obk commented Mar 17, 2022

Could you remove the IP address lockdown? That would make it easier, and we'll soon have to do it anyway.

@Mark-Simulacrum
Member

I don't think it's currently in place.

@Mark-Simulacrum
Member

OK, that seems to have worked:

Continued use of this server implies that you accept the rules.
You can find the rules at
https://foundation.rust-lang.org/policies/cloud-compute-program/
Last login: Thu Mar 17 17:24:33 2022 from ...
simulacrum@dev-desktop:~$

I think this is nearly ready to go then, probably we just need to button up the policy and confirm with Joel that this will work as the only/primary prompt users get.

@pietroalbini might also want to take a look at the Terraform and Ansible configurations, but I figure in the alpha/beta period with just a few testers the current machine should likely be OK.

@pietroalbini
Member

I'll take a look in the coming days.

@Mark-Simulacrum
Member

We are getting an alert from our monitoring infra:

motd-news.service on dev-desktop failed

Can we disable that systemd service or otherwise make it not actually fail to run?

Member

@pietroalbini pietroalbini left a comment

Left some comments.

In general, I'd still prefer for the sync script not to be in Rust but in either bash or, even better, Python: it'd remove the need to manage the rustup installation on the root user, avoid the problem with replacing binaries, and make on-the-fly changes easier.

If you don't have the time to do it I can push a commit changing the script over to Python.

ansible/roles/common/tasks/users.yml (outdated)
ansible/roles/dev-desktop/files/team_login/src/main.rs (outdated)
ansible/roles/dev-desktop/tasks/main.yml
terraform/dev-desktops/instance.tf
ansible/roles/dev-desktop/files/team_login/src/main.rs (outdated)
ansible/roles/dev-desktop/files/team_login/src/main.rs (outdated)

@pietroalbini
Member

Also, how do we expect users to authenticate with GitHub to push code? I might've missed that discussion.

@pietroalbini
Member

Merging this, we can iterate with other PRs.

@pietroalbini pietroalbini merged commit 3006cc7 into rust-lang:master May 21, 2022
5 participants