Allow specifying arbitrary SSH configuration for nodes #1024

grahamc · 2018-10-21T02:12:40Z

I'd like to deploy to nodes only available behind a bastion server. This patch lets you do this:

  my-node.deployment.sshConfigOptionsFile = ./ssh-config;

and ./ssh-config:

ProxyCommand ssh 10.5.3.111 -A -W %h:%p
ForwardAgent yes
LogLevel VERBOSE

Initial testing shows this works, I'll do further testing tomorrow.
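
For illustration only, here is a minimal sketch of how a per-node config file like the one above could be passed to the ssh client. The function name `build_ssh_argv` and its parameters are hypothetical and are not taken from the patch; the only thing relied on is OpenSSH's `-F configfile` option. Note that when `-F` is given, ssh does not read the usual per-user and system-wide config files, so the file has to carry everything the connection needs.

```python
from typing import List, Optional


def build_ssh_argv(
    host: str,
    command: List[str],
    config_file: Optional[str] = None,  # hypothetical: path from sshConfigOptionsFile
    user: str = "root",
) -> List[str]:
    """Build an ssh command line, optionally pointing ssh at a custom config file."""
    argv = ["ssh"]
    if config_file is not None:
        # -F makes ssh read this file instead of the default config files.
        argv += ["-F", config_file]
    argv.append(f"{user}@{host}")
    argv += command
    return argv


if __name__ == "__main__":
    print(build_ssh_argv("mac1", ["hostname"], config_file="./ssh-config"))
```

With the `ProxyCommand` line from `./ssh-config`, a command built this way would be tunnelled through the bastion without the caller knowing about it.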

grahamc · 2018-10-21T11:27:27Z

It took one more patch, but I can now provision and deploy nodes behind a bastion. Further testing required.

grahamc · 2018-10-21T12:38:05Z

I can now force reboot nodes and have it wait properly for the machine to be up. A couple more TCP waiting patches to make.

coretemp · 2018-10-22T14:03:17Z

Are you sure that this also handles the ssh-for-each case? If so, please merge. If not, please try again :)

grahamc · 2018-12-10T17:59:38Z

$ nixops ssh-for-each hostname
mac7........> mac7
mac6........> mac6
mac5........> mac5
mac4........> mac4
mac3........> mac3
mac2........> mac2
mac1........> mac1
mac8........> mac8

grahamc · 2018-12-12T02:50:12Z

OK I think this is good to go:

$ nixops deploy --force-reboot --include mac1 
building all machine configurations...
mac1........> copying closure...
personal> closures copied successfully
mac1........> rebooting...
mac1........> waiting for the machine to finish rebooting...[down]channel 0: open failed: connect failed: Connection refused
stdio forwarding failed
ssh_exchange_identification: Connection closed by remote host

mac1........> could not connect to ‘[email protected]’, retrying in 1 seconds...
mac1........> activation finished successfully
personal> deployment finished successfully

aszlig

Changing the TCP ports check to use SSH instead is something of a mixed bag, so while this certainly makes handling of jump hosts easier, the timeout argument to run_command is only the connection timeout, which could possibly hang ssh on multiple occasions.

So I'd probably add a few tests for the none backend with flaky networking to see whether that really is robust enough. The only race condition I know for sure is when the machine reboots, but I remember seeing SSH hangs even during boot, eg. in something like this, which is a similar scenario.

nixops/ssh_util.py

aszlig · 2018-12-15T07:32:08Z

nixops/backends/hetzner.py

@@ -650,7 +651,9 @@ def _wait_stop(self):
        """
        self.log_start("waiting for system to shutdown... ")
        dotlog = lambda: self.log_continue(".")  # NOQA
-        wait_for_tcp_port(self.main_ipv4, 22, open=False, callback=dotlog)
+        while self.try_ssh():


This is a bit racy, because the timeout for run_command is only the SSH connect timeout, so if the machine is no longer reachable after the key exchange try_ssh could hang for minutes.

Btw. the same problem could occur in #857, hence cc @nh2.

If we'd switch to Python 3.x (I think it's long overdue), we could use subprocess.run with the timeout argument for logged_exec and in SSHMaster, but in the interim it's still possible in Python 3.4 and lower using wait and kill. We already have a timeout argument, so I'd either rename the argument or add another command_timeout argument. Note that we can't simply get rid of the connect timeout and just use the command timeout, because there might be long-running operations, like switching to a new configuration.
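
A rough sketch of the two-timeout idea, assuming Python 3.3 or newer: the connect timeout goes to ssh itself via `-o ConnectTimeout=…`, while a separate, optional command timeout bounds the whole process using `wait` and `kill`. The names `run_with_command_timeout` and `ssh_probe_argv` are made up for this example and are not the existing `run_command` or `logged_exec` API.

```python
import subprocess
import sys
from typing import List, Optional


def ssh_probe_argv(host: str, connect_timeout: int) -> List[str]:
    """ssh invocation whose *connection* phase is bounded by ConnectTimeout."""
    return [
        "ssh",
        "-o", f"ConnectTimeout={connect_timeout}",
        "-o", "BatchMode=yes",  # never block on a password prompt
        host,
        "true",
    ]


def run_with_command_timeout(
    argv: List[str], command_timeout: Optional[float] = None
) -> Optional[int]:
    """Run argv and return its exit code, or None if it ran past command_timeout.

    command_timeout=None means "wait forever", which long-running operations
    such as switching to a new configuration would need.
    """
    proc = subprocess.Popen(
        argv,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        return proc.wait(timeout=command_timeout)
    except subprocess.TimeoutExpired:
        proc.kill()
        proc.wait()  # reap the killed child so it does not linger as a zombie
        return None


if __name__ == "__main__":
    # Stand-in for a hung ssh: a child that sleeps longer than the timeout.
    hung = [sys.executable, "-c", "import time; time.sleep(30)"]
    print(run_with_command_timeout(hung, command_timeout=0.5))
```

Keeping the two timeouts separate means a short probe such as `try_ssh` can pass a small `command_timeout`, while deploy steps leave it unset.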

asymmetric · 2019-05-12T12:28:40Z

@grahamc are you still planning on working on this?

grahamc · 2019-06-08T18:26:56Z

The commits since 0192a61 address almost all the feedback from @aszlig, but I'm not sure what to do about the timeout you raise.

Additionally, I no longer require this PR and am actually not even able to really test it as I don't have any bastion use cases.

dhess · 2019-06-08T18:36:59Z

I'll give this a try sometime in the next few weeks. I plan to put my new deployments behind a bastion host.

dhess · 2019-06-11T10:18:12Z

The bastion functionality works perfectly, so that's very nice, as I couldn't get this to work without the patch (NixOps hangs on waiting for SSH...).

However, I did have one case where nixops reboot did not detect that the host had come back up, so this seems to confirm @aszlig's comment about the potential race condition. (I have had at least one other reboot work fine, however.)
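
To make the failure mode easier to discuss, here is a small sketch of a deadline-bounded polling loop with a configurable target state, in the spirit of the `wait_for_ssh` commits in this PR. It is an assumption-laden illustration, not the code from the branch: `wait_for_state`, `probe`, and the injectable `sleep` are hypothetical names, and `probe` stands in for something like `try_ssh`.

```python
import time
from typing import Callable


def wait_for_state(
    probe: Callable[[], bool],  # hypothetical: returns True when SSH answers
    up: bool = True,            # wait for "up" (True) or "down" (False)
    timeout: float = 60.0,
    interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll probe() until it equals `up`; return False if the deadline passes."""
    deadline = time.monotonic() + timeout
    while True:
        if probe() == up:
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)


if __name__ == "__main__":
    # Simulated host that answers on the third probe.
    answers = iter([False, False, True])
    print(wait_for_state(lambda: next(answers), up=True, interval=0.0))
```

A reboot would then be "wait for down, then wait for up". If the host goes down and comes back between two probes, the first wait never observes the down state and only ends at its deadline, which may be one way the lost-host symptom arises. The loop also only stays bounded if each `probe()` call itself cannot hang.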

peterhoeg · 2019-09-03T07:58:46Z

I'd love this functionality as well. Good to merge with @dhess's blessing?

dhess · 2019-09-03T10:12:16Z

Unfortunately, I think it's not ready. nixops reboot frequently loses track of the rebooted host.

I may have time to look into a fix for this in the next few weeks.

grahamc · 2020-03-26T19:24:46Z

Hello!

Thank you for this PR.

In the past several months, some major changes have taken place in
NixOps:

1. Backends have been removed, preferring a plugin-based architecture.
   Here are some of them:
2. NixOps Core has been updated to be Python 3 only, and at the
   same time, MyPy type hints have been added and are now strictly
   required during CI.

This is all culminating in what I hope will be a NixOps 2.0
release. There is a tracking issue for that:
#1242. It is possible that
more core changes will be made to NixOps for this release, with a
focus on simplifying NixOps core and making it easier to use and work
on.

My hope is that by adding types and more thorough automated testing,
it will be easier for contributors to make improvements, and for
contributions like this one to merge in the future.

However, because of the major changes, it has become likely that this
PR cannot merge right now as it is. The backlog of now-unmergeable PRs
makes it hard to see which ones are being kept up to date.

If you would like to see this merge, please bring it up to date with
master and reopen it. If the mypy type checking fails, please
correct any issues and then reopen it. I will be looking primarily at
open PRs whose tests are all green.

Thank you again for the work you've done here, I am sorry to be
closing it now.

Graham

wmertens · 2020-07-06T15:49:33Z

😆 Fun to see the conversation between you and you @grahamc ;)

grahamc added 3 commits October 20, 2018 21:33

Support SSH configuration file

f0e042c

fixups

37eb32f

Wait for SSH instead of a TCP port

90617cd

Use wait_for_ssh for verifying ssh is back up, but not on down for forced reboots

e65f7c8

AmineChikhaoui requested review from aszlig and removed request for aszlig December 11, 2018 14:58

Finish up patching away TCP checks, use SSH directly

0192a61

AmineChikhaoui requested a review from aszlig December 12, 2018 03:35

aszlig reviewed Dec 15, 2018

grahamc added 6 commits June 8, 2019 14:11

try_ssh: drop timeout option

d9c55c6

wait_for_ssh: clean up attempts, timeout

bb09606

Try_ssh: teach about timeouts

cdefcaa

wait_for_ssh: define in terms of try_ssh

bf1f0bf

Wait_for_ssh: make «up» configurable so it can wait for a down SSH server too

f241899

hetzner: use wait_for_ssh instead of try_ssh in a loop

d5bb60c

ssh_util: Fixup python syntax

975f87f

grahamc requested a review from aszlig June 8, 2019 18:41

grahamc closed this Mar 26, 2020
