fix(connector-corda): contract deployment SSH reconnect race condition
When deploying multiple contracts to multiple nodes, several SSH
connections are established.
When all of this happens on the same host (for example, because you are
running the all-in-one image), the same SSH port is not closed and
reopened instantly between the SSH client's disconnect and connect
operations, and connectivity problems come up.

Due to lack of time, I quickly fixed this by adding a 5-second wait
between the disconnect and connect operations. Retries would be a much
better long-term solution, especially since race conditions can never
truly be fixed with hardcoded wait times, which sooner or later become
too short or too long depending on the exact nature of the problem.

[skip ci]

Signed-off-by: Peter Somogyvari <[email protected]>
petermetz committed Nov 20, 2023
1 parent 0085f34 commit 0af2eb1
Showing 1 changed file with 5 additions and 0 deletions.
@@ -292,6 +292,11 @@ class ApiPluginLedgerConnectorCordaServiceImpl(
try {
ssh.disconnect()
logger.debug("Disconnected OK from SSH host ${cred.hostname}:${cred.port}")
// This is a hack to force the code to wait for the OS to close down the port. Without it, deployments
// can fail intermittently because it'll try to reconnect too fast. The right way to fix it is
// to make it probe the port openness and have retries with exponential backoff.
// TODO: Implement the proper fix as described above.
Thread.sleep(5000)
} catch (ex: Exception) {
logger.warn("Disconnect failed from SSH host ${cred.hostname}:${cred.port}. Ignoring since we are done anyway...")
}
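
The TODO above asks for probing the port and retrying with exponential backoff instead of a fixed sleep. A minimal sketch of what that could look like follows. Both `isPortOpen` and `retryWithBackoff` are hypothetical helpers that are not part of the connector, and the attempt count and delay values are illustrative assumptions rather than tuned numbers.

```kotlin
import java.io.IOException
import java.net.InetSocketAddress
import java.net.Socket

// Hypothetical helper (not in the connector): returns true if a TCP
// connection to host:port can be established within timeoutMs.
fun isPortOpen(host: String, port: Int, timeoutMs: Int = 1000): Boolean =
    try {
        Socket().use { it.connect(InetSocketAddress(host, port), timeoutMs) }
        true
    } catch (ex: IOException) {
        // Refused, timed out, or unresolvable host: treat as "not open yet".
        false
    }

// Hypothetical helper (not in the connector): runs `action` up to
// maxAttempts times and doubles the wait after every failed attempt,
// capped at maxDelayMs. Default values are assumptions for illustration.
fun <T> retryWithBackoff(
    maxAttempts: Int = 5,
    initialDelayMs: Long = 250,
    maxDelayMs: Long = 8000,
    action: (attempt: Int) -> T
): T {
    require(maxAttempts >= 1) { "maxAttempts must be at least 1" }
    var delayMs = initialDelayMs
    var lastError: Exception? = null
    for (attempt in 1..maxAttempts) {
        try {
            return action(attempt)
        } catch (ex: Exception) {
            lastError = ex
            // Only wait if another attempt is still going to happen.
            if (attempt < maxAttempts) {
                Thread.sleep(delayMs)
                delayMs = minOf(delayMs * 2, maxDelayMs)
            }
        }
    }
    throw IllegalStateException("Gave up after $maxAttempts attempts", lastError)
}
```

Under these assumptions, the connect step (rather than the disconnect step) would be wrapped in `retryWithBackoff`, with the lambda throwing when `isPortOpen(cred.hostname, cred.port)` returns false, so that a port that comes back quickly costs only a short wait and a slow one is still tolerated.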