Fix worker node servers getting killed after JupyterHub restart
Follow-up to nebari-dev#106 and fixes nebari-dev#104 (again)

The JupyterHub logs showed that, after a restart, the hub tried to reach
servers for jobs scheduled on worker nodes at the master node's address.
That address was wrong, so the servers never responded and were killed:

```
Notebook server job 157 started at hpc-worker-02:52649
(JupyterHub restart)
server never showed up at http://hpc-master-node:52649
```

This fixes the problem by preserving `self.server.ip` across the call to
`super().poll()` in `QHubHPCSpawnerBase.poll()`, the same way
`self.server.port` was already preserved.
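
The preserve-and-restore pattern can be tried outside JupyterHub. What follows is a minimal, self-contained sketch, not the project's actual code: `FakeServer`, `FakeBaseSpawner`, and `PreservingSpawner` are hypothetical stand-ins, and the way the fake parent overwrites the address is an assumption made only to imitate the symptom seen in the log.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class FakeServer:
    """Hypothetical stand-in for the spawner's server object."""
    ip: str
    port: int


class FakeBaseSpawner:
    """Hypothetical parent class whose poll() clobbers ip and port."""

    def __init__(self, ip, port):
        self.server = FakeServer(ip, port)

    async def poll(self):
        # Assumption for illustration only: imitate the post-restart symptom
        # by resetting the address to the master node and changing the port.
        self.server.ip = "hpc-master-node"
        self.server.port = 0
        return None  # in this sketch, None stands for "still running"


class PreservingSpawner(FakeBaseSpawner):
    async def poll(self):
        # Remember the address before the parent has a chance to change it.
        ip, port = self.server.ip, self.server.port
        value = await super().poll()
        # Put the original address back, then pass the status through.
        self.server.ip, self.server.port = ip, port
        return value


async def demo():
    spawner = PreservingSpawner("hpc-worker-02", 52649)
    status = await spawner.poll()
    return status, spawner.server.ip, spawner.server.port


result = asyncio.run(demo())
print(result)
```

In this sketch the worker address survives the call to `poll()`, whereas calling `poll()` on `FakeBaseSpawner` directly would leave the master node's address on the server object.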
ericdwang committed Apr 11, 2022
1 parent a8e9408 commit b8fad00
Showing 1 changed file with 4 additions and 4 deletions.
roles/jupyterhub/templates/jupyterhub_config.py

@@ -120,11 +120,11 @@ async def authenticate(self, handler, data):

 class QHubHPCSpawnerBase(SlurmSpawner):
     async def poll(self):
-        # on server restart the port appears to change when poll() is called
-        # on the server.port object. This shim ensures that port is preserved
-        port = self.server.port
+        # on server restart the IP and port appears to change when poll() is called
+        # on the server object. This shim ensures that those are preserved
+        ip, port = self.server.ip, self.server.port
         value = await super().poll()
-        self.server.port = port
+        self.server.ip, self.server.port = ip, port
         return value

     req_conda_environment_prefix = Unicode('',
