
JupyterLab servers are being killed when jupyterhub is updated #104

Closed
costrouc opened this issue Mar 2, 2022 · 9 comments · Fixed by #106, #124 or #128

@costrouc
Member

costrouc commented Mar 2, 2022

  • Jupyter notebook servers are killed when restarting JupyterHub
@costrouc costrouc added the bug Something isn't working label Mar 2, 2022
@costrouc
Member Author

costrouc commented Mar 2, 2022

Here is where I think I set it up properly: https://github.com/Quansight/qhub-hpc/blob/main/roles/jupyterhub/templates/jupyterhub_config.py#L37-L38.
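For context, the JupyterHub settings that are supposed to keep sessions alive across a hub restart are the cleanup flags below; whether the linked lines set exactly these is an assumption on my part.

```
# jupyterhub_config.py -- sketch of the relevant settings (assumed, not copied
# from the linked file):
c.JupyterHub.cleanup_servers = False  # leave single-user servers running when the hub exits
c.JupyterHub.cleanup_proxy = False    # leave the proxy running when the hub exits
```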

Here is how you would recreate this:

  1. Log in to JupyterHub, start a JupyterLab session, and run some calculations.
  2. Log in to the root node and run systemctl restart jupyterhub; this should kill the lab sessions.
  3. Another approach is to rerun the Ansible playbook and make a change that restarts the JupyterHub server (steps 2 and 3 are condensed into a shell sketch below).
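Condensed into shell (assuming the hub runs under systemd on that node):

```
# On the root/master node, while a JupyterLab session is active:
sudo systemctl restart jupyterhub   # reproduces the bug: the lab session gets killed
sudo journalctl -u jupyterhub -f    # watch the hub tear down the single-user servers
```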


@Adam-D-Lewis
Member

Adam-D-Lewis commented Mar 4, 2022

The fix mentioned in jupyterhub/jupyterhub#1156 and https://github.com/jupyterhub/jupyterhub/wiki/Run-jupyterhub-as-a-system-service/2b83a97882063c13456f09e81b7c5f7302ba5d33 worked. I added KillMode=process to the service file and redeployed, but I also needed to manually run sudo systemctl daemon-reload and sudo systemctl restart jupyterhub.
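For reference, a minimal sketch of the relevant part of the unit file; the ExecStart path is a placeholder, and the key line is KillMode=process:

```
# /etc/systemd/system/jupyterhub.service (sketch; paths are placeholders)
[Service]
ExecStart=/opt/jupyterhub/bin/jupyterhub -f /etc/jupyterhub/jupyterhub_config.py
# The default KillMode=control-group kills every process in the unit's cgroup
# (proxy and locally spawned servers included) on restart; "process" kills only
# the main jupyterhub process.
KillMode=process
Restart=always
```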

@Adam-D-Lewis
Member

Adam-D-Lewis commented Mar 11, 2022

I thought I needed to set c.JupyterHub.cleanup_proxy = False as well, but it turns out I just wasn't waiting long enough before the server was killed. The user server is still killed after 40-45 seconds.

It's also worth noting that if you run systemctl stop jupyterhub, the user server is not killed (unless you subsequently start the jupyterhub process again).

@Adam-D-Lewis
Member

Adam-D-Lewis commented Mar 11, 2022

Okay, so what I've seen is that the proxy needs to stay up or the user sessions die. Setting c.JupyterHub.cleanup_proxy = False will keep the proxy up when running systemctl stop jupyterhub, but when JupyterHub is restarted it checks whether an existing proxy is still up and, if so, kills it, so the user sessions are still being killed. The solution is to run the JupyterHub proxy externally. There are two options for doing so:

  1. Switch to TraefikTomlProxy and run Traefik as its own service (https://jupyterhub-traefik-proxy.readthedocs.io/en/latest/toml.html#example-setup)
  2. Keep configurable-http-proxy (the default proxy), but set it up as its own systemd service and configure JupyterHub to not start a proxy itself (https://github.com/jupyterhub/configurable-http-proxy); a config sketch for this option follows.
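Roughly, option 2 comes down to a few JupyterHub settings; the token and API URL below are placeholders that have to match however the external proxy is launched:

```
# jupyterhub_config.py -- sketch for an externally managed configurable-http-proxy
c.ConfigurableHTTPProxy.should_start = False                # don't launch a proxy subprocess
c.ConfigurableHTTPProxy.auth_token = "<shared secret, same as CONFIGPROXY_AUTH_TOKEN>"
c.ConfigurableHTTPProxy.api_url = "http://127.0.0.1:8001"   # REST API of the external proxy
```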

@costrouc
Member Author

I talked with @Adam-D-Lewis about this issue, and the problem is that the proxy is currently running as a subprocess of JupyterHub. Whenever the hub goes down, so do the HTTP connections, which then leads to the JupyterLab servers killing themselves. The proper way to do this is to ensure that the proxy runs as a separately managed systemd service.

There are two routes to solve this:

  1. Use configurable-http-proxy

The short-term fix would be to use configurable-http-proxy (the standard way this has been done); see https://github.com/jupyterhub/the-littlest-jupyterhub/tree/125bd1dc186d541585426f7ebf041dd9abad1845/tljh/systemd-units. A sketch of such a proxy unit is shown after this list.

A few of the steps needed:

I would estimate that this is 10 hours of work.

  2. Traefik v2 integration

There are many clients and projects where we have seen a need for Traefik v2 support in JupyterHub, and it is a popular issue: jupyterhub/traefik-proxy#97. There are several partially completed PRs, and it needs someone to push it over the line.

This is holding back several open source projects: The Littlest JupyterHub, Zero to JupyterHub, QHub, and (now) qhub-hpc.
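For route 1, the proxy would get a unit of its own along the lines of the TLJH units linked above. A sketch (paths, ports, and the token are placeholders, not the exact TLJH unit):

```
# /etc/systemd/system/configurable-http-proxy.service (sketch)
[Unit]
Description=configurable-http-proxy for JupyterHub
After=network.target

[Service]
# The proxy reads its REST API token from this variable; jupyterhub_config.py
# must set the same value in c.ConfigurableHTTPProxy.auth_token.
Environment=CONFIGPROXY_AUTH_TOKEN=<shared secret>
ExecStart=/usr/bin/configurable-http-proxy \
  --ip 0.0.0.0 --port 8000 \
  --api-ip 127.0.0.1 --api-port 8001 \
  --default-target http://127.0.0.1:8081
Restart=always

[Install]
WantedBy=multi-user.target
```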

@sjdemartini
Contributor

This issue unfortunately needs to be re-opened. Jobs running on worker nodes are killed every time JupyterHub restarts, while jobs running on the master node are not. The testing above seemingly only used a master-node setup. I'd consider this high priority; thanks in advance for investigating.

@sjdemartini
Contributor

For what it's worth, the error message shows:

[email protected]'s server never showed up at http://hpc-master-node:42479/user/[email protected]/test2/ after 30 seconds. Giving up. 

It seems fishy that it's looking at hpc-master-node instead of the worker node name, but I don't know if that's expected.

@costrouc costrouc reopened this Apr 11, 2022
ericdwang added a commit to ericdwang/qhub-hpc that referenced this issue Apr 11, 2022
Follow-up to nebari-dev#106 and fixes nebari-dev#104 (again)

We discovered in the JupyterHub logs that it was trying to contact the
master node for jobs scheduled on worker nodes which was incorrect and
led to them getting killed:

```
Notebook server job 157 started at hpc-worker-02:52649
(JupyterHub restart)
server never showed up at http://hpc-master-node:52649
```

This fixes the problem by preserving `self.server.ip` similar to
`self.server.port` in `QHubHPCSpawnerBase.poll()`.
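In other words, poll() has to report both the ip and the port that the batch job is actually using. A purely illustrative sketch of that pattern (the base class and the job_is_running/job_ip/job_port names are hypothetical, not the actual qhub-hpc code):

```
# Illustrative fragment of a batch-style Spawner subclass; helper names are
# hypothetical, the real change lives in QHubHPCSpawnerBase.poll().
class ExampleBatchSpawner(SomeBatchSpawnerBase):
    async def poll(self):
        if await self.job_is_running():
            # Point the hub at the node the job actually runs on (e.g. hpc-worker-02)
            # instead of letting server.ip fall back to the master node.
            self.server.ip = self.job_ip
            self.server.port = self.job_port
            return None   # None tells JupyterHub the server is still alive
        return 0          # an integer exit status means it has stopped
```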
costrouc pushed a commit that referenced this issue Apr 11, 2022
Follow-up to #106 and fixes #104 (again)
