Re-think GESIS server #3087

Open
rgaiacs opened this issue Sep 4, 2024 · 2 comments

@rgaiacs
Collaborator

rgaiacs commented Sep 4, 2024

Related to #3056

Related to #3080

Related to #3083

GESIS has been running its mybinder.org server with the following configuration for a few years; @arnim can say exactly since when.

(Diagram: gesis-notebooks-current.drawio)

The build-* pods push the container images to Docker Hub, and the jupyter-* pods pull the images from Docker Hub. The server has a large (6 TB) storage volume allocated to containerd, the container runtime that kubelet uses.

Recently, we have noticed more of the following behaviour.

(Screenshot: Grafana "Pod Activity" dashboard, 2024-09-04 09:02:31)

The number of pending pods starts to increase. The pending pods are jupyter-* pods waiting for their image. Some of the jupyter-* pods change their status from Pending to Terminating when JupyterHub or BinderHub decides to replace them after a timeout, and the replacement pods further increase the number of pending pods.

@arnim and I believe that this increase in pending pods is triggered by the following scenarios:

  1. we receive a "large" number of simultaneous launch requests that require building a new image.

    This happens when

    1. someone is "debugging" a repo2docker-created image on mybinder.org.
    2. someone is running a "how to configure your Git repository to work with mybinder.org" workshop. This is the scenario that @arnim and I reproduced during our investigation.
    3. we are out of luck.
  2. we receive a "huge" number of simultaneous launch requests that require pulling an existing image.

    This happens when

    1. someone is running a class / tutorial / workshop using mybinder.org, for example https://github.com/calculuslab/Calculus_Lab and https://github.com/ManimCommunity/jupyter_examples, and all the learners make a launch request at the same time. This is the scenario that @arnim and I reproduced during our investigation.
    2. we are out of luck.

Our first idea (@arnim's and mine) was to expand the storage volume allocated to containerd. Because we operate a physical server, GESIS IT informed us that, due to physical limitations, it was not possible to increase the storage volume.

Our second idea was to use a local registry. @arnim would like a setup like the one in the following diagram, so that people can still download the images from Docker Hub.

(Diagram: gesis-notebooks-prospect.drawio)

I have been looking at how to run a local registry that allows

  1. build-* pods to push to the local registry
  2. the local registry to replicate the new images to Docker Hub
  3. jupyter-* pods to pull from the local registry and, if the image is not found there, from Docker Hub.
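To make the three behaviours above concrete, here is a toy in-memory model in Python. It is only a sketch: `Registry`, `LocalRegistry`, `push`, `replicate` and `pull` are hypothetical names and do not correspond to the API of Nexus, the Docker registry, or any other real product, and "images" are placeholder byte strings keyed by reference. Keeping a local copy after falling back to the upstream (pull-through caching) is an assumption beyond the list above.

```python
from collections import deque


class Registry:
    """Hypothetical in-memory stand-in for an upstream registry such as Docker Hub."""

    def __init__(self, name):
        self.name = name
        self.images = {}  # image reference -> placeholder image content

    def push(self, ref, content):
        self.images[ref] = content

    def pull(self, ref):
        # Raises KeyError when the image is not present in this registry.
        return self.images[ref]


class LocalRegistry:
    """Hypothetical local registry combining behaviours 1-3 from the list."""

    def __init__(self, name, upstream):
        self.name = name
        self.upstream = upstream
        self.images = {}
        self.pending_replication = deque()  # refs pushed locally, not yet upstream

    def push(self, ref, content):
        # 1. build-* pods push here; remember the ref so it can be replicated.
        self.images[ref] = content
        self.pending_replication.append(ref)

    def replicate(self):
        # 2. Copy every newly pushed image to the upstream registry.
        while self.pending_replication:
            ref = self.pending_replication.popleft()
            self.upstream.push(ref, self.images[ref])

    def pull(self, ref):
        # 3. jupyter-* pods pull here; serve from local storage when possible.
        if ref in self.images:
            return self.images[ref], self.name
        # Fall back to the upstream (KeyError if missing there too) and
        # keep a local copy for later pulls (assumed pull-through caching).
        content = self.upstream.pull(ref)
        self.images[ref] = content
        return content, self.upstream.name


if __name__ == "__main__":
    hub = Registry("docker.io")
    hub.push("example/old:v1", b"old")
    local = LocalRegistry("local", upstream=hub)
    local.push("example/new:v1", b"new")
    local.replicate()
    print(local.pull("example/new:v1"))
    print(local.pull("example/old:v1"))
```

In this model, a freshly built image reaches the upstream only after `replicate()` runs, while an image that exists only upstream is fetched once and then served locally.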

I didn't find anything that does this out of the box. Sonatype Nexus Repository has a feature called "Grouping Docker Repositories" that lets me define a single virtual endpoint in front of two or more container registries, but Sonatype Nexus Repository does not offer an option to push new images to Docker Hub.

Any 2 cents are very welcome!

@arnim
Contributor

arnim commented Sep 17, 2024

@rgaiacs How much storage has IT offered us for the local image cache / pass-through registry?

@rgaiacs
Collaborator Author

rgaiacs commented Sep 17, 2024

How much storage has IT offered us for the local image cache / pass-through registry?

10TB.
