This repository has been archived by the owner on Oct 30, 2024. It is now read-only.

Time-out when creating large zip files #17

Open
ryanlovett opened this issue Oct 2, 2018 · 5 comments

Comments

@ryanlovett
Contributor

On a hub, if I/O is slow enough that the zipping process takes longer than the proxy's time-out interval, the user will get a 504 Gateway Time-out.

One possible solution:

  • create the zip in some tmp space on the server side
  • update the UI with a download link and possibly a notification
  • delete the zip after the user downloads it
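The steps above could be sketched roughly as follows (a minimal illustration, not nbzip's actual API; the helper name and layout are made up). A context manager ties deletion of the archive to the end of the download:

```python
import contextlib
import os
import tempfile
import zipfile

@contextlib.contextmanager
def temporary_zip(directory):
    """Zip `directory` into server-side tmp space; delete the archive
    once the with-block (i.e. the download) finishes."""
    fd, path = tempfile.mkstemp(suffix=".zip")
    os.close(fd)
    try:
        with zipfile.ZipFile(path, "w", zipfile.ZIP_DEFLATED) as zf:
            for root, _dirs, files in os.walk(directory):
                for name in files:
                    full = os.path.join(root, name)
                    # store paths relative to the zipped directory
                    zf.write(full, os.path.relpath(full, directory))
        yield path  # hand this path to whatever serves the download link
    finally:
        os.remove(path)  # clean up after the user downloads it

# Illustrative usage with a throwaway directory
workdir = tempfile.mkdtemp()
with open(os.path.join(workdir, "notes.txt"), "w") as f:
    f.write("hello")

with temporary_zip(workdir) as zip_path:
    served = zipfile.ZipFile(zip_path).namelist()
leftover = os.path.exists(zip_path)  # False: archive removed on exit
```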
@ryanlovett
Contributor Author

Another option would be to stream the file while zipping.
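A rough sketch of what streaming might look like, assuming a handler that can flush chunks to the client (the `send_chunk` callback below stands in for something like Tornado's `write`/`flush`): Python's `zipfile` can write to an unseekable stream, so compressed bytes can go out while the archive is still being built.

```python
import io
import zipfile

class ChunkWriter(io.RawIOBase):
    """File-like object that forwards each completed chunk to a callback
    instead of buffering the whole archive in memory or on disk."""

    def __init__(self, send_chunk):
        self.send_chunk = send_chunk
        self.position = 0

    def writable(self):
        return True

    def write(self, data):
        self.send_chunk(bytes(data))
        self.position += len(data)
        return len(data)

    def tell(self):
        # zipfile needs tell(); seek() stays unsupported, so zipfile
        # falls back to data descriptors for an unseekable stream.
        return self.position

def stream_zip(files, send_chunk):
    """Zip `files` (a name -> bytes mapping) while streaming the output."""
    writer = ChunkWriter(send_chunk)
    with zipfile.ZipFile(writer, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, content in files.items():
            zf.writestr(name, content)

# Illustrative usage: collect chunks in a list; a real handler would
# flush each chunk to the HTTP response as it arrives.
chunks = []
stream_zip({"a.txt": b"hello", "b.txt": b"world"}, chunks.append)
archive = b"".join(chunks)
```

Because bytes reach the client as soon as each entry is written, the proxy sees a steady response rather than a long silence followed by one large payload.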

@lwasser

lwasser commented May 6, 2020

I am running into some timeout issues as well. Just curious, @ryanlovett: I know you are not the maintainer of this, but are you using it in your hub setup at Berkeley with success? I'm having some issues now!

@ryanlovett
Contributor Author

@lwasser We used to have more trouble with this, but it lessened when user home directories shrank. I believe this was mostly due to using sparse checkouts in nbgitpuller. Prior to this, ~/.git took up the most space in user home directories.

I initially started to patch nbzip with support for streaming, but ended up rewriting it as jupyter-tree-download. We are using it in a few classes.

jupyter-archive also has this feature with support for Jupyter Lab.

@lwasser

lwasser commented May 7, 2020

@ryanlovett funny you mention nbgitpuller.
This is my other pain point right now. And I just realized that the git history is likely part of the problem.

If the pod is full and I try to run gitpuller, it won't launch.

Can you clarify how you avoided large git histories? I only need the files to load; I don't need the full history on the hub (student demo notebooks).

And I'm also curious whether you've had the issue of pods not loading when they are full because of nbgitpuller, and how you circumvented that via a check somewhere (if you have done this).
Many thanks! I'm learning a lot through this process.

BTW, thank you for jupyter-tree-download; it is working PERFECTLY for the download!

@ryanlovett
Contributor Author

ryanlovett commented May 8, 2020

I could have sworn that we either did sparse checkouts, or were doing shallow clones via the depth setting. However our user environment isn't setting this.

The repo containing the student materials only goes back to the beginning of the semester, so it doesn't have a huge git history. Our original materials repo had grown to at least 500MB at one point. This would explain why ~/.git/ isn't as large as it used to be, even though we're not altering the nbgitpuller configuration. So one strategy would be to start with a new repo each term. Otherwise, there are a number of online recipes for reducing the size of the history in an existing repo. I'm not confident enough in my git-fu to recommend one recipe over another.
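For illustration, a shallow clone via git's `--depth` option fetches only the most recent commit, which is the same idea as nbgitpuller's depth setting. The toy repo below is just a stand-in for a real materials repo:

```shell
set -e
tmp=$(mktemp -d)

# Build a toy "materials" repo with two commits
git init -q "$tmp/materials"
git -C "$tmp/materials" -c user.email=you@example.com -c user.name=you \
    commit -q --allow-empty -m "week 1 materials"
git -C "$tmp/materials" -c user.email=you@example.com -c user.name=you \
    commit -q --allow-empty -m "week 2 materials"

# Shallow clone: only the newest commit travels, so .git stays small
# (note the file:// URL; --depth is ignored for plain local paths)
git clone -q --depth 1 "file://$tmp/materials" "$tmp/shallow"

# The shallow clone's history contains a single commit
count=$(git -C "$tmp/shallow" rev-list --count HEAD)
echo "$count"
```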

Glad to hear that jupyter-tree-download is working for you! Since jupyter-archive is being actively developed and has support for lab, I should probably create a PR for jupyter-archive to have a button in classic notebook, then retire nbzip and jupyter-tree-download.

> And I'm also curious whether you've had the issue of pods not loading when they are full because of nbgitpuller, and how you circumvented that via a check somewhere (if you have done this).

Our user storage is shared via an NFS volume, so we haven't ever had to deal with out-of-disk conditions. But it would make sense for the notebook server to fail to start if it couldn't write to ~/.
