How should we avoid creating repetitive large binary blobs in the repository? #1767

Open
ramcdougal opened this issue Apr 6, 2022 · 0 comments
This is an attempt to summarize the discussion started in #1761 and to provide a place to continue it.

There are at least two main potential sources of binary data:

  • images (e.g. screenshots for documentation)
  • zip files of tutorial source code

GitHub recommends:

repositories remain small, ideally less than 1 GB, and less than 5 GB is strongly recommended

with individual files required to be under 100 MB. (Since we use third-party tools like readthedocs and the various CI tools, we have to respect their size limits as well.)

We're nowhere near these sizes now (a fresh clone is about 160 MB), but having a plan and a policy will help keep repository size from becoming a problem in the future.
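To make the size limits above concrete, here is a minimal sketch of a check that could run locally or in CI to flag files approaching the per-file limit. The function name `find_large_files` and the 100 MiB default are assumptions for illustration; nothing like this exists in the repository as far as this issue states.

```python
import os
from pathlib import Path

# Assumed threshold: 100 MiB, roughly matching the per-file limit discussed above.
FILE_LIMIT_BYTES = 100 * 1024 * 1024


def find_large_files(root, limit=FILE_LIMIT_BYTES):
    """Return (path, size) pairs for files under root larger than limit, largest first."""
    oversized = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip git's own object store; only the working tree is of interest here.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = Path(dirpath) / name
            if path.is_symlink():
                continue  # do not follow links out of the tree
            size = path.stat().st_size
            if size > limit:
                oversized.append((str(path), size))
    return sorted(oversized, key=lambda item: item[1], reverse=True)
```

Calling `find_large_files(".", limit=1024 * 1024)` from a checkout would list every file over 1 MiB, which is probably a more useful warning level for a repository that is currently about 160 MB in total.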

A few options have been suggested:

  • Doing nothing and including all binary files in the repository
  • Git Large File Storage (Git LFS): the repository holds small pointers to files stored elsewhere on the web; the files wouldn't have to be large
  • Putting files directly elsewhere on the web
  • Having a separate repo for binary blobs (especially tutorial zips)
  • Having a repo for each tutorial and using GitHub to provide zips as needed

where the last two could be done with or without submodules.
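For the Git LFS option, it may help to see what actually gets committed in place of the binary. The sketch below builds the text of a pointer file following the public Git LFS pointer format (a version line, a SHA-256 object id, and the size in bytes). The helper name `lfs_pointer` is hypothetical; in practice the `git lfs` tooling writes these pointers itself.

```python
import hashlib


def lfs_pointer(data: bytes) -> str:
    """Return the Git LFS pointer text that would stand in for data in the repository."""
    digest = hashlib.sha256(data).hexdigest()
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{digest}\n"
        f"size {len(data)}\n"
    )
```

The pointer is three short lines regardless of the size of the original file, so the repository history only grows by a few bytes each time a screenshot or zip is replaced.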

Some considerations:

  • Moving binary data out of the repository makes it more fragmented and fragile (e.g. because more things have to work).
  • Not every binary blob needs to be treated the same.
    • Some things are more likely to change than others. Some screenshots in the NEURON documentation today go back over a decade.
    • For tutorials, users may want to download zips of several files, but each file is usually a plain text file.
  • Screenshots and tutorial zips are good and helpful to our users.
  • Whatever we do needs to allow screenshots and tutorials to match the version of the documentation and remain available as long as we continue to make old versions of the documentation available.
  • Data that is outside of the neuronsimulator organization may be managed differently, especially in the long-term future (e.g. we currently link to a repo in simtooldb).
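Since tutorial files are usually plain text, one way to keep zips matched to the documentation version without committing them is to build each zip from the tracked sources when the documentation is built. This is only a sketch of that idea, not something proposed in the discussion so far; the function name `build_tutorial_zip` and the directory layout are assumptions. Fixed timestamps and sorted entries keep the output identical from one build to the next.

```python
import zipfile
from pathlib import Path


def build_tutorial_zip(src_dir, zip_path):
    """Zip every file under src_dir into zip_path and return the archived names."""
    src = Path(src_dir)
    # Collect and sort first so the archive order does not depend on the filesystem.
    files = sorted(p for p in src.rglob("*") if p.is_file())
    names = [p.relative_to(src).as_posix() for p in files]
    with zipfile.ZipFile(zip_path, "w") as zf:
        for path, name in zip(files, names):
            # A fixed timestamp keeps rebuilt archives byte-for-byte stable.
            info = zipfile.ZipInfo(name, date_time=(1980, 1, 1, 0, 0, 0))
            info.compress_type = zipfile.ZIP_DEFLATED
            info.external_attr = 0o644 << 16  # regular file, rw-r--r--
            zf.writestr(info, path.read_bytes())
    return names
```

With this approach only the plain text sources live in the repository, and each published version of the documentation carries zips generated from the sources at that same version.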