Create retire-unpublished.md #26

Merged
merged 3 commits into from
May 16, 2022
29 changes: 29 additions & 0 deletions Governance/retire/retire-unpublished.md
@@ -0,0 +1,29 @@
# Use case: unpublished data

This use case covers the large volume of unpublished data generated in the course of creating published data. This storage use includes, but is not limited to:

- Model configuration data
- Failed model run output
- Successful model run output
- Data prepared for collaborative sharing but not publication/DOI
- Intermediate data products

How each of these scenarios is handled will typically be determined on a project basis, with a view to the importance of **reproducibility** and considering relative **compute or storage costs**.

## Suggested procedures

**If compute is readily available but storage is limited**

1. Maintain a database or wiki of model runs
2. If model configurations must be kept for reproducibility, create zip archives of them and move the archives to slow-access tape storage
3. If a model run failed, remove its data immediately
4. If a model run was successful and post-processing has been completed (and if bit-reproducibility across systems is not a concern), then the data can be removed, perhaps after an initial quarantine period for data validation
5. Intermediate data products and collaborative data can be retired at the end of their active projects, following a quarantine period

5a. Some intermediate data, such as regridded CMIP data, may not have a logical project end; such data might follow an approach similar to the replicated data use case.
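
Steps 2 to 4 above could be scripted roughly as follows. This is a minimal sketch, not a prescribed implementation: the `ModelRun` record, the function names, and the 30-day default quarantine are all hypothetical, and the transfer to tape is left out because it depends on the site's storage system.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from pathlib import Path
from typing import Optional
import zipfile


@dataclass
class ModelRun:
    """One entry in a (hypothetical) register of model runs, as in step 1."""
    name: str
    succeeded: bool
    post_processed_on: Optional[date] = None  # None until post-processing is done


def archive_config(config_dir: Path, archive_path: Path) -> Path:
    """Zip a model configuration directory (step 2), ready to move to tape.

    archive_path should live outside config_dir so the archive does not
    end up containing itself.
    """
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(config_dir.rglob("*")):
            if path.is_file():
                # Store paths relative to the configuration directory.
                zf.write(path, path.relative_to(config_dir))
    return archive_path


def can_remove_output(run: ModelRun, today: date, quarantine_days: int = 30) -> bool:
    """Apply steps 3 and 4 to a single run.

    Failed runs can be removed immediately; successful runs only once
    post-processing is complete and the quarantine period has elapsed.
    The 30-day default is an assumption, not a recommendation.
    """
    if not run.succeeded:
        return True
    if run.post_processed_on is None:
        return False
    return today >= run.post_processed_on + timedelta(days=quarantine_days)
```

Keeping the decision in a small pure function like `can_remove_output` makes it easy to run against the whole register as a dry run before anything is deleted.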

**If compute is limited but deep storage is readily available**

Repeat steps 1-3 as above.

4. Following post-processing and validation, or at project close, data should be tarred and transferred to a deep storage system.
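
The tarring in step 4 might look like the sketch below. The function name and the staging directory are hypothetical, and recording a SHA-256 checksum is an added assumption intended to let the archive be verified after transfer; the transfer command itself is omitted because it is specific to each deep storage system.

```python
import hashlib
import tarfile
from pathlib import Path
from typing import Tuple


def tar_for_deep_storage(data_dir: Path, staging_dir: Path) -> Tuple[Path, str]:
    """Bundle a validated data directory into one tar file and checksum it.

    Returns the path of the tar file and its SHA-256 hex digest. The tar is
    uncompressed, on the assumption that the model output is already stored
    in a compressed format.
    """
    staging_dir.mkdir(parents=True, exist_ok=True)
    tar_path = staging_dir / f"{data_dir.name}.tar"

    with tarfile.open(tar_path, "w") as tar:
        # arcname keeps only the directory name, not the full source path.
        tar.add(data_dir, arcname=data_dir.name)

    # Hash in 1 MiB chunks so large archives are not read into memory at once.
    digest = hashlib.sha256()
    with tar_path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1024 * 1024), b""):
            digest.update(chunk)

    return tar_path, digest.hexdigest()
```

A single tar file per run or project is usually friendlier to tape-backed systems than many small files, and the checksum can be stored in the model-run register from step 1.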