Paola issues #93

Merged
merged 8 commits into from
Jun 11, 2024
1 change: 1 addition & 0 deletions Governance/_toc.yml
@@ -14,6 +14,7 @@ parts:
- caption: Publishing climate data
chapters:
- file: publish/publish-intro
- file: publish/publish-procedure
paolap marked this conversation as resolved.
- file: publish/publish-options
sections:
- file: publish/publish-nci-geonetwork
3 changes: 2 additions & 1 deletion Governance/concepts/concept-intro.md
@@ -4,14 +4,15 @@ In this section of the book we are covering the key concepts associated with dat

**Index**

* [](authorship.md)
* [](backup.md)
* [Controlled vocabulary](controlled-vocab.md)
* [Conventions and standards](conventions.md)
* [](availability-statement.md)
* [](collaboration-agreement.md)
* [](dmp.md)
* [](policies.md)
* [FAIRER data](fairer-principles.md)
* [FAIRER principles](fairer-principles.md)
paolap marked this conversation as resolved.
* [Journal requirements](journal.md)
* [Open Access Licenses](license.md)
* [Persistent identifiers](pids.md)
4 changes: 4 additions & 0 deletions Governance/concepts/other-conventions.md
@@ -29,3 +29,7 @@ IMOS] has very specific extensions of the CF conventions for different kind of o
* [AMBER Trajectory Conventions](http://ambermd.org/netcdf/nctraj.xhtml) for molecular dynamics simulations.
* [CF Discrete Sampling Geometries Conventions](http://cfconventions.org/Data/cf-conventions/cf-conventions-1.6/build/cf-conventions.html\#discrete-sampling-geometries) - CF for observational and point data
* [COMODO](https://web.archive.org/web/20160417032300/http://pycomodo.forge.imag.fr/norm.html) ??? still used?

```{note} Potential clashes
Some of these conventions are spin-offs of the CF Conventions, so there is an expectation that files following them will also be CF compliant. However, as conventions are ever-evolving documents and different groups maintain each convention, they can introduce requirements that clash with the CF Conventions. An example is the use of `cf_role` in UGRID and CF: UGRID requires values for this attribute that are not among the values allowed by CF. As CF evolves, some of these alternatives may become simply a use case of CF. In this example, the clash should be resolved with CF v1.11, which should include an integration of UGRID into CF; see the relevant [CF GitHub issue](https://github.com/cf-convention/cf-conventions/issues/501).
```
4 changes: 4 additions & 0 deletions Governance/introduction.md
@@ -20,3 +20,7 @@ As with all scientific outputs, errors and inconsistencies can be found in clima

### **[Retiring climate data](retire/retire-intro.md)**
Data doesn't last forever, usually becoming outdated or obsolete within 5-10 years; this of course is simply the nature of scientific research. In this section, recommendations are presented on how to go about retiring a dataset, both published and replicated, without breaking citations, removing identifiers, or causing disruption to users, while retaining the value of your research data.

### **[Creating climate data products](products/products-intro.md)**
Climate data is often used in other research fields, government initiatives and by private stakeholders for a variety of applications.
The process of adapting and packaging climate data so that it is useful to a wider audience, with different backgrounds and/or different purposes, is more complex than simply sharing data with other climate researchers. At the moment we provide only an overview of what this section aims to cover. We welcome input and collaboration from people who have relevant experience or would like to propose use cases to cover.
2 changes: 1 addition & 1 deletion Governance/manage/manage-file.md
@@ -31,7 +31,7 @@ Always have the code under version control!

## File and directory organisation

While all the generic advice on [how to organise and name files](../tech/drs.md) is still applicable, when replicating a dataset it is important to also consider the original data organisation.
While all the generic advice on [how to organise and name files](../tech/drs-names.md) is still applicable, when replicating a dataset it is important to also consider the original data organisation.

### Naming files

7 changes: 1 addition & 6 deletions Governance/publish/publish-csiro-dap.md
@@ -1,10 +1,5 @@
# CSIRO Data Access Portal (DAP)

Data, software and links to external data holdings (such as NCI) can be added to the [CSIRO Data Access Portal](https://data.csiro.au/) (DAP) by **CSIRO staff** only. More information on creating metadata records and uploading files to the CSIRO DAP can be found on the [DAP Help Guide](https://confluence.csiro.au/display/dap/Deposit+and+Manage+Data) (staff only access).
Data, software and links to external data holdings (such as NCI) can be added to the [CSIRO Data Access Portal](https://research.csiro.au/dap/) (DAP) by **CSIRO staff** only. More information on creating metadata records and uploading files to the CSIRO DAP can be found on the [DAP Help Guide](https://confluence.csiro.au/display/dap/Publish%2C+Archive%2C+and+Manage+Research+Data+and+Software) (staff only access).

CSIRO-affiliated data can be published in the DAP, and the lead creator does not have to be a CSIRO staff member.

Contributor


I won't do any changes until the current PR is merged but happy to take this action (or Katie can) after merging.

Contributor Author


I temporarily updated the links as for the other part just to be consistent

Contributor Author


there's a newly created issue #88

!!!COMMENT: at the moment this is identical to what we have in the Publishing option pages for CSIRO. I'm leaving this here for the moment in case we want to add something more, like:
- summary of the procedure with pros and cons?
- Something which is not covered in the official documentation but could be useful?
- a reminder to follow discipline specific best practices even if they aren't necessarily required?
2 changes: 1 addition & 1 deletion Governance/publish/publish-nci-geonetwork.md
@@ -22,7 +22,7 @@ These pages are only visible to interested parties, therefore we provide an [exa
Once the DMP is ready NCI will use the content to create a geonetwork record and mint a DOI for the new dataset. The GeoNetwork record will provide the landing page for the DOI and will be visible only once the files are available on THREDDS.

### Preparing the files
The actual files have to be organised in a <dataset-folder>, this will contain the license and a readme file (usually pointing to the geonetwork record) and a sub-folder for each version containing the data files. How the data is organised will depend on the actual dataset, see the [DRS page](../tech/drs.md) for examples.
The actual files have to be organised in a <dataset-folder>, this will contain the license and a readme file (usually pointing to the geonetwork record) and a sub-folder for each version containing the data files. How the data is organised will depend on the actual dataset, see the [DRS page](../tech/drs-names.md) for examples.
The files should follow both [CF](../concepts/cf-conventions.md) and [ACDD](../concepts/acdd-conventions.md) conventions. Once the files are ready, NCI will run a QC check and a CF/ACDD compliance check, and verify that the files are accessible by widely used software such as ncview and nco.
If the files pass the tests, NCI will add the dataset to THREDDS and activate the DOI. If not, they will send a detailed report of the QC results so the files can be fixed where possible.

28 changes: 14 additions & 14 deletions Governance/publish/publish-options.md
@@ -1,27 +1,27 @@
# Publishing options
Once the data is created and ready for publication, **where** can it be published?
Before preparing the data files for publication, it is important to identify **where** they can be published.

One of the main factors is what services are available, which is largely determined by the researcher's institution and/or organisation.
Another important consideration is the kind of data, depending on format, size etc.
In some cases, depending on the dataset, using a public repository is suitable and often the easiest option.
On the other end of the scale, some data could be produced on purpose or simply be suitable to contribute to a large-scale project that has its own publishing procedure established.
The most common use cases for our community are covered here.

::::{tab-set}
:::{tab-item} BoM
````{tab-set}
```{tab-item} BoM
Insert here pathways for BoM researchers...



:::
```

:::{tab-item} CSIRO
Data, software and links to external data holdings (such as NCI) can be added to the [CSIRO Data Access Portal](https://data.csiro.au/) (DAP) by **CSIRO staff** only. More information on creating metadata records and uploading files to the CSIRO DAP can be found on the [DAP Help Guide](https://confluence.csiro.au/display/dap/Deposit+and+Manage+Data) (staff only access).
```{tab-item} CSIRO
Data, software and links to external data holdings (such as NCI) can be added to the [CSIRO Data Access Portal](https://research.csiro.au/dap/) (DAP) by **CSIRO staff** only. More information on creating metadata records and uploading files to the CSIRO DAP can be found on the [DAP Help Guide](https://confluence.csiro.au/display/dap/Publish%2C+Archive%2C+and+Manage+Research+Data+and+Software) (staff only access).

CSIRO-affiliated data can be published in the DAP, and the lead creator does not have to be a CSIRO staff member.
:::
```

:::{tab-item} CLEX
```{tab-item} CLEX
We use CLEX as an example here but, as CLEX is essentially a collaboration among universities, much of what applies here is usually also applicable to anyone who works or studies at another university. Researchers working for a university potentially have more freedom in where they can publish data. Unless they are working on a project that is covered by a data agreement and has specific licensing and/or data distribution requirements, they are expected to follow the FAIR principles and usually to share data openly.
Part of making data FAIR is to make it discoverable, so sharing data in a discipline specific collection is to be preferred when possible. This could be a collection of climate data, or of a related discipline, i.e., paleoclimate data, oceanographic data, etc.

@@ -37,9 +37,9 @@ An institutional repository might not be able to publish big datasets effectivel
**CLEX Data Collection on Zenodo**
For CLEX researchers, students and associates, the CMS team offers advice and can review a record, as well as add it to the [CLEX Data Collection community](https://zenodo.org/communities/arc-coe-clex-data/?page=1&size=20) to improve discoverability.
For more information on Zenodo see the generic options tab.
:::
```

:::{tab-item} Generalist repositories
```{tab-item} Generalist repositories

Repositories like [Zenodo](https://zenodo.org), [Figshare](https://figshare.com) and [Mendeley](https://www.data.mendeley.com) are public, generic data repositories. It is usually easy to create an account, add a data record and mint a DOI for it. These repositories also publish different kinds of material. This can be useful if publishing code together with data, for example the code and data used to produce a specific figure required to publish a paper.
Another advantage is that these services are widely used and so you are more likely to reach an international audience.
@@ -48,9 +48,9 @@ However, as they are generalist repositories, there are no standards required or

Finally, as for institutional repositories, the data size is limited to 50-100 GB and files can only be downloaded.
We are covering [Zenodo](publish-zenodo.md) in more detail as it is available to anyone and is the most used in our community. Figshare is not free but it might be available via an institutional account. Mendeley is free but less used for climate data.
:::
```

:::{tab-item} Discipline repositories
```{tab-item} Discipline repositories

In some cases, the data might be fit to be published to a specific data portal or as part of a larger initiative.
Of these we are covering only the ESGF case more closely, as it provides an example of a comprehensive publishing process. For the others, refer to their websites for more information; there might also be some relevant examples at the end of these guidelines.
@@ -70,6 +70,6 @@ Keep in mind that some of these options can be an extra distribution option for
- [Copernicus Climate Data Store - CDS](https://cds.climate.copernicus.eu)
- [Copernicus Climate Change Services - C3S](https://www.copernicus.eu/en/copernicus-services/climate-change)
<br>Both Copernicus services work with a tender system, so they do not accept requests to publish datasets unless they fit into products they have an open tender for.
:::
```

::::
````