Paola issues #93

Merged
merged 8 commits into from
Jun 11, 2024

Conversation

paolap
Contributor

@paolap paolap commented Jun 5, 2024

This is addressing a few issues:

#71 making sure we have correspondence between indexes of concepts and tech pages
#92 there was an empty dropdown in this page; I fixed that by adding the tar cheat sheet info
#90 removed a pin to an older version of Jupyter-book
#83 I covered the example mentioned
#73 added an introduction to this new section in the introduction page (please review)
And #53, which has become a bit bigger: I reviewed the entire publishing section (and opened new issues) and:

  1. fixed an issue with tabs that weren't showing correctly in publishing options
  2. created a new page, publishing procedure, to cover the generalised case before going into the options and the details of specific options. This isn't quite complete (the abstract is missing, for example) but I would like some feedback before proceeding.
    It might also help to review and merge now, to at least fix the other issues.

I'm trying to encourage our users to rely on it now that they need to manage their data on their own.

Contributor

@hot007 hot007 left a comment

Um, I think this is the right button to press but apologies if not - I approve this PR pending a few minor corrections.

Governance/_toc.yml (resolved)
Governance/concepts/concept-intro.md (resolved)
Governance/concepts/other-conventions.md (outdated, resolved)
@@ -3,8 +3,3 @@
Data, software and links to external data holdings (such as NCI) can be added to the [CSIRO Data Access Portal](https://data.csiro.au/) (DAP) by **CSIRO staff** only. More information on creating metadata records and uploading files to the CSIRO DAP can be found on the [DAP Help Guide](https://confluence.csiro.au/display/dap/Deposit+and+Manage+Data) (staff only access).

CSIRO-affiliated data can be published in the DAP and the lead creator does not have to be a CSIRO staff member.

Contributor

I won't do any changes until the current PR is merged but happy to take this action (or Katie can) after merging.

Contributor Author

I temporarily updated the links as for the other part just to be consistent

Contributor Author

there's a newly created issue #88

Governance/publish/publish-options.md (outdated, resolved)
Governance/publish/publish-procedure.md (outdated, resolved)
Governance/publish/publish-procedure.md (outdated, resolved)
Governance/publish/publish-procedure.md (outdated, resolved)
Governance/publish/publish-procedure.md (outdated, resolved)
Governance/publish/publish-procedure.md (resolved)

2. If the output is big, publish only a subset. If the methods are well described and the software used is easily available, then publishing only the subset of data that underlies a publication is sufficient. For example, the post-processed output is sufficient for a model simulation. However, the model version and configuration used, the input data and the model source code should be documented.

3. It's essential to consider the end user's point of view. What would a user look for when considering using a dataset? What kind of information is essential for the data to be usable? Which additional information would make its use easier?

  1. Consider adding an example of what's essential from an end-user point of view. For example, ACS aims to publish bias-corrected data on a 5km grid; this decision is mainly based on the needs of users, who want ever higher-resolution data. But I am not sure whether that's an example of data creation rather than of the publishing procedure.

Governance/publish/publish-procedure.md (outdated, resolved)
The way files are organised in folders, their names and their sizes should take into account both how the files will be distributed and how they will be used. For example, the protocol used to download the files might have a maximum allowed size; likewise, having to list a lot of small files is inefficient when loading an HTML page. Names should also be descriptive enough that a file can easily be recognised as part of a specific dataset after it has been downloaded. We covered [files organisation and naming](../tech/drs-names) in the technical pages of this book; however, it is important to check the publisher's instructions in this regard or, if none are available online, to contact them about it as early as possible in the publishing process.
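
As a rough illustration of the point about file sizes and counts, the sketch below walks a dataset folder and reports files above a size limit and folders holding many small files. The function name `audit_layout` and all three thresholds are assumptions made up for this example; real limits depend on the publisher's instructions.

```python
import os
from collections import defaultdict

# Hypothetical limits: check the publisher's instructions for real values.
MAX_FILE_BYTES = 2 * 1024**3   # assumed per-file download limit (2 GiB)
SMALL_FILE_BYTES = 1024**2     # files under 1 MiB are counted as "small"
MAX_SMALL_FILES = 1000         # assumed tolerable number of small files per folder


def audit_layout(root, max_file_bytes=MAX_FILE_BYTES,
                 small_file_bytes=SMALL_FILE_BYTES,
                 max_small_files=MAX_SMALL_FILES):
    """Return (too_big, crowded): files over the size limit and
    folders containing more small files than allowed."""
    too_big = []
    small_counts = defaultdict(int)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            size = os.path.getsize(path)
            if size > max_file_bytes:
                too_big.append(path)
            elif size < small_file_bytes:
                small_counts[dirpath] += 1
    crowded = sorted(d for d, n in small_counts.items() if n > max_small_files)
    return sorted(too_big), crowded
```

Calling `audit_layout("path/to/dataset")` before contacting the publisher gives a quick list of files that may need splitting and folders that may need aggregating.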

**Conventions**
It is important to use [conventions](../concepts/conventions) and [controlled vocabularies](../concepts/controlled-vocab) whenever possible, both official ones, like CF conventions for file attributes, and others which are not a requirement but have become common practice in the climate community (e.g. CMIP variable names). As some of these conventions also apply to folder and file names, it is important to be consistent and use the same terms in the files, names and descriptions.
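
To make the consistency point concrete, here is a minimal sketch that checks file names against a small controlled vocabulary. The `<variable>_<frequency>_...` naming pattern, the function `check_name` and the two vocabulary sets are assumptions for illustration; they only borrow a few CMIP-style terms and are not an official list.

```python
import os

# Hypothetical controlled vocabularies using a few CMIP-style terms.
ALLOWED_VARIABLES = {"tas", "pr", "psl"}
ALLOWED_FREQUENCIES = {"day", "mon"}


def check_name(filename):
    """Return a list of problems for a file assumed to be named
    <variable>_<frequency>_<anything>.nc; an empty list means no problems."""
    stem, ext = os.path.splitext(filename)
    problems = []
    if ext != ".nc":
        problems.append(f"unexpected extension: {ext!r}")
    parts = stem.split("_")
    if len(parts) < 2:
        problems.append("expected <variable>_<frequency>_... in the name")
        return problems
    variable, frequency = parts[0], parts[1]
    if variable not in ALLOWED_VARIABLES:
        problems.append(f"unknown variable: {variable!r}")
    if frequency not in ALLOWED_FREQUENCIES:
        problems.append(f"unknown frequency: {frequency!r}")
    return problems
```

The same vocabulary sets could be reused when checking file attributes and descriptions, so that one list of terms drives all three.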

You could mention the CORDEX convention as an example, too. One thing I learned recently is that the CORDEX convention keeps changing (aka updating), and if one is not aware of the changing convention happening every couple of weeks, one might miss updating the data file and not be CORDEX aligned anymore. Currently, there are data description and meta-data differences between ACS-CCAM and QLD-CCAM published CMIP6 CORDEX data because of that.

Contributor Author

I was trying to give some generic advice, hence I mentioned only CF. We should have a "publishing with ESGF" page where all these quirks (CMOR, CORDEX, etc.) can be added. I will copy this comment to the relevant issue. Could it also be added to the other-conventions file? We should be covering CORDEX there.

Contributor Author

see issue #28

@paolap
Contributor Author

paolap commented Jun 6, 2024

Uhh, this is confusing. I think I answered some of Alicia's comments in the files list but can't see my answers here; hopefully they will appear. I think I covered all the reviews, but if I missed something let me know. I added the "abstract" section, now renamed "dataset web page", which was missing entirely, and also re-formatted the readme example in tech. There might be more pages like that that need reviewing. I will check tomorrow while we see whether Chloe also wants to review or is happy with the changes.
One thing I noticed is that we should maybe remove the page "Preparing files for publication" from the create section. It hasn't really been populated yet, and the publishing procedure should cover at least some of it. Some of the topics listed are specific to ESGF and could be covered in the publish with ESGF page, which we also still need to fill.

@hot007
Contributor

hot007 commented Jun 6, 2024

Thanks good to go from me :)

@chloemackallah chloemackallah merged commit 5f0e546 into main Jun 11, 2024
@chloemackallah chloemackallah deleted the paola_issues branch June 11, 2024 06:50
4 participants