Enable selection of variables in daops interface #113

Open
agstephens opened this issue Dec 12, 2023 · 0 comments

agstephens commented Dec 12, 2023

Should we allow the daops interface to include the selection of variables?

Philosophically, we created daops and rook to deal with dataset identifiers, which tend to include only a single data variable (along with its metadata and coordinate variables). As we consider the wider use of roocs we find, as with the ESA CCI datasets at CEDA, that some datasets have many variables. For example, this kerchunk file links to NetCDF files that contain 204 variables!

https://data.ceda.ac.uk/neodc/esacci/cloud/metadata/kerchunk/version3/L3C/ATSR2-AATSR/v3.0/ESACCI-L3C_CLOUD-CLD_PRODUCTS-ATSR2_AATSR-199506-201204-fv3.0-kr1.1.json

Here is an example request to remind us of the existing interface (using the command-line daops subset approach):

daops subset --area 30,-10,65,30 --time 2000-01-01/2000-02-30 --levels "/" --time-components ""  --output-dir /tmp --file-namer simple https://data.ceda.ac.uk/neodc/esacci/cloud/metadata/kerchunk/version3/L3C/ATSR2-AATSR/v3.0/ESACCI-L3C_CLOUD-CLD_PRODUCTS-ATSR2_AATSR-199506-201204-fv3.0-kr1.1.json

So, should we extend the daops interface to allow specific selection of variables?

If yes, what are the options?

If we decide to support this extension, then maybe we have two options:

  1. Expand the dataset identifier so that it includes variable IDs, such as:
  • use a hash to separate the identifier (or path/URL) from a comma-separated list of variables:
https://data.ceda.ac.uk/neodc/esacci/cloud/metadata/kerchunk/version3/L3C/ATSR2-AATSR/v3.0/ESACCI-L3C_CLOUD-CLD_PRODUCTS-ATSR2_AATSR-199506-201204-fv3.0-kr1.1.json#toa_swup,toa_swup_clr,toa_swup_hig

So a full command might be:

daops subset \
  --area 30,-10,65,30 \
  --time 2000-01-01/2000-02-30 \
  --levels "/" \
  --time-components "" \
  --output-dir /tmp \
  --file-namer simple \
  https://data.ceda.ac.uk/neodc/esacci/cloud/metadata/kerchunk/version3/L3C/ATSR2-AATSR/v3.0/ESACCI-L3C_CLOUD-CLD_PRODUCTS-ATSR2_AATSR-199506-201204-fv3.0-kr1.1.json#toa_swup,toa_swup_clr,toa_swup_hig
  2. Add a new parameter, such as variables:
  • variables: list of strings (or variable IDs) - DEFAULT = None (i.e. include all variables)
  • time
  • area
  • level
  • collection
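
To make the two options easier to compare, here is a rough Python sketch of what each might involve. Nothing here is existing daops or rook code: `split_identifier` and `select_variables` are hypothetical helper names invented for illustration, and the example URL is a placeholder. The first function handles the hash-separated identifier from option 1; the second shows the default behaviour proposed for a `variables` parameter in option 2 (None means include all variables).

```python
from typing import List, Optional, Tuple


def split_identifier(identifier: str) -> Tuple[str, Optional[List[str]]]:
    """Option 1 (hypothetical helper): split 'dataset#var1,var2'.

    Returns (dataset, [var1, var2]). When there is no '#' suffix, or the
    suffix is empty, returns (dataset, None), meaning "all variables".
    """
    dataset, sep, suffix = identifier.partition("#")
    if not sep:
        return identifier, None
    # Drop empty entries so a trailing comma does not create a "" variable.
    variables = [v.strip() for v in suffix.split(",") if v.strip()]
    return dataset, variables or None


def select_variables(
    available: List[str], variables: Optional[List[str]] = None
) -> List[str]:
    """Option 2 (hypothetical helper): apply a `variables` parameter.

    `variables=None` keeps everything; otherwise only the requested
    variables are kept, in the order they appear in `available`.
    """
    if variables is None:
        return list(available)
    missing = [v for v in variables if v not in available]
    if missing:
        raise KeyError(f"Unknown variables requested: {missing}")
    return [v for v in available if v in variables]


if __name__ == "__main__":
    # Placeholder URL, not a real dataset.
    dataset, wanted = split_identifier(
        "https://example.org/data.json#toa_swup,toa_swup_clr"
    )
    print(dataset, wanted)
    print(select_variables(["toa_swup", "toa_swup_clr", "cot"], wanted))
```

One thing to weigh with option 1: `#` is already the fragment delimiter in URLs, so an identifier that legitimately carries a fragment would be ambiguous. A separate `variables` parameter avoids that, at the cost of changing the interface.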

@cehbrecht: what are your thoughts on this proposal?
