This repository has been archived by the owner on Nov 21, 2023. It is now read-only.
The dataset version has always caused trouble in our CMIP6 pipeline. It is the only DRS (Data Reference Syntax) element that is not stored in the netCDF file's metadata, yet we rely on the version to keep track of datasets that have been modified. I have been using the tracking_id to look up the version through the http://hdl.handle.net / https://handle-esgf.dkrz.de data handle service, with some success.
But many implementation errors pop up. The gs://cmip6 zarr datasets have tracking_ids that are concatenations of the tracking_ids of the netCDF files from which they were aggregated. In a perfect world, each of these tracking_ids would correspond to one and only one netCDF file, and each netCDF file would correspond to one and only one version. So I am collecting and categorizing the various issues and trying to come up with some sensible workarounds. I will be collecting them here, if you want to help ...
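As a rough illustration of the "perfect world" check described above, here is a minimal sketch that splits a concatenated tracking_id string into individual ids and groups them by version. It assumes the ids follow the `hdl:21.14100/<uuid>` pattern and makes no assumption about the separator; `split_tracking_ids` and `group_by_version` are hypothetical names, not functions from the pipeline, and the tracking_id-to-version mapping is assumed to have been looked up elsewhere (for example via the handle service).

```python
import re

# Assumption: CMIP6 tracking_ids look like "hdl:21.14100/<36-character uuid>".
TRACKING_ID_RE = re.compile(r"hdl:21\.14100/[0-9a-fA-F-]{36}")


def split_tracking_ids(concatenated):
    """Return the unique tracking_ids found in a concatenated string,
    in first-seen order, regardless of the separator used."""
    seen = {}
    for tid in TRACKING_ID_RE.findall(concatenated):
        seen.setdefault(tid, None)
    return list(seen)


def group_by_version(version_by_tracking_id):
    """Group tracking_ids by the version each one resolves to.

    `version_by_tracking_id` is a dict {tracking_id: version} obtained
    elsewhere. In the ideal case the result has exactly one key."""
    groups = {}
    for tid, version in version_by_tracking_id.items():
        groups.setdefault(version, []).append(tid)
    return groups
```

If `group_by_version` returns more than one key for a single zarr store, the store mixes files from different versions and belongs on the list of issues to categorize.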
We can get the version of a particular dataset in several ways. The version of the Google Cloud copy can be obtained either from our Google Cloud catalog, which lists a version for each dataset, or from the data handle service, using the tracking_ids stored in the Google Cloud dataset. The version currently available from ESGF can be obtained either from the ESGF Search API or from any one of the associated tracking_ids, by tracing through the versions in the data handle service.
For example:
import pandas as pd

# gsurl2tracks, tracks2version, gsurl2search and esgf_search are pipeline helpers not shown here.

# Version recorded in the Google Cloud catalog
dfcat = pd.read_csv('https://cmip6.storage.googleapis.com/cmip6-zarr-consolidated-stores-noQC.csv', dtype='unicode')
gsurl = 'gs://cmip6/CMIP/NCAR/CESM2/historical/r11i1p1f1/Oyr/expc/gr/'
version_cat = dfcat[dfcat.zstore == gsurl].version.values[0]
print('current version from GC catalog = ', version_cat)

# Version traced through the data handle service from the store's tracking_ids
tracks = gsurl2tracks(gsurl)
version, jdict = tracks2version(tracks)
print('latest version from handler = ', version)

# Version(s) listed by the ESGF Search API
asearch = gsurl2search(gsurl)
dfs = esgf_search(asearch, toFilter=False)
version_ESGF = list(set(dfs.version_id))
print('version(s) available from ESGF = ', version_ESGF)
returns:
current version from GC catalog = 20190514
current version from GC tracks = 20190514
latest version from handler = 20190514
version(s) available from ESGF = ['v20190514']
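Note that the sources format the version differently: the catalog and the handle service give `20190514`, while ESGF gives `v20190514`. A minimal sketch of how the three results could be normalized and compared follows; `normalize_version` and `compare_versions` are hypothetical helpers, not part of the pipeline, and the comparison assumes versions are `YYYYMMDD` date strings so that string ordering matches chronological ordering.

```python
def normalize_version(version):
    """Strip whitespace and a leading 'v': 'v20190514' -> '20190514'."""
    v = str(version).strip()
    return v[1:] if v.lower().startswith("v") else v


def compare_versions(version_cat, version_handler, versions_esgf):
    """Compare the catalog version with the handler and ESGF versions.

    Assumes YYYYMMDD version strings, so sorting them as strings
    also sorts them chronologically."""
    cat = normalize_version(version_cat)
    handler = normalize_version(version_handler)
    esgf = sorted({normalize_version(v) for v in versions_esgf})
    latest_esgf = esgf[-1] if esgf else None
    return {
        "catalog_matches_handler": cat == handler,
        "catalog_in_esgf": cat in esgf,
        "catalog_is_latest": cat == latest_esgf,
        "latest_esgf": latest_esgf,
    }
```

With the values printed above, `compare_versions('20190514', '20190514', ['v20190514'])` reports agreement on every check; any `False` entry flags a dataset worth a closer look.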