Terminology for the various coordinates #1295

Closed
fmaussion opened this issue Mar 4, 2017 · 8 comments

Comments

@fmaussion
Member

Picking up a thread about the repr (#1199 (comment)), I think it would be good to give a name to the two different types of coordinates in xarray.

Currently the doc says:

One dimensional coordinates with a name equal to their sole dimension (marked by * when printing a dataset or data array) take on a special meaning in xarray. They are used for label based indexing and alignment, like the index found on a pandas DataFrame or Series. Indeed, these “dimension” coordinates use a pandas.Index internally to store their values.

Other than for indexing, xarray does not make any direct use of the values associated with coordinates. Coordinates with names not matching a dimension are not used for alignment or indexing, nor are they required to match when doing arithmetic (see Coordinates).
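As an illustration of the distinction the docs draw here, a minimal sketch follows. The dataset and the names `temperature`, `lon`, `station`, and `is_dimension_coordinate` are made up for this example; only the rule itself (a one-dimensional coordinate named after its sole dimension) comes from the quoted text.

```python
import numpy as np
import xarray as xr

# Hypothetical dataset, used only to illustrate the two kinds of coordinates.
ds = xr.Dataset(
    {"temperature": (("x", "y"), np.zeros((2, 3)))},
    coords={
        "x": [10, 20],                          # 1-D, named after its sole dimension
        "y": [1, 2, 3],                         # 1-D, named after its sole dimension
        "lon": (("x", "y"), np.ones((2, 3))),   # 2-D, name matches no dimension
        "station": ("x", ["a", "b"]),           # 1-D, but not named after its dimension
    },
)


def is_dimension_coordinate(dataset, name):
    """Return True if the coordinate is 1-D and named after its sole dimension."""
    return dataset.coords[name].dims == (name,)


dimension_coords = sorted(n for n in ds.coords if is_dimension_coordinate(ds, n))
non_dimension_coords = sorted(n for n in ds.coords if not is_dimension_coordinate(ds, n))

# Label-based indexing goes through a dimension coordinate.
selected = ds.sel(x=20)
```

In this sketch `x` and `y` would be the coordinates marked with `*` in the repr, while `lon` and `station` are carried along but play no role in alignment.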

The use of quotation marks in “dimension” coordinates makes the term imprecise. Should we simply call the former dimension coordinates and the latter optional coordinates?

This would also help make error reporting more consistent (e.g. #1291 (comment)).

@shoyer
Member

shoyer commented Mar 5, 2017

Should we simply call the former dimension coordinates and the latter optional coordinates?

Yes, let's call them "dimension coordinates".

The latter could be called "non-dimension coordinates", but even dimension coordinates are optional, so we shouldn't call these "optional".

@rabernat
Contributor

rabernat commented Mar 5, 2017

Wherever possible we should try to adhere to CF convention terminology.

Some relevant definitions are:

coordinate variable
We use this term precisely as it is defined in section 2.3.1 of the NUG. It is a one-dimensional variable with the same name as its dimension [e.g., time(time)], and it is defined as a numeric data type with values that are ordered monotonically. Missing values are not allowed in coordinate variables.

auxiliary coordinate variable
Any netCDF variable that contains coordinate data, but is not a coordinate variable (in the sense of that term defined by the NUG and used by this standard - see below). Unlike coordinate variables, there is no relationship between the name of an auxiliary coordinate variable and the name(s) of its dimension(s).

multidimensional coordinate variable
An auxiliary coordinate variable that is multidimensional.

Using these definitions, it seems that @shoyer's "dimension coordinate" == CF's "coordinate variable" and @shoyer's "non-dimension coordinate" == CF's "auxiliary coordinate variable"
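The correspondence between the two vocabularies can be sketched in a few lines of plain Python. This is only an illustration: the function names `cf_term` and `xarray_term` and the sample variables are hypothetical, and the sketch checks only the name/dimension rule, not the CF requirements that a coordinate variable be numeric, monotonic, and free of missing values.

```python
def cf_term(name, dims):
    """Name a coordinate using the CF definitions quoted above (name/dimension rule only)."""
    dims = tuple(dims)
    if dims == (name,):
        return "coordinate variable"
    if len(dims) > 1:
        # A multidimensional coordinate variable is a kind of auxiliary coordinate variable.
        return "multidimensional coordinate variable"
    return "auxiliary coordinate variable"


def xarray_term(name, dims):
    """Name a coordinate using the terminology proposed in this thread."""
    if tuple(dims) == (name,):
        return "dimension coordinate"
    return "non-dimension coordinate"


# Hypothetical variables: (name, dimensions)
examples = [
    ("time", ("time",)),
    ("lat", ("y", "x")),
    ("station_name", ("station",)),
]

mapping = [(name, cf_term(name, dims), xarray_term(name, dims)) for name, dims in examples]
```

Here `time(time)` is a CF "coordinate variable" and a "dimension coordinate", while `lat(y, x)` and `station_name(station)` fall under CF's auxiliary terms and the proposed "non-dimension coordinate".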

@fmaussion
Member Author

Thanks @rabernat, I think this makes sense. I like "dimension coordinate" better and find it less ambiguous than "coordinate variable", but staying in line with CF clearly is the best thing to do here.

@shoyer
Member

shoyer commented Mar 5, 2017

I think it's confusing to use "coordinate" to refer to only variables matching dimension names and that "auxiliary coordinates" are not a type of coordinate. It just doesn't make any sense in terms of the usual rules for categorizing things. This is especially problematic for software like xarray which people use without looking carefully at the docs, and for which many users aren't familiar with CF conventions.

So I feel pretty strongly that CF/NUG conventions get this one wrong, and for xarray we should say that anything in .coords is a coordinate variable, which we can further qualify in various ways.

@rabernat
Contributor

rabernat commented Mar 6, 2017

I don't feel very strongly about this... just pointing out that CF conventions do define terminology relevant to this discussion.

I'm fine with departing from CF convention terminology where we think it is unnecessarily confusing. But we should try to explain how and why we depart in the docs. @shoyer's comment above would in fact be a useful addition to the docs.

@fmaussion
Member Author

I'm fine with this too. In particular, I find "dimension coordinate" much more meaningful than just "coordinate variable".

Do I read this correctly that we agree on the following?

  • dimension coordinates
  • non-dimension coordinates

I'm also fine with "auxiliary coordinates" for the second type. Let me know which one we should pick, I'll update the PR accordingly.

@rabernat
Contributor

rabernat commented Mar 6, 2017

I personally like "auxiliary coordinate" for the second type. This makes it clear that the variable gives additional information that is not as fundamental as the "dimension coordinate".

@shoyer
Member

shoyer commented Mar 6, 2017

I agree that "auxiliary coordinate" is a better name, but I think "non-dimension coordinate" is clearer for the rare cases where we want to refer to these coordinates, given that we don't have any name for these coordinates in the xarray data model itself.


3 participants