Add glossary to documentation #3352

Merged
merged 5 commits into from
Sep 29, 2019
Changes from 3 commits
2 changes: 2 additions & 0 deletions doc/index.rst
@@ -46,6 +46,7 @@ Documentation

**User Guide**

* :doc:`terminology`
* :doc:`data-structures`
* :doc:`indexing`
* :doc:`interpolation`
@@ -65,6 +66,7 @@ Documentation
   :hidden:
   :caption: User Guide

   terminology
   data-structures
   indexing
   interpolation
51 changes: 51 additions & 0 deletions doc/terminology.rst
@@ -0,0 +1,51 @@
.. _terminology:

.. https://github.com/pydata/xarray/issues/2410
.. https://github.com/pydata/xarray/issues/1295

Terminology
===========

*Xarray terminology differs slightly from CF and mathematical conventions, and therefore using xarray, understanding the documentation, and parsing error messages is easier once key terminology is defined. This glossary was designed so that more fundamental concepts come first. Thus for new users, this page is best read top-to-bottom. Throughout the glossary,* ``arr`` *will refer to an xarray* :py:class:`DataArray` *in any small examples. For more complete examples, please consult the relevant documentation.*

----

**DataArray:** A multi-dimensional array with labeled or named dimensions. If its optional ``name`` property is set, it is a *named DataArray*.

----

**Dimension / dimensions:** A *dimension* is a nonnegative number for the dimensionality of the underlying data, while an array's *dimensions* are a set of dimension names. The name of the ``i``-th dimension is ``arr.dims[i]``. If an array is created without dimensions, the default dimension names are ``dim_0``, ``dim_1``, and so forth.
Collaborator

I'm not sure about "A *dimension* is a nonnegative number for the dimensionality". I think that's a good definition for dimensionality, but I wouldn't think of a dimension as a number.

Contributor Author

I struggled with this because in math, a "dimension" is a number, but in common parlance and xarray, "dimension" is often used to refer to a dimension axis, or a locus in which all but one of the coordinates ("coordinates" in the math sense, not xarray sense) are fixed. I didn't want to define "coordinate" and "dimension" in the math sense and then re-define them, although I think that would be the clearest at the expense of brevity.

Anyway, how about this:

Dimension: In mathematics, the dimension of data is loosely the number of degrees of freedom for it. A dimension axis is a set of all points in which all but one of these degrees of freedom is fixed. We can think of each dimension axis as having a name, for example the "x dimension". In xarray, a DataArray's dimensions are its named dimension axes, and the name of the i-th dimension is arr.dims[i]. If an array is created without dimensions, the default dimension names are dim_0, dim_1, and so forth.

Member

It might be worth borrowing the description from the netCDF data model documentation: https://www.unidata.ucar.edu/software/netcdf/docs/netcdf_data_model.html

Dimensions: describe the axes of the [DataArrays]. A dimension has a name and a length...
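
To make the "a dimension has a name and a length" view concrete, here is a minimal sketch. The array shapes and the names ``x`` and ``y`` are chosen purely for illustration and are not part of the proposed glossary text.

```python
import numpy as np
import xarray as xr

# No dimension names are given, so xarray falls back to dim_0, dim_1, ...
unnamed = xr.DataArray(np.zeros((2, 3)))
print(unnamed.dims)

# With explicit names, each dimension has both a name and a length.
named = xr.DataArray(np.zeros((2, 3)), dims=["x", "y"])
print(named.dims)         # the dimension names, in axis order
print(dict(named.sizes))  # mapping from dimension name to length
```

Here ``named.dims[i]`` gives the name of the ``i``-th dimension, and ``named.sizes`` pairs each name with its length.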


----

**Coordinate:** A one-dimensional array that labels a dimension of another ``DataArray``. There are two types of coordinate arrays: *dimension coordinates* and *non-dimension coordinates* (see below). A coordinate named ``x`` can be retrieved from ``arr.coords[x]``. A ``DataArray`` can have more coordinates than dimensions because a single dimension can be assigned multiple coordinate arrays. However, only one coordinate array can be assigned as a particular dimension's dimension coordinate array. As a consequence, ``len(arr.dims) <= len(arr.coords)`` in general.
Collaborator

non-dimensional coordinates can be multi-dimensional

Contributor Author

I thought "multi-dimensional coordinates" really just meant multiple one-dimensional coordinate arrays labeling the same dimension. I inferred that from this page. Is this what you mean or something else? Also, could you please show me an example? I've tried and failed to use assign_coords to add a multi-dimensional DataArray as a coordinate array.

Collaborator

For example

In [8]: xr.DataArray(np.ones((2,3)), dims=['a','b'], coords=dict(c=(('a','b'), np.zeros((2,3)))))
Out[8]:
<xarray.DataArray (a: 2, b: 3)>
array([[1., 1., 1.],
       [1., 1., 1.]])
Coordinates:
    c        (a, b) float64 0.0 0.0 0.0 0.0 0.0 0.0
Dimensions without coordinates: a, b


----

**Dimension coordinate:** A coordinate array assigned to ``arr`` with both a name and dimension name in ``arr.dims`` (see **Name matching rules** below). Dimension coordinates are used for label-based indexing and alignment, like the index found on a :py:class:`pandas.DataFrame` or :py:class:`pandas.Series`. In fact, dimension coordinates use :py:class:`pandas.Index` objects under the hood for efficient computation. Dimension coordinates are marked by ``*`` when printing a ``DataArray`` or ``Dataset``.

----

**Non-dimension coordinate:** A coordinate array assigned to ``arr`` with a name *not* in ``arr.dims`` but a dimension name in ``arr.dims`` (see **Name matching rules** below). These coordinate arrays are useful for auxiliary labeling. However, non-dimension coordinates are not indexed, and any operation on non-dimension coordinates that leverages indexing will fail. Printing ``arr.coords`` will print all of ``arr``'s coordinate names, with the assigned dimensions in parentheses. For example, ``coord_name (dim_name) 1 2 3 ...``.
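
The difference between the two kinds of coordinate may be easier to see in a short sketch. The coordinate name ``label`` and all values below are illustrative assumptions, and the exception raised by the final selection is assumed to be a ``KeyError`` or ``ValueError`` depending on the xarray version.

```python
import xarray as xr

arr = xr.DataArray([10.0, 20.0, 30.0], dims=["x"])

# The name "x" matches the dimension "x": this becomes a dimension coordinate.
arr = arr.assign_coords(x=[1, 2, 3])

# The name "label" is not a dimension, but its values lie along "x":
# this becomes a non-dimension coordinate.
arr = arr.assign_coords(label=("x", ["a", "b", "c"]))

# The dimension coordinate is the one marked with "*" in the printout.
print(arr.coords)

# Label-based selection goes through the dimension coordinate's index.
selected = arr.sel(x=2).item()

# Selecting by the non-indexed coordinate is expected to raise; the exact
# exception type is an assumption and may differ between xarray versions.
try:
    arr.sel(label="b")
    label_sel_failed = False
except (KeyError, ValueError):
    label_sel_failed = True
```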

.. note::

    **Name matching rules:** Xarray follows simple but important-to-grok name matching rules for dimensions and coordinates. Let ``arr`` be an array with an existing dimension ``x`` and assigned new coordinates ``new_coords``. If ``new_coords`` is a list-like collection, then it must be assigned a name that matches an existing dimension, for example ``arr.assign_coords({'x': new_coords})``.

    However, if ``new_coords`` is a one-dimensional ``DataArray``, then the rules are slightly more complex. In this case, if both ``new_coords``'s name and only dimension match a dimension name in ``arr.dims``, it is assigned as a dimension coordinate to ``arr``. If ``new_coords``'s only dimension matches a name in ``arr.dims`` but its own name does not, it is assigned as a non-dimension coordinate to ``arr``. Otherwise, an exception is raised.

----

**Index:** An *index* is a :py:class:`pandas.Index` that indexes the values in a dimension coordinate. Non-dimension coordinates are not indexed. The index associated with dimension name ``x`` can be retrieved by ``arr.indexes[x]``. By construction, ``len(arr.dims) == len(arr.indexes)``.
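
As a small sketch of retrieving an index, assuming an array whose single dimension ``x`` has a dimension coordinate (the values are illustrative only):

```python
import pandas as pd
import xarray as xr

arr = xr.DataArray([1.5, 2.5, 3.5], dims=["x"], coords={"x": [10, 20, 30]})

# The index backing the dimension coordinate "x" is a pandas.Index.
index = arr.indexes["x"]
print(type(index).__name__, list(index))
```

The concrete ``pandas.Index`` subclass printed here can differ between pandas versions.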

----

**Dataset:** A dict-like collection of ``DataArray`` objects with aligned dimensions. Thus, most operations that can be performed on the dimensions of a single ``DataArray`` can be performed on a dataset.
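
A minimal sketch of the dict-like behaviour, assuming two hypothetical variables, ``temperature`` and ``pressure``, that share the dimension ``x``:

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {
        "temperature": (("x", "y"), np.zeros((2, 3))),
        "pressure": (("x",), np.ones(2)),
    },
    coords={"x": [10, 20]},
)

# Dict-like access returns a DataArray.
temperature = ds["temperature"]

# An operation on the shared dimension "x" applies to every variable at once.
subset = ds.sel(x=10)
print(subset)
```

After the selection, ``temperature`` keeps only its ``y`` dimension and ``pressure`` becomes a scalar, because both were aligned along ``x``.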

----

**Variable:** A `NetCDF-like variable <https://www.unidata.ucar.edu/software/netcdf/netcdf/Variables.html>`_ consisting of dimensions, data, and attributes which describe a single array. The main functional difference between variables and numpy arrays is that numerical operations on variables implement array broadcasting by dimension name. Each ``DataArray`` has an underlying variable that can be accessed via ``arr.variable``. However, a variable is not fully described outside of either a ``Dataset`` or a ``DataArray``.
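
Broadcasting by dimension name can be sketched as follows. The dimension names ``x`` and ``y`` and the values are assumptions made for illustration.

```python
import numpy as np
import xarray as xr

a = xr.Variable(("x",), np.array([1, 2, 3]))
b = xr.Variable(("y",), np.array([10, 20]))

# Plain NumPy arrays of shapes (3,) and (2,) would not broadcast together.
# Variables broadcast by dimension name, so the result spans both "x" and "y".
total = a + b
print(total.dims, total.shape)

# Each DataArray wraps an underlying Variable.
arr = xr.DataArray([1, 2, 3], dims=["x"])
underlying = arr.variable
```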

.. note::

    The :py:class:`Variable` class is a low-level interface and can typically be ignored. However, the word "variable" appears often enough in the code and documentation that it is useful to understand.
4 changes: 4 additions & 0 deletions doc/whats-new.rst
@@ -30,6 +30,10 @@ Bug fixes

Documentation
~~~~~~~~~~~~~

- Created a glossary of important xarray terms (:issue:`2410`, :pull:`3352`).
By `Gregory Gundersen <https://github.com/gwgundersen/>`_.

- Add examples for :py:meth:`Dataset.swap_dims` and :py:meth:`DataArray.swap_dims`.
By `Justus Magin <https://github.com/keewis>`_.
