[WIP] improvement suggestions for the label guide documentation (until sorting dimensions chapter) #1699

Merged
Changes from 2 commits
97 changes: 53 additions & 44 deletions doc/source/user_guide/label_guide.rst
@@ -4,42 +4,47 @@
Label guide
===========

Basic labeling
--------------
Basic labelling
---------------

All ArviZ plotting functions and some stats functions take an optional ``labeller`` argument.
By default, labels show the variable name and the coordinate value
(for multidimensional variables only).
The first example below uses this default labeling.
All ArviZ plotting functions and some stats functions can take an optional ``labeller`` argument.
By default, labels show the variable name.
Multidimensional variables also show the coordinate value.

.. ipython::

Example: Default labelling
~~~~~~~~~~~~~~~~~~~~~~~~~~

In [1]: import arviz as az
...: schools = az.load_arviz_data("centered_eight")
...: az.summary(schools)
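
As a rough illustration of the default behaviour described above, the pure-Python sketch below builds a label from a variable name and, for multidimensional variables only, a coordinate value. ``default_label`` is a hypothetical helper, not part of ArviZ; the bracketed ``theta[Choate]`` format only mimics the row names shown by :func:`~arviz.summary`.

```python
def default_label(var_name, coord_value=None):
    """Hypothetical helper: build a summary-style label for one value.

    Variables without a coordinate value (scalars) are labelled by name only;
    elements of multidimensional variables also show the coordinate value.
    """
    if coord_value is None:
        return var_name
    return f"{var_name}[{coord_value}]"


print(default_label("tau"))              # tau
print(default_label("theta", "Choate"))  # theta[Choate]
```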

Thanks to being powered by xarray, ArviZ supports label based indexing.
We can therefore use the labels we have seen in the summary to plot only a subset of the variables,
the one we are interested in.
Provided we know that the coordinate values shown for theta correspond to the `school` dimension,
we can plot only ``tau`` to better inspect it's 1.03 :func:`~arviz.rhat` and
``theta`` for ``Choate`` and ``St. Paul's``, the ones with higher means:
ArviZ supports label-based indexing powered by xarray.
With label-based indexing, you can use these labels to plot only a subset of the variables.

Example: Label-based indexing
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The coordinate values shown for the ``theta`` variable correspond to the ``school`` dimension.
To inspect the 1.03 :func:`~arviz.rhat` value of ``tau``, include ``tau`` in the ``var_names`` argument.
To inspect ``theta`` only for ``Choate`` and ``St. Paul's``, the schools with the higher means, also include ``theta`` in ``var_names`` and use the ``coords`` argument to select only these two coordinate values.
You can generate this plot with the following command:

.. ipython:: python

@savefig label_guide_plot_trace.png
az.plot_trace(schools, var_names=["tau", "theta"], coords={"school": ["Choate", "St. Paul's"]}, compact=False);
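
To make the ``var_names`` and ``coords`` selection above more concrete, here is a minimal pure-Python sketch of label-based selection on a toy nested dictionary. The ``select`` helper and the placeholder numbers are illustrative assumptions; ArviZ performs this selection on xarray objects, not on dictionaries.

```python
def select(posterior, var_names, coords):
    """Hypothetical helper: keep only the requested variables and coordinate labels.

    ``posterior`` maps a variable name either to a list of draws (scalar variable)
    or to a dict {school label: list of draws}. In this toy version, every
    dict-valued variable is assumed to be indexed by the ``school`` dimension.
    """
    subset = {}
    for name in var_names:
        values = posterior[name]
        if isinstance(values, dict):
            # Select by label, not by position; keep every school if none requested.
            keep = coords.get("school", list(values))
            subset[name] = {label: values[label] for label in keep}
        else:
            subset[name] = values
    return subset


# Placeholder draws, not real results from the centered_eight model.
posterior = {
    "mu": [4.0, 4.5],
    "tau": [1.0, 3.0],
    "theta": {"Choate": [6.0, 7.0], "Deerfield": [5.0, 4.0], "St. Paul's": [6.5, 8.0]},
}
subset = select(posterior, ["tau", "theta"], {"school": ["Choate", "St. Paul's"]})
print(subset)
```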

So far so good, we can identify some issues for low ``tau`` values which is great start.
But say we want to make a report on Deerfield, Hotchkiss and Lawrenceville schools to
see the probability of ``theta > 5`` and we have to present it somewhere with math notation.
Our default labels show ``theta``, not $\theta$ (generated from ``$\theta$`` using $\LaTeX$).
With this plot, you can now identify issues for low ``tau`` values.

Example: Using the labeller argument
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Fear not, we can use the labeller argument to customize the labels.
The ``arviz.labels`` module contains some classes that cover some common customization classes.
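Suppose you want to report the probability of ``theta > 5`` for the Deerfield, Hotchkiss and Lawrenceville schools, and the report uses math notation.
In that case, you can use the ``labeller`` argument to customize the labels so that they match the notation in the report.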
Member

This sentence now feels incomplete. It starts with "to create" and a description on what to create but nothing on how to create it.

A bit more context in case it helps. Here there are two goals: creating a report of theta > 5 and using the labeller to customize the labels.

One can be done without the other, but instead of showing how to use the labeller (which I think should be done in the example in the docstring that is still wip), I wanted to show how to use the labeller to solve a more specific and real task. When exploring the model ourselves, we won't generally care much about the labels, having the same thing as the code is fine. However, if generating a report to be published or presented, we'll probably want to take better care of the presentation, and match the labels to the labels in the equations of the paper instead of variables in the code. Hence the MapLabeller.

Contributor Author

Thank you for providing a bit of context to this matter. The way I see it, since this piece of text is inside of the example, we don't even need the first part of the sentence.

Unlike the default labels that show ``theta``, not $\theta$ (generated from ``$\theta$`` using $\LaTeX$), the labeller argument presents the label with proper math notation.

In this case, we can use :class:`~arviz.labels.MapLabeller` and
tell it to rename the variable name ``theta`` to ``$\theta$``, like so:
You can use :class:`~arviz.labels.MapLabeller` to rename the variable ``theta`` to ``$\theta$``, as shown in the following example:

.. ipython::

@@ -50,24 +55,26 @@ tell it to rename the variable name ``theta`` to ``$\theta$``, like so:
@savefig label_guide_plot_posterior.png
In [1]: az.plot_posterior(schools, var_names="theta", coords=coords, labeller=labeller, ref_val=5);
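
The renaming done by :class:`~arviz.labels.MapLabeller` can be pictured as a dictionary lookup that falls back to the original name. The pure-Python sketch below is only a conceptual illustration under that assumption: ``map_label`` and its ``var_name_map`` parameter are hypothetical, and the vertical "name, newline, coordinate" layout is an assumption rather than ArviZ's exact label format.

```python
def map_label(var_name, coord_value=None, var_name_map=None):
    """Hypothetical helper: rename the variable via a mapping, then add the coordinate."""
    var_name_map = var_name_map or {}
    # Variables missing from the mapping keep their original name.
    display_name = var_name_map.get(var_name, var_name)
    if coord_value is None:
        return display_name
    return f"{display_name}\n{coord_value}"


# Raw string so the backslash of the LaTeX command is kept literally.
name_map = {"theta": r"$\theta$"}
for school in ["Deerfield", "Hotchkiss", "Lawrenceville"]:
    print(map_label("theta", school, name_map))
print(map_label("tau", var_name_map=name_map))  # unmapped names are unchanged
```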

You can see the labellers available in ArviZ at :ref:`their API reference page <labeller_api>`.
Their names aim to be descriptive and they all have examples in their docstring.
For further customization continue reading this guide.
.. seealso::

- For a list of labellers available in ArviZ, see :ref:`the API reference page <labeller_api>`.

Sorting labels
--------------

Labels in ArviZ can generally be sorted in two ways,
using the arguments passed to ArviZ plotting functions or
sorting the underlying xarray Dataset.
The first one is more convenient for single time ordering
whereas the second is better if you want plots consistently sorted that way and
is also more flexible, using ArviZ args is more limited.
ArviZ allows labels to be sorted in two ways:

- Using the arguments passed to ArviZ plotting functions
- Sorting the underlying :class:`xarray.Dataset`

The first option is more convenient for a one-time ordering, whereas the second option is better for sorting plots consistently and is also more flexible.

.. note::

Both ways share an important limitation: the values of a multidimensional variable always stay together.
For example, it is possible to sort ``theta``, ``mu`` and ``tau`` in any order, and within ``theta`` to sort the schools in any order, but it is not possible to show half of the schools, then ``mu`` and ``tau``, and then the rest of the schools.
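
One way to picture this limitation: if labels are generated variable by variable, looping over coordinate values only inside each variable, then every multidimensional variable necessarily yields one contiguous block of labels. The pure-Python sketch below assumes that generation order; ``flat_labels`` is a hypothetical helper, not an ArviZ function.

```python
def flat_labels(var_order, coord_order):
    """Hypothetical helper: list labels variable by variable.

    ``coord_order`` maps a multidimensional variable to its coordinate values in
    the desired order; variables absent from it are treated as scalars.
    """
    labels = []
    for var_name in var_order:  # outer loop: the variable order is free
        for coord in coord_order.get(var_name, [None]):  # inner loop: coordinate order is free
            labels.append(var_name if coord is None else f"{var_name}[{coord}]")
    return labels


labels = flat_labels(["tau", "theta", "mu"], {"theta": ["St. Paul's", "Choate"]})
print(labels)
# ['tau', "theta[St. Paul's]", 'theta[Choate]', 'mu']
# The theta labels always form one block; mu or tau can never land between schools.
```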

Both alternatives have an important limitation though.
Multidimensional variables are always together.
We can sort ``theta, mu, tau`` in any order, and within ``theta`` we can sort the schools in any order,
but it's not possible to show half the schools, then ``mu`` and ``tau`` and then the rest of the schools.

Sorting variable names
......................
@@ -78,16 +85,15 @@ Sorting variable names

.. tabbed:: ArviZ args

We can pass a list with the variable names sorted to modify the order in which they appear
when calling ArviZ functions
To change the order in which variables appear when calling ArviZ functions, pass the variable names to ``var_names`` as a list in the desired order.

.. ipython::

In [1]: az.summary(schools, var_names=var_order)
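
Conceptually, passing ``var_names`` in a given order amounts to iterating over the variables in that order instead of the stored one. In this hypothetical sketch a plain dictionary with placeholder values stands in for the dataset, and the value of ``var_order`` is an assumption, since the guide defines its own ``var_order`` outside the lines shown in this diff.

```python
# Stand-in for a dataset: variable name -> placeholder value (not real results).
variables = {"mu": "mu values", "theta": "theta values", "tau": "tau values"}

# Hypothetical desired order; the guide's own var_order is defined elsewhere.
var_order = ["theta", "tau", "mu"]

# Dicts preserve insertion order, so rebuilding in var_order reorders the variables.
reordered = {name: variables[name] for name in var_order}
print(list(reordered))  # ['theta', 'tau', 'mu']
```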

.. tabbed:: xarray

In xarray, subsetting the Datset with a sorted list of variable names will order the Dataset.
In xarray, subsetting the Dataset with a sorted list of variable names will order the Dataset.

.. ipython::

@@ -97,22 +103,25 @@ Sorting variable names
Sorting coordinate values
.........................

We may also want to sort the schools by their mean.
To do so we first have to get the means of each school:
To sort coordinate values you have to define the order, store it, and use the result to sort the coordinate values.
Member
Suggested change
To sort coordinate values you have to define the order, store it, and use the result to sort the coordinate values.
To sort coordinate values you have to define the order, store it, and use the result to sort the coordinate values.
The order can be defined by performing some operations on our xarray objects (as shown in the example below)
or by manually creating a list with the desired order.


Example: Sorting the schools by mean
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1) Compute the mean of each school with the following command:

.. ipython::

In [1]: school_means = schools.posterior["theta"].mean(("chain", "draw"))
...: school_means
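
The reduction over ``chain`` and ``draw`` followed by sorting can be mimicked in pure Python on toy nested lists, which may help clarify what ``school_means`` holds and how a sorted list of schools is derived from it. The draws below are made-up placeholders (two chains of two draws per school); in the guide the values come from the ``centered_eight`` posterior. This sketch sorts from lowest to highest mean; ``sorted(..., reverse=True)`` would give the opposite order.

```python
from statistics import mean

# Placeholder draws indexed as draws[school][chain][draw]; not real posterior samples.
draws = {
    "Choate": [[7.0, 9.0], [8.0, 8.0]],
    "Deerfield": [[5.0, 6.0], [4.0, 5.0]],
    "Hotchkiss": [[3.0, 4.0], [2.0, 3.0]],
}

# Average over both the chain and draw dimensions, leaving one value per school.
school_means = {
    school: mean(value for chain in chains for value in chain)
    for school, chains in draws.items()
}

# Order the school labels by their mean, from lowest to highest.
sorted_schools = sorted(school_means, key=school_means.get)
print(school_means)
print(sorted_schools)  # ['Hotchkiss', 'Deerfield', 'Choate']
```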

We can then use this DataArray result to sort the coordinate values for ``theta``.
Again we have two alternatives:
2) Use the resulting DataArray to sort the coordinate values of ``theta``.
There are two ways to do this:

.. tabbed:: ArviZ args

Here the first step is to sort the coordinate values so we can pass them as `coords` argument and
choose the order of the rows.
If we want to manually sort the schools, `sorted_schools` can be defined straight away as a list
Sort the coordinate values and pass them as the ``coords`` argument to choose the order of the rows.
To sort the schools manually instead, define ``sorted_schools`` directly as a list.
Eva-Lotte marked this conversation as resolved.

.. ipython::

@@ -121,7 +130,7 @@ Again we have two alternatives:

.. tabbed:: xarray

We can use the :meth:`~xarray.Dataset.sortby` method to order our coordinate values straight at the source
You can use the :meth:`~xarray.Dataset.sortby` method to order the coordinate values directly at the source.

.. ipython::
