Representing schema in xarray #169

ColCarroll · 2018-08-12T17:04:33Z

To reach feature parity with pymc3 plotting, we need a way to access sampler statistics (specifically, divergences) and posterior predictive check (ppc) samples. @aseyboldt outlined a good way to think about the rest of the schema here.

At the same time, xarray supports groups, but it doesn't look like it does so natively (yet? see the discussion at pydata/xarray#1092).

I am proposing something like

import xarray as xr
import netCDF4 as nc

class Trace(object):
    def __init__(self, filename):
        self.filename = filename
        self.data = nc.Dataset(filename)
        self.groups = self.data.groups
        
    def __getattr__(self, name):
        if name in self.groups:
            return xr.open_dataset(self.filename, group=name)
        raise AttributeError("{!r} is not a group in {}".format(name, self.filename))
    
    def __dir__(self):
        """Allows for tab completion on netCDF group names"""
        return super(Trace, self).__dir__() + list(self.groups.keys())

This is a pretty light wrapper around netCDF and xarray. Usage is something like

t = Trace('mytrace.nc')
t.posterior  # this is an xarray.Dataset
t.posterior.mu.mean()  # calculate the mean of a variable

I think this will have to change a little bit so that nested groups work fine. In particular, something like

t = Trace('mytrace.nc')
t.sampler_stats.divergences  # should return an xarray.Dataset
t.sampler_stats  #  I think this would tend to return an empty xarray.Dataset
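One way the nested case could be handled is to return a lightweight node for each group and only load data on request. The following is a minimal sketch under stated assumptions, not part of the proposal above: `GroupNode`, its `load` method, and the `loader` callable are hypothetical names, and the group tree is passed in as a plain nested dict so the example runs without a netCDF file.

```python
from typing import Any, Callable, Dict


class GroupNode:
    """Attribute-style access to a tree of named groups (hypothetical helper)."""

    def __init__(self, tree: Dict[str, dict], loader: Callable[[str], Any], path: str = ""):
        self._tree = tree      # child group names mapped to their own subtrees
        self._loader = loader  # assumed: called with a slash-separated group path
        self._path = path      # path of this node relative to the file root

    def __getattr__(self, name: str) -> "GroupNode":
        # __getattr__ only runs when normal lookup fails. Refusing private
        # names avoids infinite recursion on a half-initialised object.
        if name.startswith("_"):
            raise AttributeError(name)
        if name in self._tree:
            child_path = f"{self._path}/{name}" if self._path else name
            return GroupNode(self._tree[name], self._loader, child_path)
        raise AttributeError(f"no group named {name!r} under {self._path or '/'}")

    def __dir__(self):
        # Expose child group names for tab completion.
        return list(super().__dir__()) + list(self._tree)

    def load(self) -> Any:
        """Load whatever is stored directly in this group."""
        return self._loader(self._path)


# Usage with a stand-in loader that just records the requested path.
tree = {"posterior": {}, "sampler_stats": {"divergences": {}}}
root = GroupNode(tree, loader=lambda path: {"path": path})
print(root.sampler_stats.divergences.load())
```

With xarray, the loader could plausibly be something like `lambda path: xr.open_dataset(filename, group=path)`, since `open_dataset` accepts a path to a group. Whether an intermediate group such as `sampler_stats` loads as an empty Dataset depends on what the file stores at that level.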

ColCarroll · 2018-08-12T17:05:41Z

Also pinging @shoyer in case he has advice for doing this.

shoyer · 2018-08-13T05:10:18Z

This seems more or less reasonable to me, but note that opening a netCDF file isn't always cheap (this is also true to a lesser extent with creating an xarray.Dataset). I expect you will be happier with caching or eagerly creating Dataset objects rather than recreating them in __getattr__.
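The caching suggestion could look roughly like the sketch below: load each group at most once and hand back the stored object on later accesses. This is an illustration under assumptions, not the implementation that was eventually merged. `CachedGroups` and the `loader` callable are hypothetical names, and the loader is a stand-in so the example runs without xarray or a netCDF file.

```python
from typing import Any, Callable, Dict, Iterable


class CachedGroups:
    """Load each named group at most once (hypothetical helper)."""

    def __init__(self, names: Iterable[str], loader: Callable[[str], Any]):
        self._names = tuple(names)        # group names known up front
        self._loader = loader             # assumed: expensive call, e.g. opening a file
        self._cache: Dict[str, Any] = {}  # group name -> already loaded object

    def __getattr__(self, name: str) -> Any:
        # Refuse private names so a half-initialised object cannot recurse here.
        if name.startswith("_"):
            raise AttributeError(name)
        if name not in self._names:
            raise AttributeError(f"{name!r} is not a known group; choose from {self._names}")
        if name not in self._cache:
            # First access: pay the loading cost once and remember the result.
            self._cache[name] = self._loader(name)
        return self._cache[name]

    def __dir__(self):
        # Expose group names for tab completion.
        return list(super().__dir__()) + list(self._names)


# Usage with a stand-in loader that counts how often it is called.
load_count = {"n": 0}

def slow_loader(name):
    load_count["n"] += 1
    return {"group": name}

groups = CachedGroups(["posterior", "sampler_stats"], slow_loader)
first = groups.posterior
second = groups.posterior
print(first is second, load_count["n"])
```

The eager alternative mentioned above would build the same dictionary for every name inside `__init__`, which trades a slower start for predictable access time afterwards.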

canyon289 · 2018-08-13T13:58:39Z

@ColCarroll Should a representation of the prior be considered as well? I'm thinking of this with regard to this issue:

pymc-devs/pymc#3104

canyon289 · 2018-08-13T14:00:10Z

Never mind, I see that it's already mentioned in the schema document.

ColCarroll · 2018-08-13T14:04:13Z

My thought is that arviz would use the wrapper above to decide which plots to "allow" and which to raise exceptions for. Once we start adding other data (priors, sampler statistics, observed data, posterior_predictive), I think there are exponentially more interesting visualizations to be done.

canyon289 · 2018-08-13T14:13:33Z

One big benefit is that it's also a one-stop shop for all the interesting data in a model, which makes it easier for people new to the field to figure out what to learn.

ColCarroll · 2018-08-28T13:35:56Z

Closed by #173 and #176

ColCarroll mentioned this issue Aug 23, 2018

Add an InferenceData object #173

Merged

ColCarroll closed this as completed Aug 28, 2018
