
Question: Massage InferenceData for multi-variable model comparison? #998

Closed
rpgoldman opened this issue Jan 14, 2020 · 8 comments

Comments

@rpgoldman
Contributor

Short Description

I have a PyMC3 model that is partitioned into 6 sub-models. The observations are also partitioned into 6 subsets. This lets me apply different parameters based on the values of independent variables without complex indexing. I can't use a mixture because there is no distribution over the independent variables.

I know model comparison only works for models with a single observed RV. But I was wondering: is there some way to "unpartition" the InferenceData so that ArviZ can treat the six vectors of observations as one big vector and compute model comparison metrics?

I can compare the submodels individually, but this does not properly take into account the hyperparameters that link them.
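
For concreteness, here is a minimal sketch of the kind of partitioned model I mean (two partitions instead of six; all names and numbers are made up):

import numpy as np
import pymc3 as pm

# hypothetical data, partitioned by the value of an independent variable
y_a = np.random.normal(0.0, 1.0, size=50)
y_b = np.random.normal(2.0, 1.0, size=80)

with pm.Model() as model:
    mu_hyper = pm.Normal("mu_hyper", 0.0, 10.0)  # hyperparameter linking the sub-models
    mu_a = pm.Normal("mu_a", mu_hyper, 1.0)      # per-partition parameters
    mu_b = pm.Normal("mu_b", mu_hyper, 1.0)
    pm.Normal("obs_a", mu_a, 1.0, observed=y_a)  # one observed RV per partition
    pm.Normal("obs_b", mu_b, 1.0, observed=y_b)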

@rpgoldman changed the title from "Massage InferenceData for multi-variable model comparison" to "Question: Massage InferenceData for multi-variable model comparison?" on Jan 14, 2020
@OriolAbril
Member

I think it is possible, but for now it will be more convoluted than it needs to be.

The first issue is getting the log_likelihood data from the PyMC3 model. For this you have to use the code in #794 (I can probably rebase and upload it in a few hours) or write/copy some function to get the log likelihood.

The second issue is that ArviZ does not accept multiple log likelihood instances, so the fastest way to get it done is probably to group all the log likelihoods into a single "log_likelihood" variable, something similar to the following pseudocode (assuming the code from #794):

# pseudocode: concatenate the per-variable pointwise log likelihoods along a shared observation dimension
idata.sample_stats["log_likelihood"] = xr.concat(
    [idata.log_likelihood[var_name] for var_name in var_names], dim="obs_id"
)
az.waic(idata)  # or az.loo(idata)

Note: I would start by trying xr.concat, but it may turn out to be some other function.
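
In case a bare xr.concat does not line the pieces up (each observed variable usually has its own observation dimension), here is a possible sketch; the dimension name and the per-variable log_likelihood group from #794 are assumptions:

import xarray as xr
import arviz as az

# Assumes idata has a log_likelihood group with one array per observed RV, and that
# each array has exactly one observation dimension besides chain and draw.
parts = []
for var_name in var_names:
    da = idata.log_likelihood[var_name]
    obs_dim = [d for d in da.dims if d not in ("chain", "draw")][0]
    # Rename the per-variable observation dimension to a shared name and drop its
    # index so the pieces can be concatenated without coordinate conflicts.
    parts.append(da.rename({obs_dim: "obs_id"}).drop_vars("obs_id", errors="ignore"))

idata.sample_stats["log_likelihood"] = xr.concat(parts, dim="obs_id")
az.waic(idata)  # or az.loo(idata)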

@ahartikainen
Contributor

Do you want to sum them together or compare them individually?

You can extract each cell (use arviz.utils.flat_inference_data_to_dict) and then either create a new InferenceData with from_dict (with log_likelihood as a single vector) or create a new InferenceData for each variable.

Then compare pair(s).
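
A rough sketch of this route (it pulls the arrays straight out of the InferenceData groups rather than going through flat_inference_data_to_dict, and the variable names are placeholders):

import numpy as np
import arviz as az

# Per-variable pointwise log likelihoods as (chain, draw, n_obs_i) arrays,
# assuming a log_likelihood group is available for each observed RV.
log_liks = [idata.log_likelihood[name].values for name in var_names]
combined = np.concatenate(log_liks, axis=-1)  # one big (chain, draw, total_obs) vector

idata_combined = az.from_dict(
    posterior={name: idata.posterior[name].values for name in idata.posterior.data_vars},
    sample_stats={"log_likelihood": combined},
)
az.loo(idata_combined)

Building one InferenceData per observed RV instead would just mean calling from_dict once per variable with that variable's array.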

@rpgoldman
Contributor Author

@ahartikainen I would like to compare the models as a whole (rather than comparing the submodels), so I think I want to do what @OriolAbril suggests: effectively concatenate all the observed random variables into one big random variable. This is reasonable because the model treats the observations as interchangeable.

@rpgoldman
Contributor Author

> I think it is possible, but for now it will be more convoluted than it needs to be.
>
> The first issue is getting the log_likelihood data from the PyMC3 model. For this you have to use the code in #794 (I can probably rebase and upload it in a few hours) or write/copy some function to get the log likelihood.
>
> The second issue is that ArviZ does not accept multiple log likelihood instances, so the fastest way to get it done is probably to group all the log likelihoods into a single "log_likelihood" variable, something similar to the following pseudocode (assuming the code from #794):
>
> idata.sample_stats["log_likelihood"] = xr.concat(
>     [idata.log_likelihood[var_name] for var_name in var_names], dim="obs_id"
> )
> az.waic(idata)  # or az.loo(idata)
>
> Note: I would start by trying xr.concat, but it may turn out to be some other function.

@OriolAbril I tried this, but it didn't work, because the code for PyMC3Converter._extract_log_likelihood() does not populate the log likelihood data if there is more than one observed RV:

    def _extract_log_likelihood(self):
        """Compute log likelihood of each observation.

        Return None if there is not exactly 1 observed random variable.
        """
        if len(self.model.observed_RVs) != 1:
            return None, None
...

I think it might be possible to build multiple InferenceData objects, one for each observed RV, by modifying the trace before saving it. I'll report back if this works.
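
For reference, a sketch of the "write/copy some function" option mentioned above, mirroring what the converter does for a single observed RV (PyMC3 3.x API; shape handling is simplified and assumes equal-length chains):

import numpy as np

def pointwise_log_likelihood(model, trace):
    # Per-observation log likelihood for every observed RV, as (chain, draw, obs) arrays.
    log_lik = {}
    for rv in model.observed_RVs:
        per_chain = []
        for chain in trace.chains:
            # logp_elemwise evaluates the elementwise log likelihood at one posterior point
            per_chain.append(
                np.stack([rv.logp_elemwise(point) for point in trace.points([chain])])
            )
        log_lik[rv.name] = np.stack(per_chain)
    return log_lik

The resulting arrays could then either be concatenated into one big vector as above or used to build one InferenceData per observed RV.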

@OriolAbril
Member

> @OriolAbril I tried this, but it didn't work, because the code for PyMC3Converter._extract_log_likelihood() does not populate the log likelihood data if there is more than one observed RV

This is why I recommended using the code in #794.

@rpgoldman
Contributor Author

@OriolAbril Thanks for the reminder. I will have a careful look at #794 tomorrow.

@VincentBt
Contributor

> so the fastest way to get it done is probably to group all the log likelihoods into a single "log_likelihood" variable,

What do you suggest doing once you send this concatenation of the log likelihoods for all observed variables to the waic function? Don't you need to sum the log likelihoods at some point? See my comment here.

@OriolAbril
Member

After combining all the log_likelihood data into a single array (stored in sample_stats.log_likelihood), waic and loo will calculate the IC assuming all observations are conditionally independent (or independent, I am not completely sure; I have not found time to do the math); waic and loo already work with n-dimensional arrays. If that is your case, the results will be correct; otherwise you would have to implement the correct version of the algorithm or wait a little longer.

Note: ArviZ currently checks sample_stats first and raises a warning if the log likelihood is still stored there, so doing this will only produce a deprecation warning and not the annoying "Found several log likelihood arrays {}, var_name cannot be None" error.
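
To see why the pointwise values are concatenated rather than summed, here is a plain-numpy sketch of WAIC on the elpd scale (not ArviZ's actual implementation): every column of the combined (chain, draw, obs) array is treated as one observation, so concatenating the per-variable arrays simply adds columns.

import numpy as np
from scipy.special import logsumexp

def waic_sketch(log_lik):
    # log_lik: pointwise log likelihood with shape (chain, draw, n_obs)
    ll = log_lik.reshape(-1, log_lik.shape[-1])         # flatten chains: (samples, n_obs)
    lppd = logsumexp(ll, axis=0) - np.log(ll.shape[0])  # log pointwise predictive density
    p_waic = ll.var(axis=0, ddof=1)                     # per-observation effective parameters
    return (lppd - p_waic).sum(), p_waic.sum()          # elpd_waic, p_waic

Summing the log likelihoods over observations beforehand would collapse the pointwise terms that the penalty (and the importance weights in loo) are computed from.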
