
Error comparing PyMC3 models with multiple observed variables #1614

Closed
rpgoldman opened this issue Mar 15, 2021 · 3 comments · Fixed by #1616
@rpgoldman
Contributor

Describe the bug
I have a set of PyMC3 models with multiple observed variables that I would like to compare with ArviZ. When I try to invoke the comparison as follows:

arviz.compare({"simple_model": simple_model_idata, "complex_model": complex_model_idata})

I get this error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-14-ae768c0c0627> in <module>
----> 1 arviz.compare({"simple_model": simple_model_idata, "complex_model": complex_model_idata})

~/tacc-work/jupyter_packages/envs/xplan-experiment-analysis/lib/python3.9/site-packages/arviz/stats/stats.py in compare(dataset_dict, ic, method, b_samples, alpha, seed, scale)
    210     for name, dataset in dataset_dict.items():
    211         names.append(name)
--> 212         ics = ics.append([ic_func(dataset, pointwise=True, scale=scale)])
    213     ics.index = names
    214     ics.sort_values(by=ic, inplace=True, ascending=ascending)

~/tacc-work/jupyter_packages/envs/xplan-experiment-analysis/lib/python3.9/site-packages/arviz/stats/stats.py in loo(data, pointwise, var_name, reff, scale)
    616     """
    617     inference_data = convert_to_inference_data(data)
--> 618     log_likelihood = _get_log_likelihood(inference_data, var_name=var_name)
    619     pointwise = rcParams["stats.ic_pointwise"] if pointwise is None else pointwise
    620 

~/tacc-work/jupyter_packages/envs/xplan-experiment-analysis/lib/python3.9/site-packages/arviz/stats/stats_utils.py in get_log_likelihood(idata, var_name)
    424         var_names = list(idata.log_likelihood.data_vars)
    425         if len(var_names) > 1:
--> 426             raise TypeError(
    427                 f"Found several log likelihood arrays {var_names}, var_name cannot be None"
    428             )

TypeError: Found several log likelihood arrays ['AND obs', 'NAND obs', 'NOR obs', 'OR obs', 'XNOR obs', 'XOR obs'], var_name cannot be None

This is unfortunate for at least two reasons:

  1. I'd like to be able to compare these models.
  2. The error message is unhelpful: there is no var_name argument to compare. IIRC, the real problem is that ArviZ can only compare models with a single output variable; that constraint could be checked at the compare interface instead of surfacing as an error deep inside get_log_likelihood.

Request for Help

There's a usage question hidden in here: the different observed variables in the model ['AND obs', 'NAND obs', 'NOR obs', 'OR obs', 'XNOR obs', 'XOR obs'] are essentially the same variable, but with six different combinations of parameters upstream.

Instead of combining all these variables into one observation variable with a complex structure of selectors to turn parameters on and off, they are separated into subsets.

So there's really a "meta variable" that is the concatenation of these six variables. IIUC, the log-likelihood of the full model should be the sum of the log-likelihood of each of these variables (each variable is conditionally independent of the others given the parameters).

If there's a way to massage the model and/or the InferenceData to reflect this, so that the IC can be evaluated, please let me know!
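To illustrate the conjecture above, here is a minimal stdlib-Python sketch (with made-up numbers, not taken from the actual models) of why the log-likelihoods add under conditional independence:

```python
import math

# Hypothetical per-point likelihoods for one posterior draw; each observed
# variable is conditionally independent of the others given the parameters.
lik_and = [0.9, 0.8]    # likelihood of each 'AND obs' data point
lik_nand = [0.7, 0.95]  # likelihood of each 'NAND obs' data point

# The joint likelihood is the product over all points of all variables...
joint_lik = math.prod(lik_and + lik_nand)

# ...so the joint log-likelihood is the sum of the per-variable
# log-likelihood sums, and the pointwise log-likelihood of the
# "meta variable" is just the concatenation of the per-variable arrays.
ll_and = [math.log(p) for p in lik_and]
ll_nand = [math.log(p) for p in lik_nand]
total_ll = sum(ll_and) + sum(ll_nand)

assert math.isclose(total_ll, math.log(joint_lik))
```

In other words, summing the total log-likelihoods and concatenating the pointwise arrays are two views of the same factorization.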

To Reproduce
I don't have a small case that replicates this behavior yet.

Expected behavior
Either a successful comparison or an error message that reflects the API. IIRC, this requires that there be only a single output variable; maybe we could check the input data for this and raise an error if there's more than one.

Additional context

Arviz == 0.11.2
PyMC3 == 3.11.1
Theano-PyMC == 1.1.2

@OriolAbril
Copy link
Member

Luckily, this is actually only a matter of me messing up and not exposing var_name in compare 😄

ArviZ does already support model comparison when there are multiple variables stored in the log likelihood group (hence the error); it simply can't be done automatically and needs var_name as extra user input. Take a look at https://nbviewer.jupyter.org/github/OriolAbril/Exploratory-Analysis-of-Bayesian-Models/blob/multi_obs_ic/content/Section_04/Multiple_likelihoods.ipynb for example usage (it uses loo instead of compare, since loo already has the var_name argument exposed).
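The dispatch behaviour described above can be sketched as follows. This is a simplified, hypothetical stand-in operating on a plain dict, not ArviZ's actual implementation (which lives in arviz/stats/stats_utils.py and works on the InferenceData log_likelihood group):

```python
# Hypothetical sketch of the var_name dispatch: with several log-likelihood
# arrays, the selection cannot be made automatically, so the user must
# supply var_name; with exactly one array, it is picked by default.
def get_log_likelihood(log_likelihood_vars, var_name=None):
    """Pick one log-likelihood array from a {name: array} mapping."""
    names = list(log_likelihood_vars)
    if var_name is None:
        if len(names) > 1:
            raise TypeError(
                f"Found several log likelihood arrays {names}, "
                "var_name cannot be None"
            )
        return log_likelihood_vars[names[0]]
    return log_likelihood_vars[var_name]

# With several observed variables, the caller must disambiguate:
lls = {"AND obs": [-0.1, -0.2], "XOR obs": [-0.3]}
selected = get_log_likelihood(lls, var_name="AND obs")  # [-0.1, -0.2]
```

Exposing var_name in compare then amounts to forwarding it down to this lookup instead of always passing None.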

@rpgoldman
Contributor Author

@OriolAbril Thanks for that explanation. I will see about making an MR for this.

BTW, do you know if I'm correct in my conjecture that for this special case (where the variables in question are essentially sub-ranges of a single vector), I could simply add together the individual log-likelihood terms?

@OriolAbril
Member

I don't have enough information about the model to say whether they should be added or concatenated; adding does sound like it makes more sense, but I am not sure. The notebook I linked above, however, covers this exact case with the rugby data, and it also has some cool diagrams. After reading it you should have no doubts about which option makes sense for your model and question; if it isn't clear enough, let me know and we'll see how to improve the notebook.
