
Error comparing PyMC3 models with multiple observed variables #1614

Closed
rpgoldman opened this issue Mar 15, 2021 · 3 comments · Fixed by #1616
@rpgoldman
Contributor

Describe the bug
I have a set of PyMC3 models with multiple observed variables that I would like to compare with ArviZ. When I try to invoke the comparison as follows:

arviz.compare({"simple_model": simple_model_idata, "complex_model": complex_model_idata})

I get this error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-14-ae768c0c0627> in <module>
----> 1 arviz.compare({"simple_model": simple_model_idata, "complex_model": complex_model_idata})

~/tacc-work/jupyter_packages/envs/xplan-experiment-analysis/lib/python3.9/site-packages/arviz/stats/stats.py in compare(dataset_dict, ic, method, b_samples, alpha, seed, scale)
    210     for name, dataset in dataset_dict.items():
    211         names.append(name)
--> 212         ics = ics.append([ic_func(dataset, pointwise=True, scale=scale)])
    213     ics.index = names
    214     ics.sort_values(by=ic, inplace=True, ascending=ascending)

~/tacc-work/jupyter_packages/envs/xplan-experiment-analysis/lib/python3.9/site-packages/arviz/stats/stats.py in loo(data, pointwise, var_name, reff, scale)
    616     """
    617     inference_data = convert_to_inference_data(data)
--> 618     log_likelihood = _get_log_likelihood(inference_data, var_name=var_name)
    619     pointwise = rcParams["stats.ic_pointwise"] if pointwise is None else pointwise
    620 

~/tacc-work/jupyter_packages/envs/xplan-experiment-analysis/lib/python3.9/site-packages/arviz/stats/stats_utils.py in get_log_likelihood(idata, var_name)
    424         var_names = list(idata.log_likelihood.data_vars)
    425         if len(var_names) > 1:
--> 426             raise TypeError(
    427                 f"Found several log likelihood arrays {var_names}, var_name cannot be None"
    428             )

TypeError: Found several log likelihood arrays ['AND obs', 'NAND obs', 'NOR obs', 'OR obs', 'XNOR obs', 'XOR obs'], var_name cannot be None

This is unfortunate for at least two reasons:

  1. I'd like to be able to compare these models.
  2. The error message is unhelpful: there is no var_name argument to compare. IIRC, the real problem is that ArviZ can only compare models with a single output variable; that constraint could be checked at the compare interface instead of surfacing as an error deep inside get_log_likelihood.

Request for Help

There's a usage question hidden in here: the different observed variables in the model ['AND obs', 'NAND obs', 'NOR obs', 'OR obs', 'XNOR obs', 'XOR obs'] are essentially the same variable, but with six different combinations of parameters upstream.

Instead of combining all these variables into one observation variable with a complex structure of selectors to turn parameters on and off, they are separated into subsets.

So there's really a "meta variable" that is the concatenation of these six variables. IIUC, the log-likelihood of the full model should be the sum of the log-likelihood of each of these variables (each variable is conditionally independent of the others given the parameters).

If there's a way to massage the model and/or the InferenceData to reflect this, so that the IC can be evaluated, please let me know!
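To illustrate the conjecture above, here is a minimal stdlib-Python sketch (with made-up numbers, not taken from the actual models) of why the log-likelihoods add under conditional independence:

```python
import math

# Hypothetical per-point likelihoods for one posterior draw; each observed
# variable is conditionally independent of the others given the parameters.
lik_and = [0.9, 0.8]    # likelihood of each 'AND obs' data point
lik_nand = [0.7, 0.95]  # likelihood of each 'NAND obs' data point

# The joint likelihood is the product over all points of all variables...
joint_lik = math.prod(lik_and + lik_nand)

# ...so the joint log-likelihood is the sum of the per-variable
# log-likelihood sums, and the pointwise log-likelihood of the
# "meta variable" is just the concatenation of the per-variable arrays.
ll_and = [math.log(p) for p in lik_and]
ll_nand = [math.log(p) for p in lik_nand]
total_ll = sum(ll_and) + sum(ll_nand)

assert math.isclose(total_ll, math.log(joint_lik))
```

In other words, summing the total log-likelihoods and concatenating the pointwise arrays are two views of the same factorization.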

To Reproduce
I don't have a small case that replicates this behavior yet.

Expected behavior
Either a successful comparison or an error message that reflects the API. IIRC, this requires that there be only a single output variable; maybe we could check the input data for this and raise an error if there's more than one.

Additional context

Arviz == 0.11.2
PyMC3 == 3.11.1
Theano-PyMC == 1.1.2

@OriolAbril
Copy link
Member

Luckily, this is actually only a matter of me messing up and not exposing var_name in compare 😄

ArviZ does already support model comparison when there are multiple variables stored in the log likelihood group (hence the error); it simply can't be done automatically and needs var_name as extra user input. Take a look at https://nbviewer.jupyter.org/github/OriolAbril/Exploratory-Analysis-of-Bayesian-Models/blob/multi_obs_ic/content/Section_04/Multiple_likelihoods.ipynb for example usage (it uses loo instead of compare, since loo already has the var_name argument exposed).
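The dispatch behaviour described above can be sketched as follows. This is a simplified, hypothetical stand-in operating on a plain dict, not ArviZ's actual implementation (which lives in arviz/stats/stats_utils.py and works on the InferenceData log_likelihood group):

```python
# Hypothetical sketch of the var_name dispatch: with several log-likelihood
# arrays, the selection cannot be made automatically, so the user must
# supply var_name; with exactly one array, it is picked by default.
def get_log_likelihood(log_likelihood_vars, var_name=None):
    """Pick one log-likelihood array from a {name: array} mapping."""
    names = list(log_likelihood_vars)
    if var_name is None:
        if len(names) > 1:
            raise TypeError(
                f"Found several log likelihood arrays {names}, "
                "var_name cannot be None"
            )
        return log_likelihood_vars[names[0]]
    return log_likelihood_vars[var_name]

# With several observed variables, the caller must disambiguate:
lls = {"AND obs": [-0.1, -0.2], "XOR obs": [-0.3]}
selected = get_log_likelihood(lls, var_name="AND obs")  # [-0.1, -0.2]
```

Exposing var_name in compare then amounts to forwarding it down to this lookup instead of always passing None.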

@rpgoldman
Contributor Author

@OriolAbril Thanks for that explanation. I will see about making an MR for this.

BTW, do you know if I'm correct in my conjecture that for this special case (where the variables in question are essentially sub-ranges of a single vector), I could simply add together the individual log-likelihood terms?

@OriolAbril
Member

I don't have enough information about the model to say whether they should be added or concatenated; adding does sound like it makes more sense, but I am not sure. The notebook I linked above, however, covers this exact case with the rugby data, and it also has some cool diagrams. After reading it you should have no doubts about which option makes sense for your model and question; if it isn't clear enough, let me know and we'll see how to improve the notebook.
