
WAIC/LOO for models with multiple observed variables #987

Closed
VincentBt opened this issue Jan 6, 2020 · 19 comments

@VincentBt
Contributor

(Related issue: #794 - cc @rpgoldman)

First, thank you for this great library!

I was told by @ahartikainen here that multiple observations were not supported in arviz (at least for numpyro and pyro, but I guess for other libraries like PyMC3 as well).

My question is: to go from one observed variable to multiple observed variables, isn't it enough to sum all the log-likelihoods of the observed variables to compute the waic and loo?

Indeed, the WAIC and LOO are computed based on $p(y_i \mid \theta_s)$, where $y_1, \dots, y_n$ are all the examples (observations) and $\theta_1, \dots, \theta_S$ are all the samples from the posterior of all parameters and hidden variables.

  • In the case where $y^1_i, \dots, y^m_i$ are conditionally independent given all the hidden variables and parameters, we can use $p(y_i \mid \theta_s) = \prod_{k=1}^m p(y^k_i \mid \theta_s)$ and use this quantity as the likelihood of our observation. (Correct me if I'm mistaken)
  • When there are some dependencies between the observed variables, it seems to me that this still holds, maybe except in the case where there are circular dependencies between variables.

Below I write examples of numpyro models:

for one observation:

import numpyro
import numpyro.distributions as dist

def eight_schools(J, sigma, y=None):
    mu = numpyro.sample('mu', dist.Normal(0, 5))
    tau = numpyro.sample('tau', dist.HalfCauchy(5))
    with numpyro.plate('J', J):
        theta = numpyro.sample('theta', dist.Normal(mu, tau))
        numpyro.sample('obs', dist.Normal(theta, sigma), obs=y)

for several observations which are conditionally independent:

def eight_schools(J, sigma, y1=None, y2=None):
    mu = numpyro.sample('mu', dist.Normal(0, 5))
    tau = numpyro.sample('tau', dist.HalfCauchy(5))
    with numpyro.plate('J', J):
        theta = numpyro.sample('theta', dist.Normal(mu, tau))
        numpyro.sample('obs1', dist.Normal(theta, sigma), obs=y1)
        numpyro.sample('obs2', dist.Normal(theta, sigma), obs=y2)

for several observations which are not conditionally independent:

def eight_schools(J, sigma, y1=None, y2=None):
    mu = numpyro.sample('mu', dist.Normal(0, 5))
    tau = numpyro.sample('tau', dist.HalfCauchy(5))
    with numpyro.plate('J', J):
        theta = numpyro.sample('theta', dist.Normal(mu, tau))
        obs1 = numpyro.sample('obs1', dist.Normal(theta, sigma), obs=y1)
        numpyro.sample('obs2', dist.Normal(obs1, sigma), obs=y2)
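If summing the per-site log-likelihoods is the right focus, the pointwise quantities line up directly: given per-site arrays of shape (chain, draw, J) extracted from the fitted model, the joint pointwise log-likelihood is their elementwise sum, and WAIC follows from the standard formulas. A minimal NumPy sketch (the arrays below are randomly generated stand-ins, not real model output):

```python
import numpy as np

def logmeanexp(a, axis):
    """Numerically stable log of the mean of exp(a) along axis."""
    m = a.max(axis=axis, keepdims=True)
    return (m + np.log(np.mean(np.exp(a - m), axis=axis, keepdims=True))).squeeze(axis)

rng = np.random.default_rng(0)
# Hypothetical pointwise log-likelihoods for obs1 and obs2, shape (chain, draw, J)
log_lik_obs1 = rng.normal(-1.0, 0.1, size=(4, 500, 8))
log_lik_obs2 = rng.normal(-1.0, 0.1, size=(4, 500, 8))

# Joint pointwise log-likelihood: sum over observation sites
log_lik = log_lik_obs1 + log_lik_obs2        # shape (4, 500, 8)

# Flatten chains and draws into one posterior-sample dimension
ll = log_lik.reshape(-1, log_lik.shape[-1])  # shape (S, J)

# WAIC: elpd_waic = lppd - p_waic
lppd = logmeanexp(ll, axis=0).sum()          # log pointwise predictive density
p_waic = ll.var(axis=0, ddof=1).sum()        # effective number of parameters
elpd_waic = lppd - p_waic
```

The same summed `log_lik` array could equally be fed to a PSIS-LOO routine; whether summing is valid for the non-conditionally-independent model is exactly the question above.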
VincentBt referenced this issue Jan 6, 2020
@rpgoldman
Contributor

I'd very much like to help get this fixed, if possible, since I have models with multiple observation variables. I'm working in PyMC3, not pyro.

Am I correct in thinking that for multiple observations, we need to find the log-likelihood of observations that are tuples of individual observed RVs? I don't believe this operation is supported natively by PyMC3, but it should be possible to add it.

Maybe someone with a better understanding of the underlying mathematics could lay out a plan for this?

@avehtari
Contributor

avehtari commented Jan 6, 2020

I'd very much like to help get this fixed,

I did not get from your description whether you want to leave out $y_i^k$ or $y_i^1, \dots, y_i^m$.

If the latter, then note that if each $i$ has its own parameter, WAIC and PSIS are likely to fail, and adding marginalization as in https://arxiv.org/abs/1802.04452 would be needed, but it is more difficult to implement.

@VincentBt
Contributor Author

Am I correct in thinking that for multiple observations, we need to find the log likelihood of observations that are tuples of individual observed RVs?

@rpgoldman Indeed: we need $p(y_i \mid \theta_s)$, where $y_i = (y^1_i, \dots, y^m_i)$.
What I propose to compute this quantity is to sum over $k$ all the $\log p(y^k_i \mid \theta_s)$, which are computed by PyMC3, Pyro, etc. But I'm not sure that it is correct in all cases (see my question above).

@avehtari Thank you for the links. Are you saying that WAIC and PSIS often fail in the case where each datapoint $i$ has several observed variables $y_i^1, \dots, y_i^m$? I'm not sure I understand.

@avehtari
Contributor

avehtari commented Jan 6, 2020

Are you saying that WAIC and PSIS often fail in the case where each datapoint $i$ has several observed variables $y_i^1, \dots, y_i^m$? I'm not sure to understand.

It's a bit difficult because you are using different words than what is common in statistics. For me, $y_i^k$ is one datapoint and one observation, $i$ indexes a group of observations, and each group has $m$ observations. You did not say anything about the model you are going to use, but this kind of data is often modelled with group-specific parameter(s) for each $i$. So it's not the number of observations per $i$ that matters, but whether you have an $i$-specific parameter: in that case, if you remove all observations in group $i$, WAIC and PSIS are likely to fail. An additional case study demonstrating this is https://avehtari.github.io/modelselection/roaches.html (see the random effect model). In this case LOO can still work if the computation is, for example, exact. It's not yet clear to me whether you want, in my terms, leave-one-observation-out or leave-one-group-out. See the material in the links for more information on why you might want one or the other.

@VincentBt
Contributor Author

Here is an example of a simple model, for an input $x_i$ and parameters sigma, beta1, and beta2. There are two observed variables $y1_i$ and $y2_i$ for each input $x_i$. The sigmoid is just there so that the Bernoulli probability is between 0 and 1.
This model is not handled by arviz (because of y1 and y2 instead of simply y1).

hidden_i ~ Normal(x_i, sigma)
y1_i ~ Bernoulli(sigmoid(beta1 * hidden_i))
y2_i ~ Bernoulli(sigmoid(beta2 * hidden_i))

To compute WAIC/PSIS-LOO for this model, we need all the $p(y_i \mid \theta_s)$, whereas PyMC3, Pyro and others only provide the log-likelihoods $p(y1_i \mid \theta_s)$ and $p(y2_i \mid \theta_s)$.

My question is: shouldn't we simply compute the product of the likelihoods, or equivalently sum all the log-likelihoods? Indeed, $p(y_i \mid \theta_s) = p(y1_i \mid \theta_s)\, p(y2_i \mid \theta_s)$.

If the answer is yes, isn't it something that can be extended to more complicated models with y1, ..., yN, as long as y1, ..., yN are independent conditionally on $\theta_s$ (as is the case in the simple model above)?

If the answer is still yes, can it be extended to models where y1, ..., yN are not independent conditionally on the samples $\theta_s$? An example of such a model is:

hidden_i ~ Normal(x_i, sigma)
y1_i ~ Bernoulli(sigmoid(beta1 * hidden_i))
y2_i ~ Bernoulli(sigmoid(beta2 * y1_i))

@avehtari From what I saw in your lectures, I think this means we do leave-one-observation-out, but the observation is a vector $y_i$.

I hope it makes things clearer.

@avehtari
Contributor

avehtari commented Jan 9, 2020

It's clearer now.

To compute WAIC/PSIS-LOO for this model, we need all the $p(y_i \mid \theta_s)$, whereas PyMC3, Pyro and others only provide the log-likelihoods $p(y1_i \mid \theta_s)$ and $p(y2_i \mid \theta_s)$.

WAIC/PSIS-LOO can be computed with either $p(y_i \mid \theta_s)$ or $p(y1_i \mid \theta_s)$ and $p(y2_i \mid \theta_s)$. They just have a different focus.

My question is: shouldn't we simply compute the product of the likelihoods, or equivalently sum all the log-likelihoods? Indeed, $p(y_i \mid \theta_s) = p(y1_i \mid \theta_s)\, p(y2_i \mid \theta_s)$.

Correct, if you want to focus on $p(y_i \mid \theta_s)$.

If the answer is yes, isn't it something that can be extended to more complicated models with y1, ..., yN, as long as y1, ..., yN are independent conditionally on $\theta_s$ (as is the case in the simple model above)?

Yes. There can be computational problems if you have a hidden_i parameter and you remove all observations related to $i$.

If the answer is still yes, can it be extended to models where y1, ..., yN are not independent conditionally on the samples $\theta_s$? An example of such a model is:

Yes. Depending on the focus, you may consider leave-future-out as in time series (see, e.g., https://arxiv.org/abs/1902.06281).

@ahartikainen
Contributor

I think we should have multiple options for the different methods. LFO also needs the possibility to resample (there is code to do this in PyStan, but we really need to refine the InferenceData class to enable this).

We should have the same default as in loo2.

@VincentBt
Contributor Author

VincentBt commented Jan 9, 2020

There can be computational problems if you have a hidden_i parameter and you remove all observations related to $i$.

@avehtari Why? Would the computational problem appear during the computation of the WAIC/PSIS-LOO based on $p(y_i \mid \theta_s)$, or before, in the computation of $p(y1_i \mid \theta_s)$ and $p(y2_i \mid \theta_s)$?

@VincentBt
Contributor Author

Anyway, all this means that in the case of multiple observation sites, we can compute WAIC or PSIS-LOO by first summing the log-likelihoods of all observation sites, as proposed here. In my opinion, this should be the default, rather than having a WAIC/PSIS-LOO for each observation site.

@rpgoldman
Contributor

rpgoldman commented Jan 14, 2020

#998 is a simple sub-problem of this one: trying to figure out whether I can work around this limitation for the case of conditionally independent subsets of observations.

@OriolAbril
Member

OriolAbril commented Apr 15, 2020

I wrote an example of handling multiple observed variables on the PyMC Discourse:

https://discourse.pymc.io/t/calculating-waic-for-models-with-multiple-likelihood-functions/4834/5

It is built on top of a PyMC3 model, but the main focus is on ArviZ usage and how to handle the datasets in the log_likelihood group to get a single array, taking different possibilities into account. Thus, it aims to help align the focus of the loo/waic calculation, not to choose said focus (which is a task that should be done on a per-model basis and will probably not always have the same answer).
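The two possible focuses can be made concrete with plain arrays. Assuming each variable's pointwise log-likelihood is stored as an array of shape (chain, draw, n), as in the variables of an InferenceData's log_likelihood group, a sketch of the two ways to combine them (the array names and random values are illustrative stand-ins):

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-ins for the log_likelihood group's "y1" and "y2", shape (chain, draw, n)
ll_y1 = rng.normal(-1.0, 0.1, size=(4, 250, 30))
ll_y2 = rng.normal(-1.0, 0.1, size=(4, 250, 30))

# Focus A: leave out the pair (y1_i, y2_i) jointly -> sum over sites,
# valid when y1_i and y2_i are conditionally independent given the parameters
ll_joint = ll_y1 + ll_y2                             # shape (4, 250, 30)

# Focus B: treat every single observation as its own leave-one-out unit
# -> concatenate along the observation dimension
ll_single = np.concatenate([ll_y1, ll_y2], axis=-1)  # shape (4, 250, 60)
```

The combined array can then be placed back as a single variable in the log_likelihood group (e.g. via the xarray operations the linked Discourse post walks through), so that az.loo/az.waic see one pointwise array.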

@avehtari
Contributor

avehtari commented Apr 17, 2020

@avehtari Why?

Somehow I had missed this question, but I got an email from the last comment. PSIS-LOO and WAIC fail if the posterior changes a lot. If we remove all observations directly related to some parameter, then the posterior of that parameter is the same as the prior, which is likely to be much different from the posterior.

@ferrine

ferrine commented Mar 26, 2021

az.loo is used in many places and sometimes raises an exception in those functions. For example, az.compare does not work with multiple observed variables because it is not possible to specify additional arguments for ic_func.

@OriolAbril
Member

#1616 will take care of some immediate fixes; the end goal, however, should be to allow computing loo outside compare and passing the result to compare, instead of being forced to pass an InferenceData to it.

@hcp4715

hcp4715 commented Jun 26, 2021

Update after 10 days:
I found that the pointwise log-likelihood generated by PyMC3 has only three dimensions: chain, draw, and observation (e.g., rt_dim_0), but I had put too much information into the dimensions. After removing the other information, e.g., subject index and indices related to the experimental design, it worked! See here for my update: https://groups.google.com/g/hddm-users/c/cO9EBdRAvzs/m/DBg_n4s7BAAJ. I will keep the original issue below in case it is useful to others.

Dear all,

This conversation is very relevant to the problem I am trying to solve, so I joined in and seek some help.

I am trying to convert HDDM model data into ArviZ InferenceData so that we can use ArviZ's functions. HDDM is based on PyMC 2.3.8. I tried to do this because HDDM is widely used in our field and the attempt to use PyMC3 to do the same thing hasn't finished.

HDDM is for behavioral experiments in psychology, where human subjects/participants are asked to respond to stimuli that appear on a computer screen by pressing (one of two) buttons. We call each presentation of a stimulus (and the participant's response) a trial. The button pressed and the reaction time (the interval between stimulus onset and the button press) are recorded as outcomes. Usually, the stimuli presented on the screen vary based on the researchers' design. For example, some stimuli are blurred, some are clear.

For each condition, we usually have multiple trials, but the number of trials may not be the same across conditions, which means that we may have different numbers of observed data points for different conditions. Worse, because participants may miss some trials, and these trials are usually removed from the analysis, different participants may also have different numbers of observed data points under the same condition. For example, the data I am working with looks like this:

Screenshot from 2021-06-26 17-51-15

If we look at the number of trials per condition for each participant, it looks like this:
Screenshot from 2021-06-26 17-53-52

As we can see above, for condition LL of subject 0 there are 73 trials, for condition WL of subject 0 there are 151, but for condition LL of subject 1 there are 74.

When converting the data and pointwise log-likelihood to InferenceData, I set the trial as a dimension, which then introduced a lot of nans into the data. These nans cause nan results from az.loo and az.waic.

Screenshot from 2021-06-26 17-58-03

I've read this very valuable post: https://discourse.pymc.io/t/calculating-waic-for-models-with-multiple-likelihood-functions/4834/5 but have no idea how to translate the idea in that post to our situation. I am new to ArviZ, and also relatively new to Bayesian stats, but I really hope to bridge HDDM and ArviZ so that our community can have a better tool for the DDM. Any suggestions are appreciated.

Thanks in advance.

PS: if I trim the data so that each participant and each condition has the same number of trials, then it works: #1733.

@FrancescPoli

@hcp4715 I have a similar issue to the one you report; have you by any chance made progress on this? It would be super helpful!

@hcp4715

hcp4715 commented Jul 16, 2021

Hi, @FrancescPoli ,

I've updated my original post.

Briefly, when converting a dataframe to xarray, xarray will auto-fill rows with nan so that all dimensions have the same length. In PyMC3, the pointwise log-likelihood has only three dimensions: chain, draw, and a dimension with the same length as the observations (e.g., rt_dim_0), so no nans are generated.

Meaning, the nan problem in my original post was caused by the fact that I added too many dimensions. After removing the other dimensions, e.g., subject index and indices related to the experimental design, and leaving only three (chain, draw, and rt), it worked!

See here for my update: https://groups.google.com/g/hddm-users/c/cO9EBdRAvzs/m/DBg_n4s7BAAJ.
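The fix can be sketched with plain arrays: instead of a padded (chain, draw, subject, condition, trial) array full of nans, keep a single flat observation dimension and carry subject/condition as per-observation labels (coordinates in xarray terms). The names, counts, and random values below are illustrative stand-ins mirroring the trial counts discussed above:

```python
import numpy as np

# Ragged trial counts per (subject, condition), as in the example above
n_trials = {("s0", "LL"): 73, ("s0", "WL"): 151, ("s1", "LL"): 74}
n_obs = sum(n_trials.values())  # one flat observation axis, no padding needed

rng = np.random.default_rng(0)
# Pointwise log-likelihood with only (chain, draw, obs) dimensions: no nans
log_lik = rng.normal(-1.0, 0.1, size=(2, 100, n_obs))

# Subject and condition become per-observation labels, not extra dimensions
subject = np.repeat([s for (s, _) in n_trials], list(n_trials.values()))
condition = np.repeat([c for (_, c) in n_trials], list(n_trials.values()))
```

In ArviZ these labels can be attached through the coords/dims arguments when building the InferenceData, so az.loo sees a dense (chain, draw, obs) array and never encounters nan padding.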

@OriolAbril
Member

Closing this, as the original issue has been fixed. For usage and interpretation questions on loo/waic, please ask on the Stan or PyMC Discourse forums.
