
WAIC/LOO for models with multiple observed variables #987

Closed
VincentBt opened this issue Jan 6, 2020 · 19 comments

@VincentBt
Contributor

(Related issue: #794 - cc @rpgoldman)

First, thank you for this great library!

I was told by @ahartikainen here that multiple observations were not supported in arviz (at least for numpyro and pyro, but I guess for other libraries like PyMC3 as well).

My question is: to go from one observed variable to multiple observed variables, isn't it enough to sum all the log-likelihoods of the observed variables to compute the waic and loo?

Indeed, the WAIC and LOO are computed based on $p(y_i \mid \theta_s)$, where $y_1, \dots, y_n$ are all the examples (observations) and $\theta_1, \dots, \theta_S$ are all the samples from the posterior of all parameters and hidden variables.

  • In the case where $y^1_i, \dots, y^m_i$ are conditionally independent given all the hidden variables and parameters, we can use $p(y_i \mid \theta_s) = \prod_{k=1}^m p(y^k_i \mid \theta_s)$ and use this quantity as the likelihood of our observation. (Correct me if I'm mistaken)
  • When there are some dependencies between the observed variables, it seems to me that this still holds, maybe except in the case where there are circular dependencies between variables.

Below I write examples of numpyro models:

for one observation:

import numpyro
import numpyro.distributions as dist

def eight_schools(J, sigma, y=None):
    mu = numpyro.sample('mu', dist.Normal(0, 5))
    tau = numpyro.sample('tau', dist.HalfCauchy(5))
    with numpyro.plate('J', J):
        theta = numpyro.sample('theta', dist.Normal(mu, tau))
        numpyro.sample('obs', dist.Normal(theta, sigma), obs=y)

for several observations which are conditionally independent:

def eight_schools(J, sigma, y1=None, y2=None):
    mu = numpyro.sample('mu', dist.Normal(0, 5))
    tau = numpyro.sample('tau', dist.HalfCauchy(5))
    with numpyro.plate('J', J):
        theta = numpyro.sample('theta', dist.Normal(mu, tau))
        numpyro.sample('obs1', dist.Normal(theta, sigma), obs=y1)
        numpyro.sample('obs2', dist.Normal(theta, sigma), obs=y2)

for several observations which are not conditionally independent:

def eight_schools(J, sigma, y1=None, y2=None):
    mu = numpyro.sample('mu', dist.Normal(0, 5))
    tau = numpyro.sample('tau', dist.HalfCauchy(5))
    with numpyro.plate('J', J):
        theta = numpyro.sample('theta', dist.Normal(mu, tau))
        obs1 = numpyro.sample('obs1', dist.Normal(theta, sigma), obs=y1)
        numpyro.sample('obs2', dist.Normal(obs1, sigma), obs=y2)
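If summing the per-site log-likelihoods is the right focus, the pointwise quantities line up directly: given per-site arrays of shape (chain, draw, J) extracted from the fitted model, the joint pointwise log-likelihood is their elementwise sum, and WAIC follows from the standard formulas. A minimal NumPy sketch (the arrays below are randomly generated stand-ins, not real model output):

```python
import numpy as np

def logmeanexp(a, axis):
    """Numerically stable log of the mean of exp(a) along axis."""
    m = a.max(axis=axis, keepdims=True)
    return (m + np.log(np.mean(np.exp(a - m), axis=axis, keepdims=True))).squeeze(axis)

rng = np.random.default_rng(0)
# Hypothetical pointwise log-likelihoods for obs1 and obs2, shape (chain, draw, J)
log_lik_obs1 = rng.normal(-1.0, 0.1, size=(4, 500, 8))
log_lik_obs2 = rng.normal(-1.0, 0.1, size=(4, 500, 8))

# Joint pointwise log-likelihood: sum over observation sites
log_lik = log_lik_obs1 + log_lik_obs2        # shape (4, 500, 8)

# Flatten chains and draws into one posterior-sample dimension
ll = log_lik.reshape(-1, log_lik.shape[-1])  # shape (S, J)

# WAIC: elpd_waic = lppd - p_waic
lppd = logmeanexp(ll, axis=0).sum()          # log pointwise predictive density
p_waic = ll.var(axis=0, ddof=1).sum()        # effective number of parameters
elpd_waic = lppd - p_waic
```

The same summed `log_lik` array could equally be fed to a PSIS-LOO routine; whether summing is valid for the non-conditionally-independent model is exactly the question above.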
VincentBt referenced this issue Jan 6, 2020
@rpgoldman
Contributor

I'd very much like to help get this fixed, if possible, since I have models with multiple observation variables. I'm working in PyMC3, not pyro.

Am I correct in thinking that for multiple observations, we need to find the log-likelihood of observations that are tuples of individual observed RVs? I don't believe this operation is supported natively by PyMC3, but it should be possible to add it.

Maybe someone with a better understanding of the underlying mathematics could lay out a plan for this?

@avehtari
Contributor

avehtari commented Jan 6, 2020

I'd very much like to help get this fixed,

I did not get from your description whether you want to leave out $y_i^k$ or $y_i^1, \dots, y_i^m$.

If the latter, then note that if each $i$ has its own parameter, WAIC and PSIS are likely to fail, and adding marginalization as in https://arxiv.org/abs/1802.04452 would be needed, but it is more difficult to implement.

@VincentBt
Contributor Author

Am I correct in thinking that for multiple observations, we need to find the log likelihood of observations that are tuples of individual observed RVs?

@rpgoldman Indeed: we need $p(y_i \mid \theta_s)$, where $y_i = (y^1_i, \dots, y^m_i)$.
What I propose to compute this quantity is to sum over $k$ all the $\log p(y^k_i \mid \theta_s)$, which are computed by PyMC3, Pyro, etc. But I'm not sure that it is correct in all cases (see my question above).

@avehtari Thank you for the links. Are you saying that WAIC and PSIS often fail in the case where each datapoint $i$ has several observed variables $y_i^1, \dots, y_i^m$? I'm not sure I understand.

@avehtari
Contributor

avehtari commented Jan 6, 2020

Are you saying that WAIC and PSIS often fail in the case where each datapoint $i$ has several observed variables $y_i^1, \dots, y_i^m$? I'm not sure to understand.

It's a bit difficult because you are using different words than what is common in statistics. For me, $y_i^k$ is one datapoint and one observation, $i$ indexes a group of observations, and each group has $m$ observations. You did not say anything about the model you are going to use, but this kind of data is often modelled with group-specific parameter(s) for each $i$. So it's not the number of observations per $i$ that matters, but whether you have an $i$-specific parameter: in that case, if you remove all observations in group $i$, WAIC and PSIS are likely to fail. An additional case study demonstrating this is https://avehtari.github.io/modelselection/roaches.html (see the random effect model). In this case LOO can still work if the computation is, for example, exact. It's not yet clear to me whether you want, in my terms, leave-one-observation-out or leave-one-group-out. See the material in the links for more information on why you might want one or the other.

@VincentBt
Contributor Author

Here is an example of a simple model, for an input $x_i$ and parameters sigma, beta1, and beta2. There are two observed variables $y1_i$ and $y2_i$ for each input $x_i$. The sigmoid is just there so that the Bernoulli probability is between 0 and 1.
This model is not handled by arviz (because of y1 and y2 instead of simply y1).

hidden_i ~ Normal(x_i, sigma)
y1_i ~ Bernoulli(sigmoid(beta1 * hidden_i))
y2_i ~ Bernoulli(sigmoid(beta2 * hidden_i))

To compute WAIC/PSIS-LOO for this model, we need all the $p(y_i \mid \theta_s)$, whereas PyMC3, Pyro and others only provide the log-likelihoods $p(y1_i \mid \theta_s)$ and $p(y2_i \mid \theta_s)$.

My question is: shouldn't we simply compute the product of the likelihoods, or equivalently sum all the log-likelihoods? Indeed, $p(y_i \mid \theta_s) = p(y1_i \mid \theta_s)\, p(y2_i \mid \theta_s)$.

If the answer is yes, isn't it something that can be extended to more complicated models with y1, ..., yN, as long as y1, ..., yN are independent conditionally on $\theta_s$ (as is the case in the simple model above)?

If the answer is still yes, can it be extended to models where y1, ..., yN are not independent conditionally on the samples $\theta_s$? An example of such a model is:

hidden_i ~ Normal(x_i, sigma)
y1_i ~ Bernoulli(sigmoid(beta1 * hidden_i))
y2_i ~ Bernoulli(sigmoid(beta2 * y1_i))

@avehtari From what I saw in your lectures, I think this means we do leave-one-observation-out, but the observation is a vector $y_i$.

I hope it makes things clearer.

@avehtari
Contributor

avehtari commented Jan 9, 2020

It's clearer now.

To compute WAIC/PSIS-LOO for this model, we need all the $p(y_i \mid \theta_s)$, whereas PyMC3, Pyro and others only provide the log-likelihoods $p(y1_i \mid \theta_s)$ and $p(y2_i \mid \theta_s)$.

WAIC/PSIS-LOO can be computed with either $p(y_i \mid \theta_s)$ or $p(y1_i \mid \theta_s)$ and $p(y2_i \mid \theta_s)$. They just have a different focus.

My question is: shouldn't we simply compute the product of the likelihoods, or equivalently sum all the log-likelihoods? Indeed, $p(y_i \mid \theta_s) = p(y1_i \mid \theta_s)\, p(y2_i \mid \theta_s)$.

Correct, if you want to focus on $p(y_i \mid \theta_s)$.

If the answer is yes, isn't it something that can be extended to more complicated models with y1, ..., yN, as long as y1, ..., yN are independent conditionally on $\theta_s$ (as is the case in the simple model above)?

Yes. There can be computational problems if you have a hidden_i parameter and you remove all observations related to $i$.

If the answer is still yes, can it be extended to models where y1, ..., yN are not independent conditionally on the samples $\theta_s$? An example of such a model is:

Yes. Depending on the focus, you may consider leave-future-out as in time series (see, e.g., https://arxiv.org/abs/1902.06281).

@ahartikainen
Contributor

I think we should have multiple options for the different methods. LFO also needs the possibility to resample (there is code to do this in PyStan, but we really need to refine the InferenceData class to enable this).

We should have the same default as in loo2.

@VincentBt
Contributor Author

VincentBt commented Jan 9, 2020

There can be computational problems if you have a hidden_i parameter and you remove all observations related to $i$.

@avehtari Why? Would the computational problem appear during the computation of the WAIC/PSIS-LOO based on $p(y_i \mid \theta_s)$, or before, in the computation of $p(y1_i \mid \theta_s)$ and $p(y2_i \mid \theta_s)$?

@VincentBt
Contributor Author

Anyway, all this means that in the case of multiple observation sites, we can compute WAIC or PSIS-LOO by first summing the log-likelihoods of all observation sites, as proposed here. In my opinion, this should be the default, rather than having a WAIC/PSIS-LOO for each observation site.

@rpgoldman
Contributor

rpgoldman commented Jan 14, 2020

#998 is a simple sub-problem of this one: trying to figure out whether I can work around this limitation for the case of conditionally independent subsets of observations.

@OriolAbril
Member

OriolAbril commented Apr 15, 2020

I wrote an example of handling multiple observed variables on the PyMC Discourse:

https://discourse.pymc.io/t/calculating-waic-for-models-with-multiple-likelihood-functions/4834/5

It is built on top of a PyMC3 model, but the main focus is on ArviZ usage and how to handle the datasets in the log_likelihood group to get a single array, taking different possibilities into account. Thus, it aims to help align the focus of the loo/waic calculation, not to choose said focus (which is a task that should be done on a per-model basis and will probably not always have the same answer).
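The two possible focuses can be made concrete with plain arrays. Assuming each variable's pointwise log-likelihood is stored as an array of shape (chain, draw, n), as in the variables of an InferenceData's log_likelihood group, a sketch of the two ways to combine them (the array names and random values are illustrative stand-ins):

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-ins for the log_likelihood group's "y1" and "y2", shape (chain, draw, n)
ll_y1 = rng.normal(-1.0, 0.1, size=(4, 250, 30))
ll_y2 = rng.normal(-1.0, 0.1, size=(4, 250, 30))

# Focus A: leave out the pair (y1_i, y2_i) jointly -> sum over sites,
# valid when y1_i and y2_i are conditionally independent given the parameters
ll_joint = ll_y1 + ll_y2                             # shape (4, 250, 30)

# Focus B: treat every single observation as its own leave-one-out unit
# -> concatenate along the observation dimension
ll_single = np.concatenate([ll_y1, ll_y2], axis=-1)  # shape (4, 250, 60)
```

The combined array can then be placed back as a single variable in the log_likelihood group (e.g. via the xarray operations the linked Discourse post walks through), so that az.loo/az.waic see one pointwise array.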

@avehtari
Contributor

avehtari commented Apr 17, 2020

@avehtari Why?

Somehow I had missed this question, but I got an email from the last comment. PSIS-LOO and WAIC fail if the posterior changes a lot. If we remove all observations directly related to some parameter, then the posterior of that parameter is the same as the prior, which is likely to be much different from the posterior.

@ferrine

ferrine commented Mar 26, 2021

az.loo is used in many places and sometimes raises an exception in those functions. For example, az.compare does not work with multiple observed variables because it is not possible to specify additional arguments for ic_func.

@OriolAbril
Member

#1616 will take care of some immediate fixes; the end goal, however, should be to allow computing loo outside compare and passing the result to compare, instead of being forced to pass an InferenceData to it.

@hcp4715

hcp4715 commented Jun 26, 2021

Update after 10 days:
I found that the pointwise log-likelihood generated by PyMC3 has only three dimensions: chain, draw, and observation (e.g., rt_dim_0), but I had put too much information into the dimensions. After removing the other information, e.g., subject index and indices related to the experimental design, it worked! See here for my update: https://groups.google.com/g/hddm-users/c/cO9EBdRAvzs/m/DBg_n4s7BAAJ. I will keep the original issue below in case it is useful to others.

Dear all,

This conversation is very relevant to the problem I am trying to solve, so I joined in and seek some help.

I am trying to convert HDDM model data into ArviZ InferenceData so that we can use ArviZ's functions. HDDM is based on PyMC 2.3.8. I tried to do this because HDDM is widely used in our field and the attempt to use PyMC3 to do the same thing hasn't finished.

HDDM is for behavioral experiments in psychology, where human subjects/participants are asked to respond to stimuli that appear on a computer screen by pressing (one of two) buttons. We call each presentation of a stimulus (and the participant's response) a trial. The button pressed and the reaction time (the interval between stimulus onset and the button press) are recorded as outcomes. Usually, the stimuli presented on the screen vary based on the researchers' design. For example, some stimuli are blurred, some are clear.

For each condition, we usually have multiple trials, but the number of trials may not be the same across conditions, which means that we may have different numbers of observed data points for different conditions. Worse, because participants may miss some trials, and these trials are usually removed from the analysis, different participants may also have different numbers of observed data points under the same condition. For example, the data I am working with looks like this:

Screenshot from 2021-06-26 17-51-15

If we look at the number of trials per condition for each participant, it looks like this:
Screenshot from 2021-06-26 17-53-52

As we can see above, for condition LL of subject 0 there are 73 trials, for condition WL of subject 0 there are 151, but for condition LL of subject 1 there are 74.

When converting the data and pointwise log-likelihood to InferenceData, I set the trial as a dimension, which then introduced a lot of nans into the data. These nans cause nan results from az.loo and az.waic.

Screenshot from 2021-06-26 17-58-03

I've read this very valuable post: https://discourse.pymc.io/t/calculating-waic-for-models-with-multiple-likelihood-functions/4834/5 but have no idea how to translate the idea in that post to our situation. I am new to ArviZ, and also relatively new to Bayesian stats, but I really hope to bridge HDDM and ArviZ so that our community can have a better tool for the DDM. Any suggestions are appreciated.

Thanks in advance.

PS: if I trim the data so that each participant and each condition has the same number of trials, then it works: #1733.

@FrancescPoli

@hcp4715 I have a similar issue to the one you report; have you by any chance made progress on this? It would be super helpful!

@hcp4715

hcp4715 commented Jul 16, 2021

Hi, @FrancescPoli ,

I've updated my original post.

Briefly, when converting a dataframe to xarray, xarray will auto-fill rows with nan so that all dimensions have the same length. In PyMC3, the pointwise log-likelihood has only three dimensions: chain, draw, and a dimension with the same length as the observations (e.g., rt_dim_0), so no nans are generated.

Meaning, the nan problem in my original post was caused by the fact that I added too many dimensions. After removing the other dimensions, e.g., subject index and indices related to the experimental design, and leaving only three (chain, draw, and rt), it worked!

See here for my update: https://groups.google.com/g/hddm-users/c/cO9EBdRAvzs/m/DBg_n4s7BAAJ.
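The fix can be sketched with plain arrays: instead of a padded (chain, draw, subject, condition, trial) array full of nans, keep a single flat observation dimension and carry subject/condition as per-observation labels (coordinates in xarray terms). The names, counts, and random values below are illustrative stand-ins mirroring the trial counts discussed above:

```python
import numpy as np

# Ragged trial counts per (subject, condition), as in the example above
n_trials = {("s0", "LL"): 73, ("s0", "WL"): 151, ("s1", "LL"): 74}
n_obs = sum(n_trials.values())  # one flat observation axis, no padding needed

rng = np.random.default_rng(0)
# Pointwise log-likelihood with only (chain, draw, obs) dimensions: no nans
log_lik = rng.normal(-1.0, 0.1, size=(2, 100, n_obs))

# Subject and condition become per-observation labels, not extra dimensions
subject = np.repeat([s for (s, _) in n_trials], list(n_trials.values()))
condition = np.repeat([c for (_, c) in n_trials], list(n_trials.values()))
```

In ArviZ these labels can be attached through the coords/dims arguments when building the InferenceData, so az.loo sees a dense (chain, draw, obs) array and never encounters nan padding.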

@OriolAbril
Member

Closing this, as the original issue has been fixed. For usage and interpretation questions on loo/waic, please ask on the Stan or PyMC Discourse forums.
