
draw_values draws from marginal distributions #3210

Closed
lucianopaz opened this issue Sep 25, 2018 · 6 comments
@lucianopaz
Contributor

Description of your problem

draw_values draws from the marginal distributions of the requested variables instead of the joint distribution.

Please provide a minimal, self-contained, and reproducible example.

import numpy as np
import pymc3 as pm
import seaborn
from matplotlib import pyplot as plt


with pm.Model():
    a = pm.Normal('a', mu=0, sd=100)
    b = pm.Normal('b', mu=a, sd=1e-6)
    c = pm.Normal('c', mu=a, sd=1e-6)

params = [a, b, c]
N = 10000
np.random.seed(1)
# Draw N samples of (a, b, c) with draw_values
A, B, C = list(zip(*[pm.distributions.distribution.draw_values(params)
                     for i in range(N)]))
A = np.array(A)
B = np.array(B)
C = np.array(C)

# Reference: ancestral sampling from the joint distribution
np.random.seed(1)
eA = np.random.randn(N) * 100.
eB = eA + np.random.randn(N) * 1e-6
eC = eA + np.random.randn(N) * 1e-6

seaborn.jointplot(B, C)
plt.suptitle('draw_values')
plt.savefig('draw_values_output.png')
seaborn.jointplot(eB, eC)
plt.suptitle('Expected')
plt.savefig('expected_output.png')

This model is a rather simple hierarchical model. The expected joint distribution of variables b and c is:

[figure: expected_output.png]

However, the values drawn with draw_values are conditionally independent of one another:

[figure: draw_values_output.png]

The marginals match but the joint distribution does not. Given that #2983 relies heavily on draw_values to make sample_prior_predictive work, this problem is really important.

I've given this issue some thought before submitting it here, and I think I found the origin of the problem and also a possible solution, which I'll submit in a PR later. What seems to be happening is:

  1. draw_values sees that it must sample from a, b and c.
  2. None of these nodes are Apply nodes with owners, so no dependency tree is searched and nothing is added to givens or point.
  3. draw_values arrives at its final stage, where it makes three separate calls to _draw_value, one for each variable.
  4. Inside _draw_value, it is decided that the param's random method must be called.
  5. random in turn issues a new call to draw_values to get the Normal's mu and sd. The values returned by this nested call to draw_values are never made available to the other calls from point 3. Thus, effectively, each variable ends up being drawn from its marginal distribution.

As it stands now, the only way the nested draw_values calls from point 5 could be made aware of a value drawn for a is to modify the point dictionary with the values drawn for a before drawing the conditionally dependent b and c.
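The failure mode can be reproduced without PyMC3 at all. Here is a minimal NumPy sketch (all names are illustrative, not PyMC3 internals) contrasting per-variable marginal draws, where each variable redraws its own fresh copy of a, with proper ancestral sampling from the joint:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10000

# Marginal-style draws (what draw_values effectively does): each variable
# gets its own independent redraw of a, so b and c become independent.
a1 = rng.normal(0, 100, N)
a2 = rng.normal(0, 100, N)
b_marg = a1 + rng.normal(0, 1e-6, N)
c_marg = a2 + rng.normal(0, 1e-6, N)

# Joint (ancestral) draws: a is drawn once and shared, as in the model.
a = rng.normal(0, 100, N)
b_joint = a + rng.normal(0, 1e-6, N)
c_joint = a + rng.normal(0, 1e-6, N)

corr_marg = np.corrcoef(b_marg, c_marg)[0, 1]    # near 0
corr_joint = np.corrcoef(b_joint, c_joint)[0, 1]  # near 1
```

The near-zero correlation in the first case is exactly what the draw_values joint plot above shows.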

My proposal is as follows. Each distribution should store a list of the theano.Variables it is conditionally dependent on.

For example, Normal should have an attribute called conditional_on, which would be the list [self.mu, self.tau]. If this attribute is adopted as a convention across all distributions (in the end, every distribution already uses it implicitly when calling draw_values from its random method), we could build a dependency graph (a DAG, presumably) between all the variables used by draw_values.

This DAG could be built behind the scenes when random variables are added to a model, to save some unnecessary computation. It should also take deterministic relations into account, given by theano owner input nodes, something that is already partially in place in draw_values. Knowing the full DAG, we can identify the nodes that are independent of everything else, draw their values first, put them into the point dictionary, and then move on to the next nodes in the hierarchy.
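A minimal, purely illustrative sketch of this scheme (the conditional_on mapping and draw_joint helper are hypothetical names, not existing PyMC3 API): draw variables in dependency order, threading the drawn values through a shared point dictionary so children condition on the same parent draw:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical conditional_on lists for the a -> b, a -> c model above.
graph = {'a': [], 'b': ['a'], 'c': ['a']}
samplers = {
    'a': lambda point: rng.normal(0, 100),
    'b': lambda point: rng.normal(point['a'], 1e-6),
    'c': lambda point: rng.normal(point['a'], 1e-6),
}

def draw_joint(graph, samplers):
    """Draw every variable once, in topological order over the DAG."""
    point, done = {}, set()
    while len(done) < len(graph):
        for name, parents in graph.items():
            # A node is ready once all its parents have been drawn.
            if name not in done and all(p in done for p in parents):
                point[name] = samplers[name](point)
                done.add(name)
    return point

point = draw_joint(graph, samplers)
# b and c now share the single draw of a, so they are tightly coupled.
```

This is the forward pass on the DAG in its simplest form; the real implementation would need to handle vectorized sizes and deterministic nodes as well.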

Versions and main components

  • PyMC3 Version: 3.4
  • Theano Version: 1.03
  • Python Version: 3.6
  • Operating system: Ubuntu 16.04
  • How did you install PyMC3: pip install -e
@junpenglao
Member

Yes, I think I also had a similar issue (I cannot find which issue it is now): if you have y ~ Normal(X*b, ...) with X being a latent variable with observed values, prior prediction is not conditioned correctly.

I think the current implementation is to sample the observed variable, which necessarily produces samples of all its parents as a by-product. When there is no observed variable, or multiple observed variables, things do not work well. Fixing it might be difficult in terms of finding a fast implementation (a forward pass on the DAG would surely work; we had an earlier implementation that did this, but it was much slower).

@ColCarroll
Member

Looking at this now - a few observations:

  • The graphviz representation, which was developed at the same time using the same DAG introspection, knows about the relationship
  • I can't quickly reproduce @junpenglao's note about conditioning (either by adding an observed node D, or by adding an observation to B or C). Such behavior wouldn't surprise me, though!
  • My first guess was to use prior = pm.sample_prior_predictive(samples=N), but that plot looks just the same.

[image]

@ColCarroll
Member

Another note:

with pm.Model() as model:
    a = pm.Normal('a', mu=0, sd=100, observed=0)
    b = pm.Normal('b', mu=a, sd=1e-4, observed=1e-6)
    c = pm.Normal('c', mu=a, sd=1e-4, observed=1e-6)
    d = pm.Normal('d')
    trace = pm.sample()
    posterior = pm.sample_posterior_predictive(trace)

[image]

  • The bug does not show up in sample, which uses MCMC, but I had to play with the scales and ignore some warnings:
with pm.Model() as model:
    a = pm.Normal('a', mu=0, sd=10)
    b = pm.Normal('b', mu=a, sd=1e-2)
    c = pm.Normal('c', mu=a, sd=1e-2)
    trace = pm.sample()

[image]


@miclegr

miclegr commented Oct 10, 2018

I'm posting here a related error I get when I try to sample from the posterior distribution of a copula model.

Versions and main components

  • PyMC3 Version: tried with 3.5 and dev branch
  • Theano Version: 1.03
  • Python Version: 3.6
  • Operating system: Win7
  • How did you install PyMC3: pip install

Reproducible example

import numpy as np
import pymc3 as pm
import scipy.special as ss
import matplotlib.pyplot as plt
import seaborn as sns
import scipy as sp
import theano.tensor as tt
from theano import shared

corr=0.6
_sigma=np.array([[1,corr],[corr,1]])
_mus=np.array([-1,2])
_sigmas=np.array([.5,.5])
N=10000

u=np.random.multivariate_normal((0,0),_sigma,size=N)
c=sp.stats.norm.cdf(u)
generated=sp.stats.lognorm.ppf(c,_sigmas,scale=np.exp(_mus))

def lognormal_cdf(values,mu,sigma):
    return pm.math.invprobit((tt.log(values)-mu)/sigma)

def jacobian_det(f_inv_x, x):
    grad = tt.reshape(pm.theanof.gradient(tt.sum(f_inv_x), [x]), x.shape)
    return tt.log(tt.abs_(grad))

with pm.Model() as model:
    
    data=shared(generated)
    
    #hackish way to create cholesky mat. that yields a covariance mat. with 1s on the diagonal and corr anywhere else
    corr = pm.LKJCorr('corr',n=2, eta=2)[0]
    chol = tt.set_subtensor(tt.eye(2,2)[([1,1], [0,1])],[corr,tt.sqrt(1-tt.pow(corr,2))]) 

#     r =  pm.Uniform('r',lower=-1, upper=1)
#     cov = pm.Deterministic('cov', 
#                            tt.stacklists([[1., r],
#                                           [r, 1.]]))
    
    mus=pm.Normal('mus',0,1,shape=2)
    sigmas=pm.HalfCauchy('sigmas',beta=2,shape=2)
    
    ps=lognormal_cdf(data,mus,sigmas)
    cs=pm.math.probit(ps)
    
    us_dist = pm.MvNormal('copula',mu=np.zeros(2), chol=chol,observed=cs)
    pm.Potential('jacob_det', jacobian_det(cs, data))

    trace=pm.sample(1000)


sample=pm.sample_posterior_predictive(trace,model=model,vars=[us_dist,mus,sigmas],samples=10000)


---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-17-14dbe60ba96f> in <module>()
----> 1 sample=pm.sample_posterior_predictive(trace,model=model,vars=[us_dist,mus,sigmas],samples=10000)
      2 
      3 simulated_c=sp.stats.norm.cdf(sample['copula'])
      4 simulated=np.exp(sp.stats.norm.ppf(simulated_c)*sample['sigmas']+sample['mus'])

~\AppData\Local\Continuum\Anaconda3\envs\newroot\lib\site-packages\pymc3\sampling.py in sample_posterior_predictive(trace, samples, model, vars, size, random_seed, progressbar)
   1118     # draw once to inspect the shape
   1119     var_values = list(zip(varnames,
-> 1120                           draw_values(vars, point=model.test_point, size=size)))
   1121     ppc_trace = defaultdict(list)
   1122     for varname, value in var_values:

~\AppData\Local\Continuum\Anaconda3\envs\newroot\lib\site-packages\pymc3\distributions\distribution.py in draw_values(params, point, size)
    310     while to_eval or missing_inputs:
    311         if to_eval == missing_inputs:
--> 312             raise ValueError('Cannot resolve inputs for {}'.format([str(params[j]) for j in to_eval]))
    313         to_eval = set(missing_inputs)
    314         missing_inputs = set()

ValueError: Cannot resolve inputs for ['copula']

related discourse thread

FYI, with version 3.4 I was able to successfully sample from the copula, but the samples of mus and sigmas were wrong (coming from the prior, I guess). With version 3.5 and the dev branch, the samples of mus and sigmas seem to be correct, but this error appeared.

lucianopaz added a commit to lucianopaz/pymc that referenced this issue Nov 27, 2018
Fix for pymc-devs#3210 which uses a completely different approach than PR pymc-devs#3214. It uses a context manager inside `draw_values` that makes all the values drawn from `TensorVariables` or `MultiObservedRV`s available to nested calls of the original call to `draw_values`. It is partly inspired by how Edward2 approaches the problem of forward sampling. Ed2 tensors fix a `_values` attribute after they first call `sample` and then only return that. They can do it because of their functional scheme, where the entire graph is recreated each time the generative function is called. Our object oriented paradigm cannot set a fixed `_values`; it has to know it is in the context of a single `draw_values` call. That is why I opted for context managers to store the drawn values.
junpenglao pushed a commit that referenced this issue Dec 3, 2018
* Fix for #3225. Made Triangular `c` attribute be handled consistently with scipy.stats. Added test and updated example code.

* Fix for #3210 which uses a completely different approach than PR #3214. It uses a context manager inside `draw_values` that makes all the values drawn from `TensorVariables` or `MultiObservedRV`s available to nested calls of the original call to `draw_values`. It is partly inspired by how Edward2 approaches the problem of forward sampling. Ed2 tensors fix a `_values` attribute after they first call `sample` and then only return that. They can do it because of their functional scheme, where the entire graph is recreated each time the generative function is called. Our object oriented paradigm cannot set a fixed _values, it has to know it is in the context of a single `draw_values` call. That is why I opted for context managers to store the drawn values.

* Removed leftover print statement

* Added release notes and draw values context managers to mixture and multivariate distributions that make many calls to draw_values or other distributions random methods within their own random.
@junpenglao
Member

Closed by #3273.
