pm.sample_prior_predictive fails with multinomial data #3271

Closed
AlexAndorra opened this issue Nov 24, 2018 · 6 comments

@AlexAndorra
Contributor

AlexAndorra commented Nov 24, 2018

As discussed with @AustinRochford on Twitter, pm.sample_prior_predictive seems to fail with a multinomial likelihood, raising "TypeError: 'NoneType' object is not subscriptable".
I suspect a shape issue. It may come from my data, but that would be odd, since sample_ppc works. (On a side note, sample_posterior_predictive seems to be missing from the current PyMC3 conda distribution.)

Here is a minimal and reproducible example:

import matplotlib.pyplot as plt
import numpy as np
import pymc3 as pm
import seaborn as sns
import warnings

RANDOM_SEED = 904
np.random.seed(90)

%matplotlib inline
sns.set()
warnings.simplefilter(action='ignore', category=FutureWarning)
print('Running on PyMC3 v{}'.format(pm.__version__))

mn_data = np.random.multinomial(n=100, pvals=[1/6.]*6, size=10)

with pm.Model() as dm_model:
    
    probs = pm.Dirichlet('probs', a=np.ones(6), shape=6)
    obs = pm.Multinomial('obs', n=mn_data.sum(axis=1), p=probs, observed=mn_data)
    
    burned_trace = pm.sample(1000, tune=500, cores=4, random_seed=RANDOM_SEED)

pm.traceplot(burned_trace); # no sampling problem

sim_priors = pm.sample_prior_predictive(samples=1000, model=dm_model, random_seed=RANDOM_SEED) # TypeError: 'NoneType' object is not subscriptable

pm.sample_posterior_predictive(burned_trace, samples=1000, model=dm_model, random_seed=RANDOM_SEED) # AttributeError: module 'pymc3' has no attribute 'sample_posterior_predictive'

Please provide the full traceback.

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-10-06599b7f288c> in <module>()
----> 1 sim_priors = pm.sample_prior_predictive(samples=1000, model=dm_model, random_seed=RANDOM_SEED)

/anaconda/envs/cdf/lib/python3.6/site-packages/pymc3/sampling.py in sample_prior_predictive(samples, model, vars, random_seed)
   1314     names = get_default_varnames(model.named_vars, include_transformed=False)
   1315     # draw_values fails with auto-transformed variables. transform them later!
-> 1316     values = draw_values([model[name] for name in names], size=samples)
   1317 
   1318     data = {k: v for k, v in zip(names, values)}

/anaconda/envs/cdf/lib/python3.6/site-packages/pymc3/distributions/distribution.py in draw_values(params, point, size)
    319             else:
    320                 try:  # might evaluate in a bad order,
--> 321                     evaluated[param_idx] = _draw_value(param, point=point, givens=givens.values(), size=size)
    322                     if isinstance(param, collections.Hashable) and named_nodes_parents.get(param):
    323                         givens[param.name] = (param, evaluated[param_idx])

/anaconda/envs/cdf/lib/python3.6/site-packages/pymc3/distributions/distribution.py in _draw_value(param, point, givens, size)
    403                     val = dist_tmp.random(point=point, size=None)
    404                     dist_tmp.shape = val.shape
--> 405                 return dist_tmp.random(point=point, size=size)
    406             else:
    407                 return param.distribution.random(point=point, size=size)

/anaconda/envs/cdf/lib/python3.6/site-packages/pymc3/distributions/multivariate.py in random(self, point, size)
    571         samples = generate_samples(self._random, n, p,
    572                                    dist_shape=self.shape,
--> 573                                    size=size)
    574         return samples
    575 

/anaconda/envs/cdf/lib/python3.6/site-packages/pymc3/distributions/distribution.py in generate_samples(generator, *args, **kwargs)
    512     elif broadcast_shape[:len(size_tup)] == size_tup:
    513         suffix = broadcast_shape[len(size_tup):] + dist_shape
--> 514         samples = [generator(*args, **kwargs).reshape(size_tup + (1,)) for _ in range(np.prod(suffix, dtype=int))]
    515         samples = np.hstack(samples).reshape(size_tup + suffix)
    516     else:

/anaconda/envs/cdf/lib/python3.6/site-packages/pymc3/distributions/distribution.py in <listcomp>(.0)
    512     elif broadcast_shape[:len(size_tup)] == size_tup:
    513         suffix = broadcast_shape[len(size_tup):] + dist_shape
--> 514         samples = [generator(*args, **kwargs).reshape(size_tup + (1,)) for _ in range(np.prod(suffix, dtype=int))]
    515         samples = np.hstack(samples).reshape(size_tup + suffix)
    516     else:

/anaconda/envs/cdf/lib/python3.6/site-packages/pymc3/distributions/multivariate.py in _random(self, n, p, size)
    536         if size == p.shape:
    537             size = None
--> 538         elif size[-len(p.shape):] == p.shape:
    539             size = size[:len(size) - len(p.shape)]
    540 

TypeError: 'NoneType' object is not subscriptable
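
Reading the last frame of the traceback: size is None when line 538 runs, so the first comparison (size == p.shape) is False and the elif then tries to slice None. The sketch below reproduces that failure in plain Python and shows one possible guard. trim_size is a hypothetical helper written for illustration; it is not PyMC3 code and not necessarily the fix that was eventually merged.

```python
def trim_size(size, p_shape):
    """Hypothetical guard: drop a trailing p_shape from size, tolerating None."""
    if size is None or size == p_shape:
        return None
    if size[-len(p_shape):] == p_shape:
        return size[:len(size) - len(p_shape)]
    return size


# The unguarded slice from the traceback fails when size is None.
size = None
try:
    size[-1:]
except TypeError as err:
    message = str(err)

print(message)
print(trim_size(None, (6,)))       # None, no exception
print(trim_size((1000, 6), (6,)))  # (1000,)
```

A guard like this only avoids the exception; the shape problem discussed in the comments still needs to be handled separately.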

Please provide any additional information below.
My actual data and model are much more complex than this example, but the failure already occurs in this simple case.

Versions and main components

  • PyMC3 Version: v3.5
  • Theano Version: v1.0.3
  • Python Version: 3.6.7
  • Operating system: Mac OS X
  • How did you install PyMC3: conda
@junpenglao
Member

junpenglao commented Nov 24, 2018

Definitely a problem here:

obs.distribution.random().shape
# ==> (10, 1, 6, 6)
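
For comparison, a single draw for these observations would be expected to have one row of six counts per observation, i.e. shape (10, 6) rather than (10, 1, 6, 6). A minimal NumPy-only sketch of that expected shape follows; it does not use PyMC3, and the names rng, n, p, and draw are illustrative.

```python
import numpy as np

rng = np.random.RandomState(904)

n = np.full(10, 100)   # one trial count per observation, like mn_data.sum(axis=1)
p = np.ones(6) / 6     # a single probability vector shared by all observations

# One multinomial draw per entry of n, stacked into a 2-D array.
draw = np.stack([rng.multinomial(n_i, p) for n_i in n])

print(draw.shape)                   # (10, 6)
print((draw.sum(axis=1) == n).all())  # True: each row sums to its own n
```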

@ColCarroll
Member

You usually have to specify shapes for observed variables to sample from the prior (in this case, (10, 6)), but that does not fix it here.

I tried a few "easy" fixes and none worked. Once you specify the shape, draw_values does not forward a size to the generator, but even if it did, it would not help. There are also a few places where you might try to squeeze or reshape some arguments, but I could not find the right one.

On the plus side, I think any fix will only touch the Multinomial code, and maybe the branch in draw_values that sends data there.

@AlexAndorra
Contributor Author

Ooh, I didn't know that you had to specify the shape for sample_prior_predictive. Thank you, @ColCarroll!

@junpenglao
Member

> Ooh, I didn't know that you had to specify the shape for sample_prior_predictive. Thank you, @ColCarroll!

Actually, you don't need to, as the shape is inferred from the observed data: #3036

The problem here is the broadcasting of n and p, which is not easy to get right...
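
To make the broadcasting concrete: n has shape (10,) and p has shape (6,), where the last axis of p is the category axis and must not be broadcast against n. The sketch below is one way to do this in plain NumPy, under the assumption that n broadcasts against p with its last axis removed. sample_multinomial is a hypothetical helper, not PyMC3's implementation and not the change made in the eventual fix.

```python
import numpy as np


def sample_multinomial(n, p, size=None, rng=None):
    """Hypothetical helper: draw multinomial counts with n and p broadcast."""
    rng = np.random.RandomState() if rng is None else rng
    n = np.asarray(n, dtype=int)
    p = np.asarray(p, dtype=float)
    k = p.shape[-1]  # number of categories

    # Batch shape shared by n and by p without its trailing category axis.
    batch = np.broadcast(n, p[..., 0]).shape
    n_flat = np.broadcast_to(n, batch).reshape(-1)
    p_flat = np.broadcast_to(p, batch + (k,)).reshape(-1, k)

    size = () if size is None else tuple(int(s) for s in np.atleast_1d(size))
    out = np.empty(size + (len(n_flat), k), dtype=int)
    for i, (n_i, p_i) in enumerate(zip(n_flat, p_flat)):
        # An empty size tuple means "one draw", which NumPy spells size=None.
        out[..., i, :] = rng.multinomial(n_i, p_i, size=size or None)
    return out.reshape(size + batch + (k,))


samples = sample_multinomial(np.full(10, 100), np.ones(6) / 6,
                             size=1000, rng=np.random.RandomState(904))
print(samples.shape)  # (1000, 10, 6)
```

Flattening the batch axes and looping keeps the sketch compatible with the legacy np.random.multinomial interface, which takes a scalar n; it favors clarity over speed.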

@AlexAndorra
Contributor Author

OK, I think I got it. Hence @ColCarroll's intuition that the problem (and its solution) is confined to Multinomial.

lucianopaz added a commit to lucianopaz/pymc that referenced this issue Dec 4, 2018
…er broadcasting of n and p. Added test based on pymc-devs#3271 problematic code.
twiecki added a commit that referenced this issue Dec 5, 2018
@junpenglao
Member

Closed by #3285
