
BUG: Adding a Deterministic results in the error: All variables needed to compute inner-graph must be provided as inputs under strict=True #7312

Closed
tomicapretto opened this issue May 13, 2024 · 8 comments · Fixed by #7315 or #7328

@tomicapretto
Contributor

tomicapretto commented May 13, 2024

Describe the issue:

The same model raises an error depending on whether a Deterministic is used. See the examples below.

Reproducible code example:

import numpy as np
import pymc as pm
import pytensor.tensor as pt

rng = np.random.default_rng(1234)
n1, n2 = 30, 70
y = np.concatenate([np.zeros(n1), rng.poisson(3, size=n2)]).astype(int)

coords = {"__obs__": np.arange(n1 + n2)}


# Works
with pm.Model(coords=coords) as model:
    a = pm.Normal("Intercept", mu=0, sigma=2.5)
    psi = pm.Beta("psi", alpha=2, beta=2)
    pm.HurdlePoisson("y", mu=pt.exp(a), psi=psi, observed=y, dims="__obs__")

# Raises error
with pm.Model(coords=coords) as model:
    a = pm.Normal("Intercept", mu=0, sigma=2.5)
    mu = pm.Deterministic("mu", pt.exp(a))
    psi = pm.Beta("psi", alpha=2, beta=2)
    pm.HurdlePoisson("y", mu=mu, psi=psi, observed=y, dims="__obs__")

Error message:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[3], line 5
      3 mu = pm.Deterministic("mu", pt.exp(a))
      4 psi = pm.Beta("psi", alpha=2, beta=2)
----> 5 pm.HurdlePoisson("y", mu=mu, psi=psi, observed=y, dims="__obs__")

File ~/anaconda3/envs/bambi-dev/lib/python3.11/site-packages/pymc/distributions/mixture.py:866, in HurdlePoisson.__new__(cls, name, psi, mu, **kwargs)
    865 def __new__(cls, name, psi, mu, **kwargs):
--> 866     return _hurdle_mixture(
    867         name=name, nonzero_p=psi, nonzero_dist=Poisson.dist(mu=mu), dtype="int", **kwargs
    868     )

File ~/anaconda3/envs/bambi-dev/lib/python3.11/site-packages/pymc/distributions/mixture.py:822, in _hurdle_mixture(name, nonzero_p, nonzero_dist, dtype, **kwargs)
    818 nonzero_p = pt.as_tensor_variable(nonzero_p)
    819 weights = pt.stack([1 - nonzero_p, nonzero_p], axis=-1)
    820 comp_dists = [
    821     DiracDelta.dist(zero),
--> 822     Truncated.dist(nonzero_dist, lower=lower),
    823 ]
    825 if name is not None:
    826     return Mixture(name, weights, comp_dists, **kwargs)

File ~/anaconda3/envs/bambi-dev/lib/python3.11/site-packages/pymc/distributions/truncated.py:316, in Truncated.dist(cls, dist, lower, upper, max_n_steps, **kwargs)
    313 if lower is None and upper is None:
    314     raise ValueError("lower and upper cannot both be None")
--> 316 return super().dist([dist, lower, upper, max_n_steps], **kwargs)

File ~/anaconda3/envs/bambi-dev/lib/python3.11/site-packages/pymc/distributions/distribution.py:633, in Distribution.dist(cls, dist_params, shape, **kwargs)
    631 ndim_supp = getattr(cls.rv_type, "ndim_supp", None)
    632 if ndim_supp is None:
--> 633     ndim_supp = cls.rv_op(*dist_params, **kwargs).owner.op.ndim_supp
    634 create_size = find_size(shape=shape, size=size, ndim_supp=ndim_supp)
    635 rv_out = cls.rv_op(*dist_params, size=create_size, **kwargs)

File ~/anaconda3/envs/bambi-dev/lib/python3.11/site-packages/pymc/distributions/truncated.py:183, in TruncatedRV.rv_op(cls, dist, lower, upper, max_n_steps, size)
    179     return graph_inputs.index(rng)
    181 next_rngs = [next_rng for rng, next_rng in sorted(updates.items(), key=sort_updates)]
--> 183 return TruncatedRV(
    184     base_rv_op=dist.owner.op,
    185     inputs=graph_inputs,
    186     outputs=[truncated_rv, *next_rngs],
    187     ndim_supp=0,
    188     max_n_steps=max_n_steps,
    189 )(*graph_inputs)

File ~/anaconda3/envs/bambi-dev/lib/python3.11/site-packages/pymc/distributions/truncated.py:77, in TruncatedRV.__init__(self, base_rv_op, max_n_steps, *args, **kwargs)
     72 self.max_n_steps = max_n_steps
     73 self._print_name = (
     74     f"Truncated{self.base_rv_op._print_name[0]}",
     75     f"\\operatorname{{{self.base_rv_op._print_name[1]}}}",
     76 )
---> 77 super().__init__(*args, **kwargs)

File ~/anaconda3/envs/bambi-dev/lib/python3.11/site-packages/pymc/distributions/distribution.py:422, in SymbolicRandomVariable.__init__(self, *args, **kwargs)
    420 kwargs.setdefault("inline", True)
    421 kwargs.setdefault("strict", True)
--> 422 super().__init__(*args, **kwargs)

File ~/anaconda3/envs/bambi-dev/lib/python3.11/site-packages/pytensor/compile/builders.py:423, in OpFromGraph.__init__(self, inputs, outputs, inline, lop_overrides, grad_overrides, rop_overrides, connection_pattern, strict, name, **kwargs)
    418 self.fgraph, self.shared_inputs, _, _ = construct_nominal_fgraph(
    419     inputs, outputs
    420 )
    422 if strict and self.shared_inputs:
--> 423     raise ValueError(
    424         "All variables needed to compute inner-graph must be provided as inputs under strict=True. "
    425         f"The inner-graph implicitly depends on the following shared variables {self.shared_inputs}"
    426     )
    428 self.kwargs = kwargs
    429 self.input_types = [inp.type for inp in inputs]

ValueError: All variables needed to compute inner-graph must be provided as inputs under strict=True. The inner-graph implicitly depends on the following shared variables [RandomGeneratorSharedVariable(<Generator(PCG64) at 0x71D9FE9BE420>)]

PyMC version information:

PyMC = 5.14.0
PyTensor = 2.20.0

Context for the issue:

I'm finalizing a large refactor in Bambi, and some tests involving HurdlePoisson failed; that's how I found this.
The same thing happened with HurdleNegativeBinomial.

@tomicapretto
Contributor Author

I found a similar error message in a fairly large model I'm working on, and I think it's related to the same issue. The same error appears when I sample from the posterior predictive using pm.Truncated together with pm.CustomDist. See the following example:

import numpy as np
import pymc as pm
import pytensor.tensor as pt

from pymc.model.fgraph import clone_model

# simulate data
y_values = pm.draw(pm.Truncated.dist(pm.Exponential.dist(scale=[2, 4, 6]), upper=7), 200, random_seed=1234)
y_values = y_values.T.flatten()

groups = list("ABC")
groups_idx = np.repeat([0, 1, 2], 200)

assert len(y_values) == len(groups_idx)

coords = {
    "group": groups,
    "__obs__": np.arange(len(y_values))
}

# Works
with pm.Model(coords=coords) as model:
    groups_idx_data = pm.Data("groups_idx", groups_idx, dims="__obs__")
    
    b = pm.Normal("b", dims="group")
    scale = pm.Deterministic("scale", b[groups_idx_data], dims="__obs__")
    value_latent = pm.Exponential.dist(scale=scale)
    value = pm.Truncated("value", value_latent, upper=7, observed=y_values, dims="__obs__")

    idata = pm.sample(chains=2, random_seed=1234)


# Works
new_coords = {
    "__obs__": np.arange(3) + 100,
}

new_data = {
    "groups_idx": np.array([0, 1, 2])
}

with clone_model(model) as c_model:
    pm.set_data(new_data, coords=new_coords)
    predictions = pm.sample_posterior_predictive(
        idata, 
        var_names=["value"], 
        predictions=True,
        random_seed=1234,
    )

# Fails
new_coords = {
    "__obs__": np.arange(3) + 100,
}

new_data = {
    "groups_idx": np.array([0, 1, 2])
}

def f_exp(scale, size):
    return pm.Exponential.dist(scale=scale, size=size)

with clone_model(model) as c_model:
    pm.set_data(new_data, coords=new_coords)
    b = c_model["b"]
    groups_idx_data = c_model["groups_idx"]
    scale = pm.Deterministic("b_new", b[groups_idx_data], dims="__obs__")
    value_latent = pm.CustomDist.dist(scale, dist=f_exp)
    pm.Truncated("value_new", value_latent, upper=7, dims="__obs__")
    predictions = pm.sample_posterior_predictive(
        idata, 
        var_names=["value_new"], 
        predictions=True,
        random_seed=1234,
    )
ValueError: All variables needed to compute inner-graph must be provided as inputs under strict=True. The inner-graph implicitly depends on the following shared variables [RandomGeneratorSharedVariable(<Generator(PCG64) at 0x7415037912A0>), group, groups_idx]

@ricardoV94
Member

Is the problem gone with the bugfix?

@tomicapretto
Contributor Author

Nope, I installed from main and I still had the problem.

@ricardoV94 ricardoV94 reopened this May 21, 2024
@ricardoV94
Member

ricardoV94 commented May 21, 2024

Is the clone_model stuff needed to reproduce the problem?

@tomicapretto
Contributor Author

Nope, the following also fails with the same message:

# Fails
new_coords = {
    "__obs__": np.arange(3) + 100,
}

new_data = {
    "groups_idx": np.array([0, 1, 2])
}

def f_exp(scale, size):
    return pm.Exponential.dist(scale=scale, size=size)

with model:
    pm.set_data(new_data, coords=new_coords)
    b = model["b"]
    groups_idx_data = model["groups_idx"]
    scale = pm.Deterministic("b_new", b[groups_idx_data], dims="__obs__")
    value_latent = pm.CustomDist.dist(scale, dist=f_exp)
    pm.Truncated("value_new", value_latent, upper=7, dims="__obs__")
    predictions = pm.sample_posterior_predictive(
        idata, 
        var_names=["value_new"], 
        predictions=True,
        random_seed=1234,
    )

@ricardoV94
Member

If you can get an even smaller example without the set_data / multiple models, that's even better :)

@tomicapretto
Contributor Author

@ricardoV94 I'll try :)

@ricardoV94
Member

ricardoV94 commented May 22, 2024

Here is a MWE:

import pymc as pm

def f_exp(scale, size):
    return pm.Exponential.dist(scale=scale, size=size)

with pm.Model() as model:
    b = pm.Normal("b", shape=(3,))
    value_latent = pm.CustomDist.dist(b[[0, 0, 1, 1, 2, 2]], dist=f_exp)
    pm.Truncated("value_new", value_latent, upper=7) 
