
Unexpected behavior with pm.sample(var_names=...) #7258

Closed
tomicapretto opened this issue Apr 15, 2024 · 8 comments · Fixed by #7284 or #7287

Comments

@tomicapretto
Contributor

tomicapretto commented Apr 15, 2024

Description

PyMC 5.13 incorporates the var_names parameter in pm.sample(). The documentation says: "var_names: Names of variables to be stored in the trace. Defaults to all free variables and deterministics."

This comes in very handy for something I've been trying to do in Bambi. I'm now porting Bambi to use this feature and noticed weird results in the tests. I reproduced one of the models with plain PyMC and narrowed down the problem. Have a look at this:

import arviz as az
import numpy as np
import pymc as pm

batch = np.array(
    [
        1,  1,  1,  1,  2,  2,  2,  3,  3,  3,  4,  4,  4,  4,  5,  5,  5,
        6,  6,  6,  7,  7,  7,  7,  8,  8,  8,  9,  9, 10, 10, 10
    ]
)
temp = np.array(
    [
        205, 275, 345, 407, 218, 273, 347, 212, 272, 340, 235, 300, 365,
        410, 307, 367, 395, 267, 360, 402, 235, 275, 358, 416, 285, 365,
        444, 351, 424, 365, 379, 428
    ]
)

y = np.array(
    [
        0.122, 0.223, 0.347, 0.457, 0.08 , 0.131, 0.266, 0.074, 0.182,
        0.304, 0.069, 0.152, 0.26 , 0.336, 0.144, 0.268, 0.349, 0.1  ,
        0.248, 0.317, 0.028, 0.064, 0.161, 0.278, 0.05 , 0.176, 0.321,
        0.14 , 0.232, 0.085, 0.147, 0.18
    ]
)

batch_values, batch_idx  = np.unique(batch, return_inverse=True)

coords = {
    "batch": batch_values
}

with pm.Model(coords=coords) as model:
    b_batch = pm.Normal("b_batch", dims="batch")
    b_temp = pm.Normal("b_temp")
    mu = pm.Deterministic("mu", pm.math.invlogit(b_batch[batch_idx] + b_temp * temp))
    kappa = pm.Gamma("kappa", alpha=2, beta=2)
    
    alpha = mu * kappa
    beta = (1 - mu) * kappa
    
    pm.Beta("y", alpha=alpha, beta=beta, observed=y)

I want to sample the posterior, but I don't want to store the draws of "mu" by default. So I use var_names=["b_batch", "b_temp", "kappa"] (and I also sample without var_names to see the difference).

with model:
    idata_1 = pm.sample(random_seed=1234)
    idata_2 = pm.sample(var_names=["b_batch", "b_temp", "kappa"], random_seed=1234)

When I don't use var_names I get the following posterior:

az.plot_trace(idata_1, backend_kwargs={"layout": "constrained"});

[trace plot of idata_1: posterior draws for b_batch, b_temp, kappa, and mu]

and when I use var_names I get the following:

az.plot_trace(idata_2, backend_kwargs={"layout": "constrained"});

[trace plot of idata_2: draws for b_batch, b_temp, and kappa]

which makes me think it's basically omitting the likelihood and thus sampling from the prior.
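A quick numerical check (an addition for illustration, not part of the original report): with a shared random_seed, the stored draws for a common variable should match if var_names only filtered what gets stored.

import numpy as np

# False here: the two runs produce different draws for b_temp despite the
# shared seed, consistent with the var_names run ignoring the likelihood.
print(np.allclose(idata_1.posterior["b_temp"], idata_2.posterior["b_temp"]))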

Is this behavior expected?

@ricardoV94
Member

> Is this behavior expected?

Nope. I guess we didn't incorporate it well. CC @fonnesbeck

@tomicapretto
Contributor Author

@ricardoV94 @fonnesbeck I had a look at the code, but I'm not very familiar with the trace and sampler internals. Do you have any pointers on what needs to be modified? I could give it a shot.

@tomicapretto
Contributor Author

I've been trying to better understand what's happening here, and I reached the initialization method of the BaseTrace class:

pymc/pymc/backends/base.py

Lines 145 to 158 in 60a6314

def __init__(self, name, model=None, vars=None, test_point=None):
    self.name = name

    model = modelcontext(model)
    self.model = model
    if vars is None:
        vars = model.unobserved_value_vars

    unnamed_vars = {var for var in vars if var.name is None}
    if unnamed_vars:
        raise Exception(f"Can't trace unnamed variables: {unnamed_vars}")
    self.vars = vars
    self.varnames = [var.name for var in vars]
    self.fn = model.compile_fn(vars, inputs=model.value_vars, on_unused_input="ignore")

So I imagined something was happening with the compiled function. I manually re-created the compiled function with var_names=None and with var_names=["b_batch", "b_temp", "kappa"].

var_names = ["b_batch", "b_temp", "kappa"]
vars_trace = [v for v in model.unobserved_RVs if v.name in var_names]
point_func = model.compile_fn(vars_trace, inputs=model.value_vars, on_unused_input="ignore")
point_func({"b_batch": np.array([0] * 10), "b_temp": 0, "kappa_log__": 0})

[array([-2.04367572,  0.20578688,  0.44895   , -0.25317641,  1.70734415,
        -0.25723554, -0.58851374, -1.53500779,  0.73048534,  0.90274171]),
 array(-0.04592494),
 array(0.68558234)]

If I don't use on_unused_input="ignore" it raises

UnusedInputError: pytensor.function was asked to create a function computing outputs given certain inputs, but the provided input variable at index 0 is not part of the computational graph needed to compute the outputs: b_batch.

which I think indicates the problem. Also, notice that every time you run the previous chunk it returns different values.

On the other hand, if we reproduce what var_names=None does, we get:

vars_trace = model.unobserved_value_vars
var_names = [var.name for var in vars_trace]  # ['b_batch', 'b_temp', 'kappa_log__', 'kappa', 'mu']
point_func = model.compile_fn(vars_trace, inputs=model.value_vars)
point_func({"b_batch": np.array([0] * 10), "b_temp": 0, "kappa_log__": 0})

[array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]),
 array(0.),
 array(0.),
 array(1.),
 array([0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5,
        0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5,
        0.5, 0.5, 0.5, 0.5, 0.5, 0.5])]

To summarize, I'm thinking that not including variables in the trace is a bit more complicated than expected? I'm not sure what the next step is. To me it looks like we need to pass all the necessary outputs to compile_fn so it works properly, but then somehow store only a subset of the results, the ones matching whatever we pass to var_names. But I don't know, this is new territory for me.
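For illustration, a rough sketch of that idea (my assumption, not the eventual fix): compile all of unobserved_value_vars as before, then keep only the outputs whose names match var_names when storing.

keep = ["b_batch", "b_temp", "kappa"]
all_vars = model.unobserved_value_vars
fn = model.compile_fn(all_vars, inputs=model.value_vars)
values = fn({"b_batch": np.array([0] * 10), "b_temp": 0, "kappa_log__": 0})

# Store only the requested subset when writing to the trace
subset = {var.name: value for var, value in zip(all_vars, values) if var.name in keep}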

@ricardoV94
Member

ricardoV94 commented Apr 27, 2024

My hunch is that with var_names it's trying to create a function of the RVs themselves (which explains why the output is random and why it does not depend on model.value_vars), but we want one of the respective value variables. We basically want to use var_names to filter out the matching subset of the value variables (which must already be collected somewhere for the var_names=None case)?
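To make the distinction concrete, a small sketch (not the backend code): compiling an RV output draws fresh randomness, while compiling the corresponding value-variable graph is a deterministic function of the point.

point = {"b_batch": np.array([0] * 10), "b_temp": 0, "kappa_log__": 0}

# Compiling the RV itself: the value-variable inputs are unused, so every
# call draws new randomness.
rv_fn = model.compile_fn([model["kappa"]], inputs=model.value_vars, on_unused_input="ignore")
print(rv_fn(point), rv_fn(point))  # two different values

# Compiling the value-variable counterpart: deterministic given the point.
(kappa_value,) = model.replace_rvs_by_values([model["kappa"]])
value_fn = model.compile_fn([kappa_value], inputs=model.value_vars, on_unused_input="ignore")
print(value_fn(point), value_fn(point))  # exp(0) == 1.0 both times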

@ricardoV94
Member

ricardoV94 commented Apr 27, 2024

If you check what model.unobserved_value_vars does, you'll see it grabs all RVs and deterministics and converts them to the "respective" value variables, using something like model.replace_rvs_by_values or a similarly named helper.

We want to do the same when var_names is given
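In code, something along these lines (a sketch of the suggested approach, assuming the model.replace_rvs_by_values helper mentioned above; the actual change landed via the linked PRs):

vars_to_trace = [v for v in model.unobserved_RVs if v.name in var_names]
# Convert the selected RVs/deterministics to their value-variable
# counterparts, as model.unobserved_value_vars does for the None case
value_vars = model.replace_rvs_by_values(vars_to_trace)
point_func = model.compile_fn(value_vars, inputs=model.value_vars, on_unused_input="ignore")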

@ricardoV94
Member

Reopening, as a robust test is still missing.

@tomicapretto
Contributor Author

Would a comparison of draws between two posteriors, one with and one without var_names, be enough? Or do you have something more elaborate in mind?

@ricardoV94
Member

> Would a comparison of draws between two posteriors, one with and one without var_names, be enough? Or do you have something more elaborate in mind?

That's what I had in mind.
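For reference, a minimal sketch of such a test (my assumptions: a toy model, and that trace filtering does not consume extra RNG draws, so identical seeds give identical draws):

import numpy as np
import pymc as pm

with pm.Model() as m:
    a = pm.Normal("a")
    b = pm.Deterministic("b", a + 1)
    pm.Normal("y", mu=b, observed=[0.5, 1.5])

    idata_full = pm.sample(tune=100, draws=100, chains=1, random_seed=1234)
    idata_sub = pm.sample(var_names=["a"], tune=100, draws=100, chains=1, random_seed=1234)

# With the fix, the shared variable's draws should match exactly,
# and the filtered-out deterministic should be absent from the trace.
np.testing.assert_array_equal(idata_full.posterior["a"].values, idata_sub.posterior["a"].values)
assert "b" not in idata_sub.posterior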
