Add RV design spec #266

Closed pull request (4 commits). New file: docs/source/random_variable_design.rst (+203 lines)

Parameterize each ``RandomVariable`` in terms of other ``RandomVariable``\ s
============================================================================

If a ``RandomVariable`` ``Var`` needs to know some numerical parameter
values ``a``, ``b``, etc. to perform its ``sample()`` call, those values
should be represented as constituent ``RandomVariable``\ s that are
attributes of ``Var`` (i.e. ``Var.a_rv``, ``Var.b_rv``) and are
``sample()``-d within ``Var.sample()`` (i.e. ``Var.sample()`` gets an
``a`` value by calling ``self.a_rv.sample()`` and a ``b`` value by
calling ``self.b_rv.sample()``).
Comment on lines +6 to +10:

@damonbayer (Collaborator) commented on Jul 15, 2024:

I think this might be too strong. Consider our generic observation processes, which have a mean argument. The mean is almost guaranteed to have been sampled by some other, more complex, RandomVariable that should not itself be wrapped by the observation process:

        # Sampling from the latent process
        (
            post_initialization_latent_infections,
            *_,
        ) = self.latent_infections_rv.sample(
            Rt=Rt,
            gen_int=gen_int,
            I0=I0,
            **kwargs,
        )

        observed_infections, *_ = self.infection_obs_process_rv.sample(
            mu=post_initialization_latent_infections[padding:],
            obs=data_observed_infections,
            **kwargs,
        )

This document suggests that post_initialization_latent_infections[padding:] should be made into a deterministic variable and then passed to the observation process. That seems a bit too much to me.

@dylanhmorris (Collaborator, Author) commented on Jul 15, 2024:

If I understand you correctly, @damonbayer, you're saying it's important to allow RandomVariables to have a mix of:

  1. parameters defined at construction (which per the document must be RandomVariables, possibly deterministic)
  2. parameters passed directly to the sample() call, which may be concrete values, not RandomVariables

I am open to this approach, but I would suggest specifying that type (2) parameters (passed when sample() is called) should always be concrete values, and thus represent fixed / "observed" / "data" quantities from the RV's point of view. That aligns with the numpyro pattern of passing obs arguments to numpyro.sample calls.
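
A minimal sketch of that convention (the class and argument names here are hypothetical, not Pyrenew API): construction-time parameters are RandomVariables, while sample()-time arguments are concrete values passed through to numpyro.sample via obs:

    import numpyro
    import numpyro.distributions as dist
    from pyrenew.metaclass import RandomVariable

    class ScaledPoissonObs(RandomVariable):
        def __init__(self, name: str, rate_scale_rv: RandomVariable):
            # type (1): set at construction; must be a RandomVariable
            self.name = name
            self.rate_scale_rv = rate_scale_rv

        def sample(self, mu, obs=None, **kwargs):
            # type (2): mu and obs are concrete values passed at
            # sample() time, treated as fixed / observed data here
            rate_scale = self.rate_scale_rv.sample()
            return numpyro.sample(
                self.name,
                dist.Poisson(rate=rate_scale * mu),
                obs=obs,
            )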

@gvegayon (Member) commented on Jul 15, 2024:

I've encountered this too in #262. There's no silver bullet here. What if we need a TimeArray to be passed? I think the call to RV.sample() should be open to receiving whatever it is given (as it does now): a RandomVariable, a scalar, an array, or a SampledArray (aka TimeArray).

@gvegayon (Member) added:

Now, for instantiation, we should probably suggest using RandomVariable for anything that resembles a parameter, e.g., scale and loc in your example; but for things like, say, n (the number of elements), we can avoid using an RV.
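
A rough sketch of that rule (hypothetical class, in the simplified style of this document's examples): loc and scale resemble parameters and so are RandomVariables, while n is a structural quantity and stays a plain integer:

    import numpyro
    import numpyro.distributions as dist
    from pyrenew.metaclass import RandomVariable

    class IIDNormalRV(RandomVariable):
        def __init__(self, name: str, loc_rv: RandomVariable,
                     scale_rv: RandomVariable, n: int):
            self.name = name
            self.loc_rv = loc_rv      # parameter-like: an RV
            self.scale_rv = scale_rv  # parameter-like: an RV
            self.n = n                # structural: a plain int

        def sample(self, **kwargs):
            loc = self.loc_rv.sample()
            scale = self.scale_rv.sample()
            # n i.i.d. Normal draws at a single sample site
            return numpyro.sample(
                self.name,
                dist.Normal(loc, scale).expand((self.n,)),
                **kwargs,
            )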

@dylanhmorris (Collaborator, Author) replied:

We could go stricter, though; I do think it's worth thinking about the potential benefits and costs. Will add some thoughts to the document.

Another collaborator replied, quoting the previous comment:

> We could go stricter, though; I do think it's worth thinking about the potential benefits and costs. Will add some thoughts to the document.

I agree with Damon & George's earlier sentiments on this, but I also think we could go stricter in certain instances, based on what we know about an RV's parameters, and potentially also on the complexity of the RV and where it lies in the modeling process. Some additional examples showcasing the verbosity of the maximally random approach in situations where it feels like "too much" might be helpful; if these MSR examples end up seeming "not too bad", then I might be more convinced to operate in a maximally random way with MSR.

@damonbayer (Collaborator) commented on Jul 31, 2024:

In recent conversations, @dylanhmorris and I seem to have come to an agreement that RandomVariables should only have a single stochastic element, as a rule of thumb.


Often, a value ``a`` should be treated as fixed from the point of view
of ``Var``. Perhaps it is known with near-certainty. Perhaps it is
unknown, but it is sampled / inferred elsewhere in a larger model. In
that case, use the ``DeterministicVariable()`` construct to pass it to
``Var``, i.e. ``Var.a_rv = DeterministicVariable(fixed_a_value)``. This
corresponds to the `“maximally random” approach <#maximally-random>`__
described in detail below.
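
For instance (a minimal sketch; this assumes ``DeterministicVariable``
is importable from ``pyrenew.deterministic`` and takes the fixed value
as its argument, as in the examples later in this document):

.. code:: python

   from pyrenew.deterministic import DeterministicVariable

   # a value known with near-certainty, or sampled upstream
   fixed_a_value = 2.5

   # wrap it so that Var can treat it like any other constituent
   # RandomVariable; sample()-ing it simply returns the fixed value
   a_rv = DeterministicVariable(fixed_a_value)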

Introduction to the problem
---------------------------

Sometimes, a ``RandomVariable`` can only be ``sample()``-d conditional
on one or more other parameter values. As a toy example, imagine writing
a ``RandomVariable`` class to draw from a Normal distribution
conditional on a specified location parameter ``loc`` and a scale
parameter ``scale``.

.. code:: python

   import numpyro
   import numpyro.distributions as dist
   from pyrenew.metaclass import RandomVariable

   class NormalRV(RandomVariable):

       def __init__(self, name: str, loc: float, scale: float):
           self.name = name
           self.loc = loc
           self.scale = scale

       def sample(self, **kwargs):
           return numpyro.sample(
               self.name,
               dist.Normal(loc=self.loc,
                           scale=self.scale),
               **kwargs)


   # instantiate and draw a sample
   my_norm = NormalRV("my_normal_rv", loc=5, scale=0.25)

   with numpyro.handlers.seed(rng_seed=5):
       print(my_norm.sample())

Often, we would like the parameters a ``RandomVariable`` needs to
“know” to themselves be sampled / inferred / stochastic. How should we
set this up? It is useful to characterize two possible extremes:

1. “Maximally fixed”: from an individual ``RandomVariable`` instance’s
   point of view, all the parameters it needs to “know” in order to be
   sampled from are fixed.
2. “Maximally random”: from an individual ``RandomVariable`` instance’s
   point of view, all the parameters it needs to “know” in order to be
   sampled from are associated ``RandomVariable`` instances, which it
   explicitly ``sample()``-s when its own ``sample()`` method is called.

Let’s look at these two extremes in detail, with examples.

Maximally fixed
---------------

Any parameters needed for ``sample()``-ing from a ``RandomVariable`` are
passed as fixed values to its constructor. If they are to be random /
inferred, they must be sampled “upstream” and the sampled values passed
in. Returning to our toy example, one might do the following to have
random ``loc`` and ``scale`` values for a ``NormalRV``:

.. code:: python

   with numpyro.handlers.seed(rng_seed=5):
       random_loc = numpyro.sample("loc", dist.Normal(0, 1))
       random_scale = numpyro.sample("scale", dist.HalfNormal(2))
       my_norm = NormalRV("my_normal_rv", loc=random_loc, scale=random_scale)
       print(my_norm.sample())

Maximally random
----------------

Any parameters needed for ``sample()``-ing from a ``RandomVariable``
``Var`` are expressed as constituent ``RandomVariable``\ s of their own
(e.g. ``SubVar1``, ``SubVar2``, etc.). When we call ``Var.sample()``, the
constituent random variables are ``sample()``-d “under the hood”
(i.e. ``SubVar1.sample()``, ``SubVar2.sample()``, etc. get called within
``Var.sample()``). In this framework, we can express a ``NormalRV`` with
random ``loc`` and ``scale`` values as:

.. code:: python

   from pyrenew.metaclass import DistributionalRV

   class NormalRV(RandomVariable):

       def __init__(self, name: str,
                    loc_rv: RandomVariable,
                    scale_rv: RandomVariable):
           self.name = name
           self.loc_rv = loc_rv
           self.scale_rv = scale_rv

       def sample(self, **kwargs):
           loc_sampled_value = self.loc_rv.sample()
           scale_sampled_value = self.scale_rv.sample()
           return numpyro.sample(
               self.name,
               dist.Normal(loc=loc_sampled_value,
                           scale=scale_sampled_value),
               **kwargs)

   loc_rv = DistributionalRV(dist.Normal(0, 1), "loc_dist")
   scale_rv = DistributionalRV(dist.HalfNormal(2), "scale_dist")
   my_norm = NormalRV("my_normal_rv", loc_rv=loc_rv, scale_rv=scale_rv)

   with numpyro.handlers.seed(rng_seed=5):
       print(my_norm.sample())

Why we prefer “Maximally random”
--------------------------------

We believe this approach gives additional flexibility in model building
at minimal cost in added verbosity and abstraction. Using this
framework, all of the following are valid ``Pyrenew`` code:

Two ``NormalRV``\ s with distinct inferred ``scale``\ s and distinct
inferred ``loc``\ s, each with its own prior:

.. code:: python

   scale_rv1 = DistributionalRV(dist.HalfNormal(2), "scale_dist1")
   scale_rv2 = DistributionalRV(dist.HalfNormal(0.5), "scale_dist2")
   loc_rv1 = DistributionalRV(dist.Normal(0, 1), "loc_dist1")
   loc_rv2 = DistributionalRV(dist.Normal(1, 1), "loc_dist2")
   norm1 = NormalRV("norm1", loc_rv=loc_rv1, scale_rv=scale_rv1)
   norm2 = NormalRV("norm2", loc_rv=loc_rv2, scale_rv=scale_rv2)

Two ``NormalRV``\ s with distinct inferred scales, with a shared prior,
and distinct inferred ``loc``\ s, with distinct priors:

.. code:: python

   # a single prior distribution object, but two distinct
   # sample sites, so the two scales are inferred separately
   shared_scale_prior = dist.HalfNormal(2)
   scale_rv1 = DistributionalRV(shared_scale_prior, "scale_dist1")
   scale_rv2 = DistributionalRV(shared_scale_prior, "scale_dist2")
   loc_rv1 = DistributionalRV(dist.Normal(0, 1), "loc_dist1")
   loc_rv2 = DistributionalRV(dist.Normal(1, 1), "loc_dist2")
   norm1 = NormalRV("norm1", loc_rv=loc_rv1, scale_rv=scale_rv1)
   norm2 = NormalRV("norm2", loc_rv=loc_rv2, scale_rv=scale_rv2)

Two ``NormalRV``\ s with a single shared inferred ``scale`` and distinct
inferred ``loc``\ s, with distinct priors:

.. code:: python

   from pyrenew.deterministic import DeterministicVariable

   scale_rv = DistributionalRV(dist.HalfNormal(2), "scale_dist")
   loc_rv1 = DistributionalRV(dist.Normal(0, 1), "loc1_dist")
   loc_rv2 = DistributionalRV(dist.Normal(1, 1), "loc2_dist")

   # we sample the scale value here explicitly,
   # and then pass it to each of the NormalRVs as a
   # DeterministicVariable
   scale_val = scale_rv.sample()

   shared_scale = DeterministicVariable(scale_val)

   norm1 = NormalRV("norm1", loc_rv=loc_rv1, scale_rv=shared_scale)
   norm2 = NormalRV("norm2", loc_rv=loc_rv2, scale_rv=shared_scale)

Future possibilities
--------------------

In the future, we might wish to have ``RandomVariable`` constructors
coerce fixed values to ``DeterministicVariable``\ s via some
``ensure_rv()`` function, so that this:

.. code:: python

   scale_rv = DistributionalRV(dist.HalfNormal(2), "scale_dist")
   loc_rv1 = DistributionalRV(dist.Normal(0, 1), "loc1_dist")
   loc_rv2 = DistributionalRV(dist.Normal(1, 1), "loc2_dist")
   shared_scale = scale_rv.sample()

   norm1 = NormalRV("norm1", loc_rv=loc_rv1, scale_rv=shared_scale)
   norm2 = NormalRV("norm2", loc_rv=loc_rv2, scale_rv=shared_scale)

is viable shorthand for this:

.. code:: python

   scale_rv = DistributionalRV(dist.HalfNormal(2), "scale_dist")
   loc_rv1 = DistributionalRV(dist.Normal(0, 1), "loc1_dist")
   loc_rv2 = DistributionalRV(dist.Normal(1, 1), "loc2_dist")

   scale_val = scale_rv.sample()
   shared_scale = DeterministicVariable(scale_val)

   norm1 = NormalRV("norm1", loc_rv=loc_rv1, scale_rv=shared_scale)
   norm2 = NormalRV("norm2", loc_rv=loc_rv2, scale_rv=shared_scale)
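
One possible shape for such a helper (a sketch only; no ``ensure_rv()``
exists in ``Pyrenew`` today, and the ``pyrenew.deterministic`` import
path is assumed):

.. code:: python

   from pyrenew.deterministic import DeterministicVariable
   from pyrenew.metaclass import RandomVariable

   def ensure_rv(value) -> RandomVariable:
       # pass RandomVariables through unchanged; wrap anything else
       # (a scalar, an array, a previously sampled value) in a
       # DeterministicVariable
       if isinstance(value, RandomVariable):
           return value
       return DeterministicVariable(value)

``RandomVariable`` constructors would then call ``ensure_rv()`` on each
parameter argument, making the shorthand above work whether the caller
passes a ``RandomVariable`` or a concrete value.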