Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add RV design spec #266

Closed
wants to merge 4 commits into from
Closed

Add RV design spec #266

wants to merge 4 commits into from

Conversation

dylanhmorris
Copy link
Collaborator

Draft RV design spec added. Draft PR since I haven't configured it to be part of the official docs, etc, as I want comments first.

@dylanhmorris dylanhmorris requested review from gvegayon, damonbayer and a team July 15, 2024 20:06
Copy link

codecov bot commented Jul 15, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 92.73%. Comparing base (0219353) to head (7a63db3).
Report is 59 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #266   +/-   ##
=======================================
  Coverage   92.73%   92.73%           
=======================================
  Files          40       40           
  Lines         909      909           
=======================================
  Hits          843      843           
  Misses         66       66           
Flag Coverage Δ
unittests 92.73% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Comment on lines +6 to +10
should be represented as constiuent ``RandomVariables`` that are
attributes of ``Var`` (i.e. ``Var.a_rv``, ``Var.b_rv``) and are are
``sample()``-d within ``Var.sample()`` (i.e. ``Var.sample()`` gets an
``a`` value by calling ``self.a_rv.sample()`` and a ``b`` value by
calling ``self.b_rv.sample()``).
Copy link
Collaborator

@damonbayer damonbayer Jul 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this might be too strong. Consider our generic observation processes, which have a mean argument. The mean is almost guaranteed to have been sampled by some other, more complex, RandomVariable that should not itself be wrapped by the observation process:

        # Sampling from the latent process
        (
            post_initialization_latent_infections,
            *_,
        ) = self.latent_infections_rv.sample(
            Rt=Rt,
            gen_int=gen_int,
            I0=I0,
            **kwargs,
        )

        observed_infections, *_ = self.infection_obs_process_rv.sample(
            mu=post_initialization_latent_infections[padding:],
            obs=data_observed_infections,
            **kwargs,
        )

This document suggests that post_initialization_latent_infections[padding:] should be made into a deterministic variable and then passed to the observation process. That seems a bit too much to me.

Copy link
Collaborator Author

@dylanhmorris dylanhmorris Jul 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand you correctly, @damonbayer, you're saying it's important to allow RandomVariables to have a mix of:

  1. parameters defined at construction (which per the document must be RandomVariables, possibly deterministic)
  2. parameters passed directly to the sample() call, which may be concrete values, not RandomVariables

I am open to this approach, but I would suggest specifying that type (2) parameters (passed when sample() is called) should always be concrete values and thus represent fixed / "observed" / "data" from the RV's point-of-view. That aligns with the numpyro pattern of passing obs arguments to numpyro.sample calls.

Copy link
Member

@gvegayon gvegayon Jul 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've encountered this too in #262. There's no silver bullet here. What if we need a TimeArray to be passed? I think the call to RV.sample() should be open to receive whatever (as it does now). Whatever could be a RandomVariable, a scalar, an array, a SampledArray (aka TimeArray).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now, for instantiation, we probably, suggest using RandomVariable for anything that resembles a parameter, e.g., scale and loc in your example; but for things like, say, n (number of elements), we can avoid using RV.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could go stricter though, I do think it's worth thinking about the potential benefits and costs. Will add some thoughts to the document.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could go stricter though, I do think it's worth thinking about the potential benefits and costs. Will add some thoughts to the document.

I agree with Damon & George's earlier sentiments on this but also do think we could go stricter in certain instances, based on what we know about the RVs parameters and potentially also the complexity of the RV / where it lies in the modeling process. Some additional examples might be helpful showcasing the verbosity of the maximally random approach in situations where it feels like "too much"; if these MSR examples end up seeming "not too bad", then I might be more convinced to operate in a maximally random way with MSR.

Copy link
Collaborator

@damonbayer damonbayer Jul 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In recent conversations, @dylanhmorris and I seem to have come to an agreement that RandomVariables should only have a single stochastic element, as a rule of thumb.

@gvegayon
Copy link
Member

I think is looking great, @dylanhmorris. I think this document will be really useful for future users. As with the timearray stuff, I would be OK to merge this as it is: this is our current understanding and agreement of how parametrization should occur, but it can change in the future.

@gvegayon
Copy link
Member

Do you need anything from me, @dylanhmorris?

@gvegayon gvegayon mentioned this pull request Jul 25, 2024
@damonbayer
Copy link
Collaborator

I don't think the existing writeup is very reflective of our current thinking. A revised write up should be based on the recent developments of dynamic and static distributional random variables (#391) @dylanhmorris Do you want to continue working in this branch or should we close this PR?

@dylanhmorris
Copy link
Collaborator Author

I think it can stay on this branch but I agree it needs a new issue.

@dylanhmorris
Copy link
Collaborator Author

Closing pending further design spec decisions to be made downstream of #407

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants