Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor funsor.distributions without changing API #320

Merged
merged 44 commits into from
Mar 25, 2020

Conversation

eb8680
Copy link
Member

@eb8680 eb8680 commented Mar 3, 2020

Resolves #159

This PR refactors the funsor.distributions module so that PyTorch distributions are wrapped programmatically and to_funsor and to_data can be defined generically. It is similar to #319 except that it does not change the funsor.distributions API and does not contain new to_funsor/to_data logic.

Necessary to support the broader refactoring around funsor integration and backend independence in pyro-ppl/pyro#2307. In particular, this PR is a prerequisite for Funsor support in NumPyro.

funsor/distributions.py Outdated Show resolved Hide resolved
funsor/distributions.py Outdated Show resolved Hide resolved
funsor/distributions.py Outdated Show resolved Hide resolved
funsor/distributions.py Outdated Show resolved Hide resolved
funsor/distributions.py Outdated Show resolved Hide resolved
with interpretation(lazy):
return super(DistributionMeta, cls).__call__(*args)
value = kwargs.pop('value', 'value')
kwargs = OrderedDict((k, to_funsor(v)) for k, v in kwargs.items())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did I miss a major change in Python 3.6 that makes kwargs deterministically ordered? Can you point me to any reading material about this change?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Insertion order in dictionaries was made deterministic in Python 3.6 and this was added to the language spec in 3.7 ("Python data model improvements"). I guess that's why this code isn't breaking.

Copy link
Member

@fritzo fritzo Mar 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the pointers!

The order-preserving aspect of this new implementation is considered an implementation detail and should not be relied upon

I'd feel safer if we avoided reliance on deterministic order for now (there being so many other potential pitfalls all around Funsor). How much extra work would it be to avoid relying on this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't be much, I'll make all the suggested determinism-related changes

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh look at this, PEP 468 was accepted in Python 3.6. So I guess we're safe 😄

**kwargs in a function signature is now guaranteed to be an insertion-order-preserving mapping.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eb8680 maybe I'm overreacting about the effort required to resume support for Python 3.5. This comment thread illustrates my main concern, that we would need to pay special attention to order in kwargs and other use of dictionaries.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Python 3.5 compatibility in Funsor has always been nominal at best. I am happy to reopen and continue working on #329 but that only makes sense if we can agree beforehand on specific, objective criteria that would give us enough confidence to go ahead with adding Funsor as an optional dependency to Pyro.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nominal at best

Well early on we support Python 2.7 and we took care to ensure determinism. I think this PR on March 25 was the first where I stopped reviewing for dict order determinism.

if we can agree beforehand on specific, objective criteria

Sounds reasonable. I think there are two concerns: ensuring determinism and ensuring correctness. I'm willing to give up determinism (at least across python processes; see PYTHONHASHSEED in Python 3.3+). But we'll need to avoid dangerous nondeterminism, e.g.

# This is correct only in Python 3.6+:
Tensor(torch.randn(2, 3), OrderedDict(a=bint(2), b=bint(3)))

Since this is difficult to test, I can only offer subjective criteria. To make this as objective as possible, I suggest we split your 3.5 PR #329 into two parts:

  1. Make Funsor compatible with Python 3.5 #329 objective: get existing tests to pass; followed by
  2. #xxx subjective: we meet for 1 hour to decide subjective criteria in the form of say regular expressions we can run agains source code (e.g. OrderedDict(\w+(\w\d)?=). We code those up in a test_source_code.py and make a second PR that changes only enough code to satisfy those properties.

wdyt?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those are good suggestions, but I would prefer to avoid any further speculative coding. To that end I suggest we reverse the order of operations and start a PR with a failing test_source_code.py and a list of any other show-stopping compatibility problems, ideally expressed as failing unit tests.

If we can agree that fixing the problems in the PR is both necessary and sufficient to add Funsor as an optional dependency to Pyro, I can make the actual fixes including getting existing tests to pass in #329. If we cannot agree, or the list appears to require too much engineering effort, then there's no reason to proceed with #329 or any related work.

If this plan is acceptable, I will put up a PR in the next few days with a seed version of test_source_code.py.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds like a reasonable plan. I'm happy to chat about test_source_code.py or just see what you come up with 🙂

It might help us discover dangerous patterns to simply run the tests under Python 3.5.

test/test_distributions.py Outdated Show resolved Hide resolved
Copy link
Member

@fritzo fritzo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally nice simplification, and clever probing logic!

It would be nice if we could execute the probing logic on library load e.g. via a list of tensor inputs for each distribution.

class BernoulliLogits(Distribution):
"""
Wraps :class:`pyro.distributions.Bernoulli` .
class __BernoulliProbs(dist.Bernoulli):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason you're using double underscore here? I try to avoid double underscore because it makes debugging difficult due to Python's name mangling (double underscores are generally avoided for that reason).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed it to avoid confusion in the wrapping logic with any custom distribution types (like Pyro's _Subsample) that have a _ prefix for some reason, but I'll change it to something other than double underscore.

]

for pyro_dist_class, param_names in _wrapped_pyro_dists:
locals()[pyro_dist_class.__name__.split("__")[-1].split(".")[-1]] = make_dist(pyro_dist_class, param_names)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto, could you use a single underscore and .lstrip("_")?

Delta._infer_value_domain = classmethod(lambda cls, **kwargs: kwargs['v'])


# Multinomial and related dists have dependent bint dtypes, so we just make them 'real'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, would you mind pointing to this new issue so we can start planning refactoring?
#322

@@ -63,7 +66,7 @@ class Distribution(Funsor, metaclass=DistributionMeta):
funsors or objects that can be coerced to funsors via
:func:`~funsor.terms.to_funsor` . See derived classes for details.
"""
dist_class = "defined by derived classes"
dist_class = dist.Distribution
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this value actually used, or are you merely tidying up to be well-typed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just tidying up to be well-typed.

Copy link
Member

@fritzo fritzo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM after minor nits. Thanks for your patience.

@fritzo fritzo merged commit 69ea45a into master Mar 25, 2020
@eb8680 eb8680 deleted the distributions-refactor branch May 17, 2020 03:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Generate Distribution and RandomVariable wrapper classes programmatically
3 participants