Refactor funsor.distributions without changing API #320

eb8680 · 2020-03-03T22:37:30Z

Resolves #159

This PR refactors the funsor.distributions module so that PyTorch distributions are wrapped programmatically and to_funsor and to_data can be defined generically. It is similar to #319 except that it does not change the funsor.distributions API and does not contain new to_funsor/to_data logic.

Necessary to support the broader refactoring around funsor integration and backend independence in pyro-ppl/pyro#2307. In particular, this PR is a prerequisite for Funsor support in NumPyro.
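As a rough illustration of what "wrapped programmatically" can mean, here is a minimal sketch that builds wrapper classes at runtime from a table of (class, parameter names) pairs. It is not Funsor's implementation: `Gaussian`, `make_wrapper`, and `to_backend` are hypothetical names, and a plain Python class stands in for a PyTorch distribution.

```python
from collections import OrderedDict


class Gaussian:
    """Hypothetical stand-in for a backend distribution class."""

    def __init__(self, loc, scale):
        self.loc, self.scale = loc, scale


def make_wrapper(backend_class, param_names):
    """Create a wrapper class for ``backend_class`` at runtime (illustrative only)."""

    def __init__(self, *args):
        if len(args) != len(param_names):
            raise TypeError("expected parameters: {}".format(param_names))
        # Pair names with values explicitly so order never depends on kwargs.
        self.params = OrderedDict(zip(param_names, args))

    def to_backend(self):
        # Rebuild the backend object from the stored parameters.
        return backend_class(**self.params)

    namespace = {
        "__init__": __init__,
        "to_backend": to_backend,
        "backend_class": backend_class,
    }
    return type(backend_class.__name__, (object,), namespace)


# One table entry per wrapped class replaces one hand-written wrapper class.
wrappers = {cls.__name__: make_wrapper(cls, names)
            for cls, names in [(Gaussian, ("loc", "scale"))]}
wrapped = wrappers["Gaussian"](0.0, 1.0)
backend = wrapped.to_backend()
```

The design point is that adding a distribution becomes a one-line table entry instead of a new class definition.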

fritzo · 2020-03-05T01:00:21Z

funsor/distributions.py

-        with interpretation(lazy):
-            return super(DistributionMeta, cls).__call__(*args)
+        value = kwargs.pop('value', 'value')
+        kwargs = OrderedDict((k, to_funsor(v)) for k, v in kwargs.items())


Did I miss a major change in Python 3.6 that makes kwargs deterministically ordered? Can you point me to any reading material about this change?

Insertion order in dictionaries was made deterministic in Python 3.6 and this was added to the language spec in 3.7 ("Python data model improvements"). I guess that's why this code isn't breaking.

Thanks for the pointers!

The order-preserving aspect of this new implementation is considered an implementation detail and should not be relied upon

I'd feel safer if we avoided reliance on deterministic order for now (there being so many other potential pitfalls all around Funsor). How much extra work would it be to avoid relying on this?

Shouldn't be much, I'll make all the suggested determinism-related changes

Oh look at this, PEP 468 was accepted in Python 3.6. So I guess we're safe 😄

**kwargs in a function signature is now guaranteed to be an insertion-order-preserving mapping.
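To make the PEP 468 guarantee concrete, here is a minimal sketch showing that `**kwargs` keeps the order in which keyword arguments were written at the call site. The function name `collect_names` is hypothetical and not part of Funsor; the behavior shown holds on Python 3.6+ only.

```python
from collections import OrderedDict


def collect_names(**kwargs):
    # Per PEP 468 (Python 3.6+), kwargs iterates in call-site order.
    return OrderedDict((name, value) for name, value in kwargs.items())


params = collect_names(loc=0.0, scale=1.0)
swapped = collect_names(scale=1.0, loc=0.0)

names = list(params)
swapped_names = list(swapped)
```

On Python 3.5 and earlier the two lists are not guaranteed to differ in this way, which is the concern raised in this thread.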

@eb8680 maybe I'm overreacting about the effort required to resume support for Python 3.5. This comment thread illustrates my main concern, that we would need to pay special attention to order in kwargs and other use of dictionaries.

Python 3.5 compatibility in Funsor has always been nominal at best. I am happy to reopen and continue working on #329 but that only makes sense if we can agree beforehand on specific, objective criteria that would give us enough confidence to go ahead with adding Funsor as an optional dependency to Pyro.

nominal at best

Well, early on we supported Python 2.7 and we took care to ensure determinism. I think this PR on March 25 was the first where I stopped reviewing for dict order determinism.

if we can agree beforehand on specific, objective criteria

Sounds reasonable. I think there are two concerns: ensuring determinism and ensuring correctness. I'm willing to give up determinism (at least across python processes; see PYTHONHASHSEED in Python 3.3+). But we'll need to avoid dangerous nondeterminism, e.g.

# This is correct only in Python 3.6+:
Tensor(torch.randn(2, 3), OrderedDict(a=bint(2), b=bint(3)))
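A minimal sketch of the pitfall and an order-safe alternative follows. Plain integers stand in for `bint(2)` and `bint(3)` so the snippet runs without Funsor or PyTorch; that substitution is an assumption made for illustration.

```python
from collections import OrderedDict

# Relies on **kwargs preserving call order (guaranteed only in Python 3.6+).
fragile = OrderedDict(a=2, b=3)

# A sequence of pairs fixes the order explicitly on any Python version.
robust = OrderedDict([("a", 2), ("b", 3)])

# The order of the values is what would have to match a tensor's shape.
shape = tuple(robust.values())
```

With the pair form, the mapping from names to dimensions cannot be silently permuted, so a mismatch with the tensor shape is ruled out by construction.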

Since this is difficult to test, I can only offer subjective criteria. To make this as objective as possible, I suggest we split your 3.5 PR #329 into two parts:

Make Funsor compatible with Python 3.5 #329 objective: get existing tests to pass; followed by

#xxx subjective: we meet for 1 hour to decide subjective criteria in the form of, say, regular expressions we can run against source code (e.g. OrderedDict(\w+(\w\d)?=). We code those up in a test_source_code.py and make a second PR that changes only enough code to satisfy those properties.

wdyt?
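For concreteness, a source-code check of the kind proposed above might look like the sketch below. The pattern and the helper name `find_kwarg_ordered_dicts` are assumptions for illustration, not the contents of any actual test_source_code.py; a real test would read the project's source files rather than an inline string.

```python
import re

# Hypothetical pattern: an OrderedDict constructed from keyword arguments,
# e.g. OrderedDict(a=..., b=...). A single '=' is matched; '==' is excluded.
KWARG_ORDERED_DICT = re.compile(r"OrderedDict\(\s*\w+\s*=(?!=)")


def find_kwarg_ordered_dicts(source):
    """Return 1-based line numbers that build an OrderedDict from kwargs."""
    return [
        lineno
        for lineno, line in enumerate(source.splitlines(), start=1)
        if KWARG_ORDERED_DICT.search(line)
    ]


example = "\n".join([
    "inputs = OrderedDict(a=bint(2), b=bint(3))",
    "inputs = OrderedDict([('a', bint(2)), ('b', bint(3))])",
    "inputs = OrderedDict()",
])
flagged = find_kwarg_ordered_dicts(example)
```

A line-based regular expression misses calls split across lines, so a check like this can only flag the common single-line form.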

Those are good suggestions, but I would prefer to avoid any further speculative coding. To that end I suggest we reverse the order of operations and start a PR with a failing test_source_code.py and a list of any other show-stopping compatibility problems, ideally expressed as failing unit tests.

If we can agree that fixing the problems in the PR is both necessary and sufficient to add Funsor as an optional dependency to Pyro, I can make the actual fixes including getting existing tests to pass in #329. If we cannot agree, or the list appears to require too much engineering effort, then there's no reason to proceed with #329 or any related work.

If this plan is acceptable, I will put up a PR in the next few days with a seed version of test_source_code.py.

That sounds like a reasonable plan. I'm happy to chat about test_source_code.py or just see what you come up with 🙂

It might help us discover dangerous patterns to simply run the tests under Python 3.5.

test/test_distributions.py

fritzo

Generally nice simplification, and clever probing logic!

It would be nice if we could execute the probing logic on library load e.g. via a list of tensor inputs for each distribution.

fritzo · 2020-03-25T19:51:03Z

funsor/distributions.py

-class BernoulliLogits(Distribution):
-    """
-    Wraps :class:`pyro.distributions.Bernoulli` .
+class __BernoulliProbs(dist.Bernoulli):


Is there a reason you're using double underscore here? I try to avoid double underscore because it makes debugging difficult due to Python's name mangling (double underscores are generally avoided for that reason).

I changed it to avoid confusion in the wrapping logic with any custom distribution types (like Pyro's _Subsample) that have a _ prefix for some reason, but I'll change it to something other than double underscore.
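The name-mangling problem can be reproduced in a few lines. In this sketch the class names `__Hidden` and `User` are made up for illustration; the point is that any reference to a double-underscore name from inside another class body is rewritten by the compiler.

```python
class __Hidden:
    """Module-level class whose name starts with a double underscore."""


class User:
    def make(self):
        # Inside a class body, `__Hidden` is mangled to `_User__Hidden`,
        # so this lookup fails even though `__Hidden` exists at module level.
        return __Hidden()


try:
    User().make()
    error_message = None
except NameError as exc:
    error_message = str(exc)
```

The resulting NameError mentions `_User__Hidden` rather than the name that appears in the source, which is what makes such failures confusing to debug.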

fritzo · 2020-03-25T19:52:09Z

funsor/distributions.py

+]
+
+for pyro_dist_class, param_names in _wrapped_pyro_dists:
+    locals()[pyro_dist_class.__name__.split("__")[-1].split(".")[-1]] = make_dist(pyro_dist_class, param_names)


ditto, could you use a single underscore and .lstrip("_")?

fritzo · 2020-03-25T20:08:47Z

funsor/distributions.py

+Delta._infer_value_domain = classmethod(lambda cls, **kwargs: kwargs['v'])
+
+
+# Multinomial and related dists have dependent bint dtypes, so we just make them 'real'


Good point, would you mind pointing to this new issue so we can start planning refactoring?
#322

fritzo · 2020-03-25T20:09:37Z

funsor/distributions.py

@@ -63,7 +66,7 @@ class Distribution(Funsor, metaclass=DistributionMeta):
        funsors or objects that can be coerced to funsors via
        :func:`~funsor.terms.to_funsor` . See derived classes for details.
    """
-    dist_class = "defined by derived classes"
+    dist_class = dist.Distribution


Is this value actually used, or are you merely tidying up to be well-typed?

Just tidying up to be well-typed.

fritzo

LGTM after minor nits. Thanks for your patience.

eb8680 added 30 commits February 11, 2020 11:55

add an inputs argument to to_funsor

3b13e27

implement funsor_to_tensor and tensor_to_funsor

bc56c2e

use to_funsor and to_data in funsor.pyro.convert

fa04123

nit

0951569

tweak

84d9fc0

attempt at generic distribution conversion

a32566f

address comment

6be8a34

remove domains from dim_to_name

630abf5

add dim_to_name docstring comment

bbbdafa

assert batch dim negativity

01cde8f

consider even named dims of size 1 empty in tensor_to_funsor

1a3a88f

Merge branch 'to-funsor-inputs' into to-funsor-distributions

be04eb1

sketch new distribution wrapper

4956775

split new version into second file

8dc8585

lint

f84a031

Merge branch 'master' into to-funsor-distributions

716a483

most basic beta density test passes

809e44c

basic density tests pass

ed3543c

tweak generic to_funsor/to_data implementations

0d84d09

standardize test

06e51d4

check event shape in to_data

55805b9

add metaclass to handle default name

f1f9e0e

add a to_funsor test for normal

b5ac620

add makefun to dependencies

2030536

add more to_funsor sketches

3c1f249

shuffle code around

c913eeb

port patterns and add tests

343bcf1

switch to refactoring distributions without changing the api

fb82ee7

fix density and incorrect validation

87d2fda

binomial and multinomial tests passing

3f20359

fehiepsi reviewed Mar 3, 2020


eb8680 mentioned this pull request Mar 4, 2020

Add generic to_funsor conversion methods for funsor.distributions #321

Merged