One can ESTIMATE but not SIMULATE mutual information with the loom backend. #622

Open
Schaechtle opened this issue May 3, 2018 · 2 comments

Comments

@Schaechtle
Collaborator

Note that this issue should probably have gone to our fork of loom; for some reason I don't have privileges to open issues there.

The problem: you can estimate mutual information with loom, but you can't simulate it, because the loom backend already aggregates the MI values by taking their mean:
https://github.com/probcomp/loom/blob/32227b125d45f1435ff6e6f05df76b5161158bcf/loom/query.py#L281

This should be easy to fix by removing the mean in the lines linked above.

@fsaad
Collaborator

fsaad commented May 4, 2018

The lines quoted above:

        mi = entropys[feature_set1].mean \
            + entropys[feature_set2].mean \
            - entropys[feature_union].mean

are accessing an attribute of the Estimate namedtuple called mean, as opposed to invoking a method .mean() that takes the mean across an array.
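To make the distinction concrete, here is a minimal sketch of attribute access versus a method call. The field names of Estimate (mean, variance) and all values are assumptions for illustration, not copied from loom:

```python
from collections import namedtuple
from statistics import mean

# Assumed shape of the record: field names are illustrative, not taken from loom.
Estimate = namedtuple('Estimate', ['mean', 'variance'])

entropy = Estimate(mean=1.25, variance=0.01)

# Attribute lookup: returns the stored number, nothing is averaged here.
stored_value = entropy.mean
attribute_is_callable = callable(entropy.mean)

# By contrast, averaging a list of values is an explicit computation.
hypothetical_values = [1.0, 1.5, 1.25]
averaged_value = mean(hypothetical_values)
```

So dropping .mean from the quoted expression would not turn a scalar into a list; it would just leave an Estimate tuple in its place.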

Note that the mean here is the mean over the Monte Carlo samples of simulate/logpdf used to estimate the entropy, not the mean over the list of entropy values (one entropy value per CrossCat structure in the ensemble). Unfortunately, those per-structure values do not even appear to be directly exposed via the Loom C++ API.
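As a rough illustration of what "mean over the Monte Carlo samples" refers to, the sketch below estimates the entropy of a single toy model (a 1-D Gaussian) by simulating from it and averaging the negative log density. The function names and the Gaussian model are assumptions made for this example; loom's actual estimator is not reproduced here:

```python
import math
import random
from collections import namedtuple

# Assumed record shape, for illustration only.
Estimate = namedtuple('Estimate', ['mean', 'variance'])


def gaussian_logpdf(x, mu, sigma):
    """Log density of a normal distribution N(mu, sigma^2) at x."""
    return (-0.5 * math.log(2 * math.pi * sigma ** 2)
            - (x - mu) ** 2 / (2 * sigma ** 2))


def estimate_entropy(mu, sigma, sample_count, rng):
    """Monte Carlo entropy: simulate, score with logpdf, average -log p(x)."""
    scores = [-gaussian_logpdf(rng.gauss(mu, sigma), mu, sigma)
              for _ in range(sample_count)]
    mean = sum(scores) / sample_count
    sample_variance = sum((s - mean) ** 2 for s in scores) / (sample_count - 1)
    # Variance of the estimator itself, i.e. of the sample mean.
    return Estimate(mean=mean, variance=sample_variance / sample_count)


rng = random.Random(0)
entropy = estimate_entropy(0.0, 1.0, 10000, rng)
exact_entropy = 0.5 * math.log(2 * math.pi * math.e)  # closed form for N(0, 1)
```

The .mean field of the result is an average over samples drawn from one model; it carries no information about how entropy varies across models in an ensemble.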

After some investigation, it appears that exposing the distribution of entropy (or probabilities, or samples) across the ensemble would require an alternative implementation of the QueryServer class, defined in query_server.hpp and implemented in query_server.cpp, which internally aggregates over all cross_cats in the Loom configuration directory. There is probably not much conceptual difficulty in writing a QueryServer that exposes lists of results rather than aggregates, but it would take a non-trivial amount of work (including new protocol buffer message definitions for Query in schema.proto, and of course getting the whole shebang to build).
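The difference between exposing a list and exposing an aggregate can be sketched with a toy ensemble. Everything below is hypothetical: each "model" is a unit-variance bivariate Gaussian with its own correlation, entropies are computed in closed form, and the aggregate is a plain average chosen only for illustration. How the real QueryServer combines its cross_cats is not reproduced:

```python
import math


def gaussian_entropies(rho):
    """Exact entropies (nats) of a unit-variance bivariate Gaussian.

    Returns (H(X), H(Y), H(X, Y)) for correlation rho.
    """
    h_marginal = 0.5 * math.log(2 * math.pi * math.e)
    h_joint = math.log(2 * math.pi * math.e) + 0.5 * math.log(1 - rho ** 2)
    return h_marginal, h_marginal, h_joint


def mutual_information(h_x, h_y, h_xy):
    """Same identity as the quoted code: H(X) + H(Y) - H(X, Y)."""
    return h_x + h_y - h_xy


# Hypothetical ensemble: one correlation per model.
ensemble_rhos = [0.1, 0.5, 0.9]

# What a list-returning interface would make possible: one MI value per model.
per_model_mi = [mutual_information(*gaussian_entropies(rho))
                for rho in ensemble_rhos]

# What an aggregating interface returns: a single number.
aggregated_mi = sum(per_model_mi) / len(per_model_mi)
```

With per_model_mi available, a caller could treat the values as draws describing uncertainty about the dependence; with only aggregated_mi, that spread is lost.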

@fsaad
Collaborator

fsaad commented Oct 1, 2018

@Schaechtle Any updates on whether this has been implemented, or whether it still needs to be implemented?

2 participants