Gripes and anti-gripes about VarInfo #119
Comments
As to redundancy, I'm perfectly fine with things as they are. "Redundant" code (if that's what it is, which I am not convinced of) is not by itself a bad thing. It's not worth it to find some ultra-Julian way to compress the indexing functionality just because the number of lines is large. VarInfo works really well, and that comes at the cost of complexity. I think we should just focus on functionality and revisit "redundancy" if it ever actually starts to become a non-cosmetic issue. As to the API, I do not find that I run into cases where various API calls are repeated, or that I am unclear as to which API function to use, so I think the API is more or less fine. The docstrings are actually quite comprehensive. As to new API, the only thing I'd like to see done by smarter people than me is:
On testing: the Turing test issue is hard, because it's not obvious to me that there's a super easy solution. Perhaps one way is just to add a script that pulls the Turing master branch into the test folder and runs the build from there, though this may have some unintended consequences and build issues.
First of all, I don't think we should strive for a small number of LOCs just for the sake of it. IMO the ultimate goal is a reasonable balance of readability/clarity, simplicity, composability, functionality, and performance (maybe I forgot to list something here, but at least I think all these aspects are important). Moreover, the current implementation seems to work reasonably well, so every change should be evaluated carefully, and I think there's no need to rewrite everything completely. Maybe it would even be better to add alternative approaches and ideas separately. That being said, you probably know that I like to throw around ideas that might not be completely useful or reasonable in the end, so here they are 😄 IMO a major problem is that there is no clearly defined API right now (as discussed previously), which makes it difficult to actually come up with alternative designs.
Additionally, as @cpfiffer mentioned above, the API of DynamicPPL should also provide some convenience functions for, e.g., retrieving the values (in the original shape!) as a NamedTuple or retrieving the names of the variables. Ideally DynamicPPL should provide this functionality as well. As mentioned above, it is currently quite confusing to see what the difference actually is between some of these functions.
The motivation for not storing samples in a linearized form is not simplicity. Maybe the code will become more complicated, maybe it will be less complicated, I don't know. The main motivation is that linearization destroys information (such as shape) that is present in the samples and currently has to be reconstructed carefully and with a lot of heuristics. Moreover, it requires promotion of different types, which is weird, e.g., in the case of both discrete and continuous variables. Samplers such as HMC that operate on vectors can use a dedicated method for obtaining a vectorized form on demand. That leads me to another point.
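The on-demand vectorization idea can be sketched as follows. This is a toy illustration, not DynamicPPL code: `flatten`, `unflatten`, and the shape bookkeeping are all invented names, assuming values are stored in their natural shapes and only flattened when a vector-based sampler asks for it.

```julia
# Toy sketch (names invented): keep shape information alongside the values so
# a vector-based sampler can flatten on demand and map back losslessly,
# instead of storing everything linearized and reconstructing heuristically.
flatten(xs) = reduce(vcat, (x isa AbstractArray ? vec(float.(x)) : [float(x)] for x in xs))

function unflatten(v::AbstractVector, shapes)
    out = Any[]
    i = 1
    for sz in shapes
        n = prod(sz)                 # number of elements (prod(()) == 1)
        push!(out, sz == () ? v[i] : reshape(v[i:i+n-1], sz))
        i += n
    end
    return out
end

xs = Any[2.0, [1.0 2.0; 3.0 4.0]]    # a scalar and a matrix variable
shapes = map(size, xs)               # [(), (2, 2)] travels with the values
v = flatten(xs)                      # flat Float64 vector, e.g. for HMC
ys = unflatten(v, shapes)            # shapes restored exactly, no heuristics
```

Because the shapes are carried explicitly, the round trip is exact and needs no guessing about how a scalar or a matrix was laid out in the vector.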
To me it seems some of the design of AbstractVarInfo originates from having specific samplers in mind. I think ideally AbstractVarInfo should be sampler-agnostic. Quite drastically: if gids are needed by the sampler in the next step, they should be part of the state of the sampler (and by that I mean part of the transition — as discussed in AbstractMCMC, I don't think samplers should have a mutable state, since that leads to abstract fields and weird dummy initializations, e.g. in the HMC sampler). So I think ideally we should not handle gids at all. Along the same lines, I'm not sure if the concept of transformed and untransformed samples should be handled at the level of AbstractVarInfo. A sampler operates in transformed or untransformed space, and hence it knows best how the values should be handled. At the beginning of sampling (and in every Gibbs step), it would just make sure that the variables are in the correct space, and at the end of sampling (and at the end of every Gibbs step) it would make sure that they are in the original space. Regarding the test structure: if the interface should be tested with specific samplers, we could have a simple internal implementation of these samplers, similar to https://github.com/SciML/DiffEqBase.jl/blob/master/src/internal_euler.jl, in addition to downstream tests. Downstream tests could be handled similarly to https://github.com/SciML/DiffEqBase.jl/blob/139e5ce266fa41ffd81508ba1c632e88b0118e58/test/runtests.jl#L32-L36, by including them in a separate environment to which one could temporarily add a Manifest.toml file for tests against a specific branch or version of, e.g., Turing.
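A minimal sketch of the "sampler owns the transformation" idea above: the sampler maps into its preferred space at the start of a step and maps back at the end, so the VarInfo never needs to know about linking. `LogitTransformSampler`, `link`, `invlink`, and `gibbs_step` are all illustrative names, not DynamicPPL API.

```julia
# Toy sketch: the sampler, not the VarInfo, owns the change of variables.
struct LogitTransformSampler end   # works internally in unconstrained space

link(x)    = log(x / (1 - x))      # (0, 1) -> R
invlink(y) = 1 / (1 + exp(-y))     # R -> (0, 1)

function gibbs_step(::LogitTransformSampler, x_constrained)
    y = link(x_constrained)        # move to the sampler's preferred space
    y += 0.1                       # dummy proposal in unconstrained space
    return invlink(y)              # hand back a value in the original space
end

x = gibbs_step(LogitTransformSampler(), 0.5)   # result stays in (0, 1)
```

At the boundaries of each (Gibbs) step the values are always in the original constrained space, so nothing about the transformation leaks into the storage layer.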
There are a lot of good points here. I'll add some of my thoughts too. I think the lack of a clear set of compulsory APIs (as mentioned above) for AbstractVarInfo is part of the problem. One possible type hierarchy:

- AbstractVarInfo
  - UntypedVarInfo
  - ThreadedVarInfo
  - AbstractTypedVarInfo
    - UnconstrainedTypedVarInfo
    - ConstrainedTypedVarInfo
  - CompositionalVarInfo

We can have a separate file for each of these. Some existing complexity around constrained-space parameters and unconstrained-space parameters can be distinguished by introducing additional types. Another helpful feature that we currently lack is some granularity control for storing variables. As for some of the internal functions, it is possible that we can leverage parametric types to reduce the total number of definitions.
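The hierarchy above could be written down as plain Julia type declarations. None of these types exist in DynamicPPL; this is only a sketch of the proposal.

```julia
# Illustrative type declarations for the proposed hierarchy (not real
# DynamicPPL types).
abstract type AbstractVarInfo end
abstract type AbstractTypedVarInfo <: AbstractVarInfo end

struct UntypedVarInfo <: AbstractVarInfo end
struct ThreadedVarInfo{V<:AbstractVarInfo} <: AbstractVarInfo
    inner::V                     # wraps another VarInfo for thread-safe access
end
struct UnconstrainedTypedVarInfo <: AbstractTypedVarInfo end
struct ConstrainedTypedVarInfo <: AbstractTypedVarInfo end

# A compositional wrapper combining several per-variable VarInfos:
struct CompositionalVarInfo{T<:Tuple} <: AbstractVarInfo
    components::T
end
```

Making constrained/unconstrained a property of the type, rather than a runtime flag, would let dispatch pick the right behaviour statically.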
So the different VarInfo types idea is similar to the proposal above. To accommodate the unvectorized representation, we may use a dictionary in the metadata. We can also define a common interface for these types. For now, I will try to fix dynamic model support and stochastic control flow in branches. Then I will come back to refactoring VarInfo after NeurIPS.
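The dictionary-backed, unvectorized storage idea could look roughly like this. `DictVarInfo`, `push_var!`, and this `tonamedtuple` are hypothetical names sketching the proposal, not the actual implementation.

```julia
# Sketch of dictionary-backed, unvectorized storage (hypothetical, not the
# actual DynamicPPL code).
struct DictVarInfo
    vals::Dict{Symbol,Any}        # values kept in their natural shape
    dists::Dict{Symbol,Any}       # the distribution each value was drawn from
    logp::Base.RefValue{Float64}  # accumulated log density
end
DictVarInfo() = DictVarInfo(Dict{Symbol,Any}(), Dict{Symbol,Any}(), Ref(0.0))

function push_var!(vi::DictVarInfo, vn::Symbol, r, dist)
    vi.vals[vn] = r
    vi.dists[vn] = dist
    return vi
end

# Values come back in their original shape; no reconstruction heuristics needed.
tonamedtuple(vi::DictVarInfo) =
    NamedTuple{Tuple(keys(vi.vals))}(Tuple(values(vi.vals)))

vi = DictVarInfo()
push_var!(vi, :m, [1.0, 2.0], "Normal(0, 1)")   # dist is just a placeholder here
nt = tonamedtuple(vi)
```

A NamedTuple-returning accessor like this would also cover the convenience API @cpfiffer asked for above.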
@devmotion getting rid of gids from VarInfo sounds interesting, but this needs more thinking. Currently we know which variable belongs to which sampler using the gids, but we also know it from the space of the Sampler, so this might be an actual redundancy. I assume this redundancy was introduced to enable one variable symbol to be sampled by two different samplers in a Gibbs context. But currently, I don't think we do this. Even if that's needed, it might make more sense to store that mapping elsewhere.
I tried to summarize the relevant ideas and arguments from here in the AbstractPPL.jl discussion, but I dare not close this issue yet, as the information content is pretty dense. Would everyone involved please have another look to make sure nothing important is lost in the transfer over there?
It's probably OK to keep this open and revisit after we finish the ongoing refactoring.
Closed in favour of the AbstractPPL.jl discussion, #68 and #309.
While working on #115, I came to realize that we may be too harsh in our criticism of VarInfo's complexity. This issue tries to explain some of the design choices we have in varinfo.jl, which should spiral into a discussion on how to improve things. This issue came into being after a discussion with Hong on Slack regarding #118, which I am still very much in favor of.
Function redundancy is a big concern you may have if you have played around with varinfo.jl for a while. The main one I can think of is `getval` vs `getindex` and `setval!` vs `setindex!`. The way I think about it is that `getval` and `setval!` get and set the vectorized version of the value. So `getval` returns a vector and `setval!` expects a vector, even if the value was originally a scalar or matrix. `getindex` and `setindex!`, on the other hand, should reconstruct the shape of the original variable and `link`/`invlink` the value based on whether the `VarInfo` is working in terms of the transformed or original values. Currently, `setindex!` doesn't do exactly that, but it was "fixed" in #115 to do exactly the above. So `r = vi[vn]` (or `vi[vn, dist]` in the PR) will always return a value in the domain of `vn`'s distribution with the right shape, and `vi[vn] = r` expects `r` to be in the domain of the distribution and in its natural shape.

Another redundancy comes from the need for `r = vi[vn]` and `vi[vn] = r` but also `r = vi[spl]` and `vi[spl] = r`. The first two are used in a model call, while the other two are used in the `step!` function. These need to be defined for `UntypedVarInfo` as well as `TypedVarInfo`. But imo not much can be done about the need for those functions: `VarInfo` has to define them one way or another because they are the main API of `VarInfo`.

Other than the above redundancy, we have many getter and setter functions in varinfo.jl. I think each one of those functions is used at least once. For example, we need to be able to update the `gids` of a variable to assign a sampler to it. This is because new variables can pop up in dynamic models and they need to be assigned to HMC, for example, in the next HMC call. Getters and setters for variable ranges, distributions, logp and flags are all necessary. Other functions like `empty!`, `haskey`, `syms`, `tonamedtuple`, etc. are all important utility functions to have as well.

Finally, regarding unit tests: while we are not exactly perfect in that department, our coverage is at 75%, and the coverage of varinfo.jl is at 83%. We could definitely do better in terms of splitting Turing and DPPL tests, but completely separating the two has proven somewhat difficult thus far. This is mostly because we need a Gibbs-HMC sampler to test the Gibbs-HMC-specific components of DPPL, a particle sampler to test the particle-sampling-specific components of DPPL, etc. Will it be possible one day? I hope so, but this isn't the first heavily interlinked package duo in the Julia ecosystem. Sometimes the need for two separate development cycles in a heavily interlinked package results in splitting it into two heavily interlinked packages with two development cycles. It's not ideal, but it's arguably better than combining the two packages again mostly for development needs. This is exactly the situation we are in with Turing and DPPL.
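The two access layers described above can be illustrated with a toy sketch. `ToyVarInfo` and these method bodies are invented for illustration; they ignore linking and distributions entirely and are not DynamicPPL's actual code.

```julia
# Toy sketch of the two access layers (not DynamicPPL's actual code):
# `getval`/`setval!` work on the flat, vectorized value, while
# `getindex`/`setindex!` restore the variable's natural shape.
struct ToyVarInfo
    vals::Dict{Symbol,Vector{Float64}}   # values always stored linearized
    shapes::Dict{Symbol,Tuple}           # original shapes for reconstruction
end

getval(vi::ToyVarInfo, vn::Symbol) = vi.vals[vn]               # flat vector
setval!(vi::ToyVarInfo, v, vn::Symbol) = (vi.vals[vn] = collect(float.(v)))

Base.getindex(vi::ToyVarInfo, vn::Symbol) = reshape(vi.vals[vn], vi.shapes[vn]...)
Base.setindex!(vi::ToyVarInfo, r, vn::Symbol) = (vi.vals[vn] = vec(collect(float.(r))))

vi = ToyVarInfo(Dict(:m => [1.0, 2.0, 3.0, 4.0]), Dict(:m => (2, 2)))
getval(vi, :m)   # the flat 4-element vector
vi[:m]           # the same values as a 2x2 matrix
```

So `getval(vi, :m)` hands a flat vector to a vector-based sampler, while `vi[:m]` restores the matrix shape the model sees; the real implementation additionally links/invlinks in `getindex`/`setindex!`.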
Now, back to why I am opening this issue. This isn't just me defending the status quo. I understand there are a few similar issues discussing problems or proposed improvements to VarInfo, e.g. #5, #7, #16, #18 and #68. I am open to criticism if you disagree with anything I said above or think there is a better way to do things but please be specific. For example, here are some questions for you to think about:
@devmotion has brought up more than once his preference to design the `VarInfo` data structure around unvectorized values. I understand the appeal of that, but I don't think it will necessarily simplify the code by a lot. We still need to keep track of distributions, sampler gids and varnames. We still need to get and set a vectorized form of the values for HMC samplers. We still need to cater for variables disappearing and popping up at any time. We still need to specialize the type of VarInfo to cater for mixed variable types and automatic differentiation in a type-stable way. All of this would still need to be done. Will it be simpler? Maybe, but I doubt it will be much simpler.

Anyways, this issue grew to be longer than I would have liked, but please let me know what you think, whether you agree or disagree with anything I said above. Thank you.