Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove overly specialized bundle_samples #120

Closed
wants to merge 4 commits into from

Conversation

torfjelde
Copy link
Member

@torfjelde torfjelde commented Mar 13, 2023

Related: #118

Copy link
Member

@devmotion devmotion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have a concrete example that is fixed by this removal? Removing this definitions seems very breaking. Without this method the return types will be incorrect, won't they? The main point is that the elements are converted to type T which won't happen anymore.

@torfjelde
Copy link
Member Author

Do you have a concrete example that is fixed by this removal? Removing this definitions seems very breaking. Without this method the return types will be incorrect, won't they? The main point is that the elements are converted to type T which won't happen anymore.

Nothing is fixed, it's just that

  1. I don't actually know of anywhere this is being used. Do we know of any place where this is the case?
  2. If you want to do something with Type{Vector{...}} then you need to override this default implementation. This sort of thing comes up very often once you have samplers which are working on a "multiple" models, e.g. if you implement tempering. In the case it seems very reasonable to allow Vector{MCMCChains.Chains} for samples::AbstractVector, which then requires explicit overriding of this default implementation.
  3. map + convert is very easy to do for the user themselves, to the point where it doesn't seem worth the hassle it causes 🤷

@codecov
Copy link

codecov bot commented Mar 13, 2023

Codecov Report

Patch coverage has no change and project coverage change: -0.03 ⚠️

Comparison is base (7192263) 97.33% compared to head (216488f) 97.30%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #120      +/-   ##
==========================================
- Coverage   97.33%   97.30%   -0.03%     
==========================================
  Files           8        8              
  Lines         300      297       -3     
==========================================
- Hits          292      289       -3     
  Misses          8        8              
Impacted Files Coverage Δ
src/interface.jl 94.11% <ø> (-0.89%) ⬇️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@torfjelde
Copy link
Member Author

Without this method the return types will be incorrect, won't they? The main point is that the elements are converted to type T which won't happen anymore.

And this is not even a promise we are making, right? For example, bundle_samples will currently just be identity if you provide it with a type it has no implementation for (this is a different question: should we error here?)

@torfjelde
Copy link
Member Author

An alternative is of course to introudce a

AbstractMCMC.bundle_samples_to_vector

or something, i.e. just an additional diversion so that overloading for samples::AbstractVector doesn't become such a hassle, which is then the default (and thus keeps the change non-breaking).

And let me try to clarify a bit the issue I have with the current approach.

If you want to overload AbstractMCMC.bundle_samples(::AbstractVector, ..., ::Type{T}) where T, and, say, just make some transformation to the samples before passing them on to the "actual" bundle_samples implementation (this is very common in the case of "meta-samplers", e.g. tempered samplers, composition of samplers), then you have to do the following

function AbstractMCMC.bundle_samples(samples, ..., ::Type{T}) where {T}
    return actual_bundle_samples(samples, ..., T)
end

function AbstractMCMC.bundle_samples(samples::Vector, ..., ::Type{Vector{T}}) where {T}
    return actual_bundle_samples(samples, ..., Vector{T})
end

which gets quite annoying.

@devmotion
Copy link
Member

I've been annoyed by downstream issues/annoyances of the bundle_samples interface as well. The reason why I'm a bit skeptic about the PR in its current form is that it is breaking (clearly indicated by the broken tests) and hence should be released in a breaking release, regardless of how useful the change is. But when we go through the hassle of updating all downstream packages anyway, I wonder if we should spend a few days on thinking about whether we could/should improve the interface more generally. For instance, currently it's not intended by the interface to dispatch on the output-type in the sample call - is that an issue? Is the output type useful at all? Should we remove bundle_samples completely, what would the implications be? Could/should we generally reduce the number of arguments in the interface API?

@torfjelde
Copy link
Member Author

Oooh AbstractMCMC is already on x.y.z versioning! I was thinking bumping minor version was a breaking release, and so was confused why you were talking about how this should be a breaking release thinking I'd already indicated so.

But when we go through the hassle of updating all downstream packages anyway, I wonder if we should spend a few days on thinking about whether we could/should improve the interface more generally. For instance, currently it's not intended by the interface to dispatch on the output-type in the sample call - is that an issue? Is the output type useful at all? Should we remove bundle_samples completely, what would the implications be? Could/should we generally reduce the number of arguments in the interface API?

And yeah, happy to spend a few days thinking aobut whether or not this is the way to go 👍

Should we maybe make an issue or discussion?

@torfjelde
Copy link
Member Author

currently it's not intended by the interface to dispatch on the output-type in the sample call - is that an issue?

Is the output type useful at all?

When you refer to the "output type in the sample call", are you talking about the samples argument of bundle_samples or the output-type specified to bundle_samples? I'm guessing the latter, but just asking to be sure.

And btw, a lot of these PRs are motivated by "annoyances" I've encountered when working on MCMCTempering.jl, where we have several "meta-samplers", i.e. samplers which call other sampler in some way.

In MCMCTempering.jl we have a MultiModel, which effectively defines a product of models but with an ordering (for implementation purposes only). In this case, it might be sensible for the user to ask for a Vector{MCMCChains.Chains}, with each chain corresponding to each model.

Another particularly "weird" case where bundle_samples is used (though not related to the dispatching on the output-type), though IMO it also seems like a decent way to handle this stuff this method: https://github.com/TuringLang/MCMCTempering.jl/blob/ed1ca9886d1c49aece23aacf2bc9fcaf77725fd5/src/MCMCTempering.jl#L93-L108

In this case, the samples correspond to samples from a product of models, and so it represents an ordered sequence of samples rather than a single one. But the sequence is ordered according to an "index process", which is not necessarily equal to the model-ordering (which is what you'd expect as a user). To "handle" this in a somewhat more convenient manner for the user, we've added a kwarg to bundle_samples called bundle_resolve_swaps which will automatically re-order the sequence in each sample to align with the provided models.

Should we remove bundle_samples completely, what would the implications be?

Do you mean completely removing it and just not doing anything with the samples, leaving that to the sampler-implementer? If so, I'm not a big fan, in particular not now that we have extensions and people can easily add impls for MCMCChains, etc.

Could/should we generally reduce the number of arguments in the interface API?

What we could always do, though it has to be done with care, is to add some simpler bundle_samples method which can be used if all the sampler-information isn't useful, i.e.

function bundle_samples(
    samples, ::AbstractModel, ::AbstractSampler, ::Any, output_type::Type; kwargs...
)
    return bundle_samples(samples, output_type; kwargs...)
end

and then just make the default implementation we complained about above

function bundle_samples(samples::Vector, ::Type{Vector{T}}; kwargs...) where {T}
    return map(samples) do sample
        convert(T, sample)
    end
end

I believe in most (but not all) scenarions, one would really only overload the latter implementation.

@torfjelde
Copy link
Member Author

@devmotion Could we revisit this?:)

@devmotion
Copy link
Member

I'm still unhappy. The least this should be moved to a breaking 5.0.0 release.

However, I wonder - could we just move the functionality to the body of the generic fallback, wrapped in a if samples isa Vector && T <: Vector (or similar) check to fix the method ambiguity issues without actually removing functionality?

@devmotion
Copy link
Member

Since #123 was merged, I suggest closing this PR for now and revisit it (or some variant of it) if these ambiguity issues show up again.

@torfjelde
Copy link
Member Author

Sounds good 👍

@torfjelde torfjelde closed this Jun 20, 2023
@devmotion devmotion deleted the torfjelde/remove-unnecessary-bundle-sample branch June 20, 2023 18:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants