Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Breaking: add iterate and collect #811

Open
wants to merge 4 commits into
base: main
Choose a base branch
from
Open

Breaking: add iterate and collect #811

wants to merge 4 commits into from

Conversation

rafaqz
Copy link
Owner

@rafaqz rafaqz commented Sep 30, 2024

@asinghvi17 this should do what you want

Closes #810

using DimensionalData
A = [1.0 2.0 3.0;
     4.0 5.0 6.0]
x, y, z = X([:a, :b]), Y(10.0:10.0:30.0; metadata=Dict()), Z()
dimz = x, y
da1 = DimArray(A, (x, y); name=:one, metadata=Metadata())
da2 = DimArray(Float32.(2A), (x, y); name=:two)
da3 = DimArray(Int.(3A), (x, y); name=:three)
da4 = DimArray(cat(4A, 5A, 6A, 7A; dims=3), (x, y, z); name=:extradim)

s = DimStack((da1, da2, da3))
mixed = DimStack(da1, da2, da4)
julia> [x for x in mixed]
╭──────────────────────────────────────────────────────────────────────────────╮
│ 2×3×4 DimArray{@NamedTuple{one::Float64, two::Float32, extradim::Float64},3} │
├──────────────────────────────────────────────────────────────────────────────┴──────────────────────────────────────────────────────────────── dims ┐
   X Categorical{Symbol} [:a, :b] ForwardOrdered,
   Y Sampled{Float64} 10.0:10.0:30.0 ForwardOrdered Regular Points,
  ↗ Z
└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
[:, :, 1]
    10.0                                       20.0                                        30.0
  :a    (one = 1.0, two = 2.0, extradim = 4.0)     (one = 2.0, two = 4.0, extradim = 8.0)      (one = 3.0, two = 6.0, extradim = 12.0)
  :b    (one = 4.0, two = 8.0, extradim = 16.0)    (one = 5.0, two = 10.0, extradim = 20.0)    (one = 6.0, two = 12.0, extradim = 24.0)

julia> collect(mixed)
2×3×4 Array{@NamedTuple{one::Float64, two::Float32, extradim::Float64}, 3}:
[:, :, 1] =
 (one = 1.0, two = 2.0, extradim = 4.0)   (one = 2.0, two = 4.0, extradim = 8.0)    (one = 3.0, two = 6.0, extradim = 12.0)
 (one = 4.0, two = 8.0, extradim = 16.0)  (one = 5.0, two = 10.0, extradim = 20.0)  (one = 6.0, two = 12.0, extradim = 24.0)

[:, :, 2] =
 (one = 1.0, two = 2.0, extradim = 5.0)   (one = 2.0, two = 4.0, extradim = 10.0)   (one = 3.0, two = 6.0, extradim = 15.0)
 (one = 4.0, two = 8.0, extradim = 20.0)  (one = 5.0, two = 10.0, extradim = 25.0)  (one = 6.0, two = 12.0, extradim = 30.0)

[:, :, 3] =
 (one = 1.0, two = 2.0, extradim = 6.0)   (one = 2.0, two = 4.0, extradim = 12.0)   (one = 3.0, two = 6.0, extradim = 18.0)
 (one = 4.0, two = 8.0, extradim = 24.0)  (one = 5.0, two = 10.0, extradim = 30.0)  (one = 6.0, two = 12.0, extradim = 36.0)

[:, :, 4] =
 (one = 1.0, two = 2.0, extradim = 7.0)   (one = 2.0, two = 4.0, extradim = 14.0)   (one = 3.0, two = 6.0, extradim = 21.0)
 (one = 4.0, two = 8.0, extradim = 28.0)  (one = 5.0, two = 10.0, extradim = 35.0)  (one = 6.0, two = 12.0, extradim = 42.0)

julia> vec(mixed) # row table!
24-element Vector{@NamedTuple{one::Float64, two::Float32, extradim::Float64}}:
 (one = 1.0, two = 2.0, extradim = 4.0)
 (one = 4.0, two = 8.0, extradim = 16.0)
 (one = 2.0, two = 4.0, extradim = 8.0)
 (one = 5.0, two = 10.0, extradim = 20.0)
 (one = 3.0, two = 6.0, extradim = 12.0)
 (one = 6.0, two = 12.0, extradim = 24.0)
 (one = 1.0, two = 2.0, extradim = 5.0)
 (one = 4.0, two = 8.0, extradim = 20.0)
 (one = 2.0, two = 4.0, extradim = 10.0)
 (one = 5.0, two = 10.0, extradim = 25.0)
 (one = 3.0, two = 6.0, extradim = 15.0)
 
 (one = 4.0, two = 8.0, extradim = 24.0)
 (one = 2.0, two = 4.0, extradim = 12.0)
 (one = 5.0, two = 10.0, extradim = 30.0)
 (one = 3.0, two = 6.0, extradim = 18.0)
 (one = 6.0, two = 12.0, extradim = 36.0)
 (one = 1.0, two = 2.0, extradim = 7.0)
 (one = 4.0, two = 8.0, extradim = 28.0)
 (one = 2.0, two = 4.0, extradim = 14.0)
 (one = 5.0, two = 10.0, extradim = 35.0)
 (one = 3.0, two = 6.0, extradim = 21.0)
 (one = 6.0, two = 12.0, extradim = 42.0)

@rafaqz rafaqz changed the title add iterate and collect Breaking: add iterate and collect Sep 30, 2024
@rafaqz
Copy link
Owner Author

rafaqz commented Sep 30, 2024

The weirdest thing about this is you still splat layers into a NamedTuple rather than values - because we defined merge to do this.

Into a Tuple you splat values once we have iterate on values.

The Array/NamedTuple duality thing that AbstractDimStack is always creates some weirdness at the boundary, its just which weirdness is the most useful. Splatting values into a NamedTuple is never useful but spatting layers is really useful. But its still weird.

Copy link
Collaborator

@asinghvi17 asinghvi17 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Added some comments - not sure how you intend the API to work. I should be able to push my code as it is tomorrow, that will be a better example of the usecase.

Comment on lines +32 to +33
(::Type{T})(st::AbstractDimStack) where T<:AbstractDimArray =
T([st[D] for D in DimIndices(st)]; dims=dims(st), metadata=metadata(st))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean that Raster(RasterStack(...)) is actually an expensive operation / materializes everything in memory? Is there a way we could avoid this, maybe a ConcatDimArray or something?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah in this implementation. UnstackedDimArray or InterleavedDimArray might be a better name because its really interleaving the layers not concatenating.

I've also been thinking of just making AbstractDimStack an AbstractArray itself which would make that a moot point. This PR is a step in that direction.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now lets just make it expensive

Base.vec(st::AbstractDimStack) = vec(collect(st))
@propagate_inbounds Base.iterate(st::AbstractDimStack) = iterate(st, 1)
@propagate_inbounds Base.iterate(st::AbstractDimStack, i) =
i > length(st) ? nothing : (st[DimIndices(st)[i]], i + 1)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any way to hold off on re-instantiating the DimIndices object every time? Indexing with linear indices (single integers) currently does what I want anyway, it just needs to be forwarded correctly?

Copy link
Owner Author

@rafaqz rafaqz Sep 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be free anyway. And if you index with linear indices it has to do DimIndices(A)[i] internally anyway!

Mixed dimension stacks indexing really only works properly now because Dimension indexing is used internally.

Comment on lines +205 to +207
Base.map(f, s::AbstractDimStack) = error("Use maplayers(f, stack)) instad of map(f, stack)")
Base.map(f, ::Union{AbstractDimStack,NamedTuple}, xs::Union{AbstractDimStack,NamedTuple}...) =
error("Use maplayers(f, stack, args...)) instad of map(f, stack, args...)")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this is the cost...but should it be? Arguably if you expected iterate to work the way it will now, you also expected map to work that way...

Is it a common usage pattern to do eg map(stack) do array?

Copy link
Owner Author

@rafaqz rafaqz Sep 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I've meant to deprecate this for ages anyway. The worse cost is really how different splatting to a Tuple and a NamedTuple is.

Its a common pattern, but maplayers should be fine to use for it, and frees up map to later switch to work like iterate (we have to error in between for a whole breaking change though).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Operations that call collect or iterate fail on some RasterStacks
2 participants