Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Repeated use of a dimension #46

Closed
ivirshup opened this issue Jan 30, 2020 · 4 comments · Fixed by #51
Closed

Repeated use of a dimension #46

ivirshup opened this issue Jan 30, 2020 · 4 comments · Fixed by #51

Comments

@ivirshup
Copy link
Contributor

ivirshup commented Jan 30, 2020

For context, I would like to be able to have labelled arrays which represent pairwise measures. These measures can be distances, or they could be adjacency matrices for some network structure.

For these pairwise measures, I think it makes sense for the resulting array to use the same dimension twice. As an example:

using Dates, DimensionalData
using DimensionalData: Time, X
using Distances

timespan = DateTime(2001):Month(1):DateTime(2001,12)
A = DimensionalArray(rand(12,10), (Time(timespan), X(10:10:100))) 
D = DimensionalArray(  # Since the dimensions don't propagate
    pairwise(CosineDist(), A),
    (X(10:10:100), X(10:10:100))
)

Subsetting by dimension results in both axes being subset:

D[X<|Between(0, 50)]
# DimensionalArray with dimensions:
#  X: 10:10:50
#  X: 10:10:50
# and data: 5×5 Array{Float64,2}:
#  0.0       0.321451  0.352489  0.398895  0.444586
#  0.321451  0.0       0.160367  0.193874  0.143861
#  0.352489  0.160367  0.0       0.232279  0.111173
#  0.398895  0.193874  0.232279  0.0       0.247832
#  0.444586  0.143861  0.111173  0.247832  0.0  

If I try to look at the distance between 10 and 20, I get the wrong result (both axes are being indexed by the first indexer):

D[X<|At(10), X<|At(20)]
# 0.0

For my work, there would be a lot of value in being able to use a dimension multiple times for one array, and being able to apply different selections to those indices. However I'm not sure how this can be resolved without losing the "order of indices doesn't matter" feature here.

Can these be reconciled? If not, I don't think it'd that large of a restriction to make order matter when indices are repeated, or just in general. Having the order be important makes the performance implications of the code be more explicit.


Using the order of the selectors to specify axis does is a workaround when dimensions are repeated:

 D[2:3, 1:2] == D[Between(20, 30), Between(10, 20)]

As an aside, this took me a little bit to find out, as I expected this to work initially:

julia> D[Between(20, 30), :]
ERROR: ArgumentError: invalid index: Between{Tuple{Int64,Int64}}((20, 30)) of type Between{Tuple{Int64,Int64}}
@rafaqz
Copy link
Owner

rafaqz commented Jan 30, 2020

Yes that is currently undefined behaviour.

Dimensions are found independently so they don't know about each other, so the first one is found both times. We would have to rethink how sortdims works to fix that, but it's doable.

The major conceptual change you are asking for is to have a syntax where written dimension order actually matters. Currently getindex on AbstractDimension indices isn't meant to convey any information about dimension order. But now the first index X has to refer to the first X dimension, the second to the second.

Does that make sense?

@ivirshup
Copy link
Contributor Author

Does that make sense?

I think so, though I can't say I'm 100% on the different type names yet 😄

The major conceptual change you are asking for is to have a syntax where written dimension order actually matters. Currently getindex on AbstractDimension indices isn't meant to convey any information about dimension order.

When I came across this, I wasn't aware this syntax already exists the non-dim-wrapper syntax. When I hit errors with that, I assumed the issue was the repeated dimensions. It turns out that Selectors can't be mixed with positional indexing and all axes have to be specified.

This makes having a syntax for filtering on all matching dimensions make sense, but the result for D[X<|At(20), X<|At(30)] still seems wrong. I would at least expect a bounds error.

I'm not sure I think that it makes sense to overload getindex with a method that largely has non-positional arguments. I wonder if a different syntax might make more sense for this? Something like select(D, X<|Between(30))? To me, this feels a little more like a database/ table operation than an array indexing operation.

@rafaqz
Copy link
Owner

rafaqz commented Jan 30, 2020

It turns out that Selectors can't be mixed with positional indexing and all axes have to be specified.

Because selectors are positional and dimensions aren't. If you wrap the selectors in dimensions, you don't need the positions because the dims will sort them automatically then unwrap, and missing dims will be filled with Colon. But you can't have both in the same call, they are actually different implementations of getindex/setindex/view.

I'm not sure I think that it makes sense to overload getindex with a method that largely has non-positional arguments. I wonder if a different syntax might make more sense for this? Something like select(D, X<|Between(30))? To me, this feels a little more like a database/ table operation than an array indexing operation.

I see you point, but that ship has sailed unfortunately! This is also what AxisArrays and most similar packages do, so there is a strong precedent for using getindex this way. Your proposal mixes some order back into things as well, which I'm not at all against. We just have to write the code.

I would like to get rid of the @generated permutedims method, so we could do this with a recursive function over both lookup and dim tuples that removes dims as they are sorted - so you can't use the same one twice. We just have to make sure that it compiles away. But it should solve all the problems mentioned here and not impact other code at all.

Edit: this method. I'm not sure when I will have time, so have a go if you want

https://github.com/rafaqz/DimensionalData.jl/blob/master/src/primitives.jl#L26-L51

@rafaqz
Copy link
Owner

rafaqz commented Jan 31, 2020

Don't worry about having a go this, it's mostly done.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants