Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix setindex! with SubDArray source #74

Closed
wants to merge 8 commits into from

Conversation

mbauman
Copy link
Contributor

@mbauman mbauman commented Jul 7, 2016

This method is an optimization wherein we try to chunk accesses based upon the parent DArray's parts. The hard thing is then going backwards and trying to figure out which parts of the assignment indices need to be used in order to access those chunks. This is a four stage process that uses five different types of indices:

  1. Find the indices of each portion of the DArray
  2. Find the valid subset of indices of the SubArray that index into that portion
  3. Find the portion of the indices for the assignment that need to be used for that subset of indices in step 2. This is the hard part. It requires creating another set of indices that represents the mask of valid indices from step 2. With those masks in hand, it's possible to reindex I to the indices we need. The trouble is that setindex! supports singleton dimensions in the source array in ways that getindex does not, so we need to selectively drop singleton dimensions as we restrict the indices. A final complication is that the last index can be a linear index over many indices in either the source or destination.
  4. Finally, if the entire DArray chunk isn't getting used, we need to shift the indices from step 2 to refer to the local part of the DArray.

Tests pass locally with --depwarn=no. CC @andreasnoack.

@coveralls
Copy link

coveralls commented Jul 7, 2016

Coverage Status

Coverage decreased (-19.9%) to 49.066% when pulling be6a31c on mbauman:mb/setindex into 1a31742 on JuliaParallel:master.

sz::NTuple{N,Int}
end
Base.size(M::MergedIndices) = M.sz
Base.getindex{_,N}(M::MergedIndices{_,N}, I::Vararg{Int, N}) = CartesianIndex(map(getindex, M.indices, I))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The two parameter version of Vararg causes an error on 0.4. Is this fix strictly 0.5 material or is it possible to make it work on both versions?

@mbauman
Copy link
Contributor Author

mbauman commented Jul 8, 2016

No, I don't think this fix will be easy to backport. Both this and the previous implementation lean heavily on the assumption that length(S.indexes) = ndims(S.parent) for SubArrays. But that only became true on 0.5. So this doesn't really fix the bugs on 0.4 — they're a result of that assumption being false.

In general, setindex! does support a crazy amount of variation in the shapes of arrays it accepts:

A = rand(2,2)
A[:] = [1 2; 3 4] # Linear indexing into the destination
A[:,:] = 1:4 # Linear indexing into the source
A[:] = [1 2 3 4] # Skipping singleton dimensions with linear indexing into both
A[:, :] = [1 2 3 4] # Skipping singleton dimensions with linear indexing into the source

And it even feels like it should be more accepting than it currently is… but that'd just make this method even harder. See this comment and the following few ones: JuliaLang/julia#15431 (comment). On the other hand, the fact that it's hard to document its semantics is another red flag.

If we ever deprecate linear indexing, that would remove one of the possible sources of variation here. In fact, we could use the same trick I did in Base by reshaping the destination before doing the assignment — that'd probably be simpler. Nope, never mind. It's too early in the morning to be doing four levels of simultaneous indexing. This would only become simpler if we required the source array to exactly match the shape of the destination indices.

@codecov-io
Copy link

codecov-io commented Jul 8, 2016

Current coverage is 48.89% (diff: 42.50%)

Merging #74 into master will decrease coverage by 19.47%

@@             master        #74   diff @@
==========================================
  Files             1          1          
  Lines           702        589   -113   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
- Hits            480        288   -192   
- Misses          222        301    +79   
  Partials          0          0          

Powered by Codecov. Last update 166642e...54cbc66

@andreasnoack
Copy link
Member

Thanks a lot to our guest contributor. I think we'll have to declare defeat to old style indexing in Julia and remove support for 0.4 with this PR. People will still be able to use some of the DArray functionality on 0.4 but not the fixed version of this method. I'll push the necessary adjustments of REQUIRE and .travis.yml.

@andreasnoack
Copy link
Member

@timholy JuliaLang/julia#17137 broke this PR. It would be great if you could explain how the bounds checking code should be adjusted to work with new Base. Thanks.

@mbauman
Copy link
Contributor Author

mbauman commented Jul 9, 2016

I think I got it through some moderately informed monkey-see-monkey-do imitation of CartesianIndex. That said, I no longer understand the strategy for bounds checking or the difference between all these functions and methods.

@coveralls
Copy link

coveralls commented Jul 9, 2016

Coverage Status

Coverage decreased (-20.1%) to 48.896% when pulling 54cbc66 on mbauman:mb/setindex into 1a31742 on JuliaParallel:master.

@timholy
Copy link

timholy commented Jul 9, 2016

It seems a bit messed up now because of a bad interaction between JuliaLang/julia#17137 and JuliaLang/julia#17340, the latter of which got rid of any calls to _chkbounds, so that function isn't even used now.

I'll see what I can do.

This method is an optimization wherein we try to chunk accesses based upon the parent DArray's parts. The hard thing is then going backwards and trying to figure out which parts of the assignment indices need to be used in order to access those chunks.  This is a four stage process that uses five different types of indices:

1. Find the indices of each portion of the DArray
2. Find the valid subset of indices of the SubArray that index into that portion
3. Find the portion of the indices for the assignment that need to be used for that subset of indices in step 2. This is the hard part.  It requires creating another set of indices that represents the mask of valid indices from step 2.  With those masks in hand, it's possible to reindex `I` to the indices we need. The trouble is that `setindex!` supports singleton dimensions in the source array in ways that `getindex` does not, so we need to selectively drop singleton dimensions as we restrict the indices. A final complication is that the last index can be a linear index over many indices in either the source or destination.
4. Finally, if the entire DArray chunk isn't getting used, we need to shift the indices from step 2 to refer to the local part of the DArray.
This is no longer needed -- the comment is from when I only had restrict_indices partially implemented
Also clarify the comment since I was confused upon coming back to this method a few weeks later
Both these lazy arrays are effectively generalizations of Tim's MappedArrays.jl package. Doing this generally adds a bit more difficulty in terms of element types, but that is true of the MappedArray type, too.  It might be worth breaking this out into a package at some point.
As a further optimization, (at)inbounds could be added throughout the algorithm once it has received more widespread testing.
@mbauman
Copy link
Contributor Author

mbauman commented Aug 3, 2016

Updated to work with master

@andreasnoack
Copy link
Member

@mbauman Thanks a million. I've merged you commits in #76.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants