Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make unvetted size throw an error for arrays with non-1 indexing #17228

Merged
merged 5 commits into from
Jul 19, 2016

Conversation

timholy
Copy link
Sponsor Member

@timholy timholy commented Jul 1, 2016

For arrays that violate the assumption that indexing starts at 1, there's a risk that formerly-safe code could exit with nasty errors, see here. Discussion in #16260 led to a good suggestion for a migration strategy: during julia-0.5, size should throw an error by default for such "unconventional" arrays. This PR introduces the necessary infrastructure:

  • an IndicesSafety trait that can be passed as an argument through size and related functions
  • a macro @safeindices that you can use to wrap functions that you've certified are safe for arrays with unconventional indices.

Together with #17137, this closes #16973 and should be the end of array changes that I envision for 0.5. (It is expected that more functions will become @safeindices over 0.5.x releases as people increasingly use such unconventional arrays.)

# SafeIndices().
size( ::IndicesSafety, A::AbstractArray) = size(A)
size( ::IndicesSafety, A::AbstractArray, d) = size(A,d) # fixme
# size(s::IndicesSafety, A::AbstractArray, d) = d <= ndims(A) ? size(s, A)[d] : 1
Copy link
Sponsor Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For some reason, uncommenting this line causes inference to bail during bootstrap. (Avoiding ndims and using the parameter variant, A::AbstractArray{T,N}, does not fix the problem either.)

As a consequence, people who write packages for unconventionally-indexed arrays have to specialize both size(s, A) and size(s, A, d), which is a little unfortunate but not too terrible.

@timholy timholy force-pushed the teh/safe_size_length branch 2 times, most recently from 26d84cd to 9faec4e Compare July 3, 2016 11:17
size( ::IndicesSafety, A) = size(A)
size( ::IndicesSafety, A::AbstractArray, d) = size(A,d) # fixme
# size(s::IndicesSafety, A::AbstractArray, d) = d <= ndims(A) ? size(s, A)[d] : 1
size{N}(s::IndicesSafety, A::AbstractArray, d1::Integer, d2::Integer, dx::Vararg{Integer, N}) = (size(s, A, d1), size(s, A, d2), ntuple(k->size(s, A, dx[k]), Val{N})...)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wrap this?

@tkelman
Copy link
Contributor

tkelman commented Jul 3, 2016

I'm not following when SafeIndices() should be used vs UnsafeIndices() - could an explanation be written up in docs or devdocs?

@@ -1417,7 +1447,7 @@ function mapslices(f, A::AbstractArray, dims::AbstractVector)
idx[d] = Colon()
end

r1 = f(view(A, idx...))
r1 = f(A[idx...])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not mentioned in commit message, maybe should have a specific test

Copy link
Sponsor Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll split this out into a separate PR.

@timholy
Copy link
Sponsor Member Author

timholy commented Jul 3, 2016

I'll modify the unmerged JuliaLang/www_old.julialang.org#386 when things settle out here. Bottom line is that neither is expected to be used in user-code; the idea is:

  • audit/fix your code for more general indexing
  • mark the function with @safeindices

Internally within Base, @safeindices doesn't work early in bootstrap.

@@ -577,6 +577,7 @@ export
rot180,
rotl90,
rotr90,
@safeindices,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

needs docstring inline and in rst, and probably a dedicated test of the macro?

Copy link
Sponsor Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will add the docstring. The test happens as a consequence of the new @test_throws in offsetarray.jl and the fact that the remaining tests work.

@timholy
Copy link
Sponsor Member Author

timholy commented Jul 3, 2016

Test failure seems unrelated.

I'm tempted to merge. I hope people realize this PR makes the unconventionally-indexed arrays brittle-by-design: in practice, this forces you to look at basically every "unvetted" function you pass them to. I've modified the big stuff (indexing, broadcasting, show) and a number of smaller routines, but Base and definitely packages will, in the early days, require changes. I think this is the best approach, however, since uncontrolled failures are much worse.

s = op(s, f(Ai))
inds = linearindices(A)
n = length(inds)
@inbounds begin
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this make an assumption that f(a) is bounds-safe?

Copy link
Sponsor Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@inbounds doesn't propagate, even with inlining (unless so marked). We had inbounds previously for the n<16 case, but it missed several A[i] accesses.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I clearly need to watch https://www.youtube.com/watch?v=l0icRnmqeYs a few more times

@tkelman
Copy link
Contributor

tkelman commented Jul 3, 2016

Bottom line is that neither is expected to be used in user-code

Okay, then I guess devdocs is better, but in the current state it'll be easy to accidentally use the wrong one in base.

@timholy
Copy link
Sponsor Member Author

timholy commented Jul 3, 2016

Are you suggesting I move the blog post to devdocs? I can do that. I'll also clarify the meaning of the two alternatives.

@tkelman
Copy link
Contributor

tkelman commented Jul 3, 2016

I suppose that works. Would like to get out of the habit of merging things without accompanying documentation.

@timholy
Copy link
Sponsor Member Author

timholy commented Jul 3, 2016

OK, other than moving the blog post I think I've addressed all the review comments---and frankly I think it's best to do that once #17137 merges.

I'm having trouble building the docs locally:

tim@diva:~/src/julia-0.5$ make -C doc html
make: Entering directory '/home/tim/src/julia-0.5/doc'
. ./../deps/build/julia-env/bin/activate && pip install sphinx==1.3.1 \
              && pip install -r /home/tim/src/julia-0.5/doc/requirements.txt
Traceback (most recent call last):
  File "/home/tim/src/julia-0.5/deps/build/julia-env/bin/pip", line 7, in <module>
    from pip import main
  File "/home/tim/src/julia-0.5/deps/build/julia-env/local/lib/python2.7/site-packages/pip/__init__.py", line 14, in <module>
    from pip.utils import get_installed_distributions, get_prog
  File "/home/tim/src/julia-0.5/deps/build/julia-env/local/lib/python2.7/site-packages/pip/utils/__init__.py", line 27, in <module>
    from pip._vendor import pkg_resources
  File "/home/tim/src/julia-0.5/deps/build/julia-env/local/lib/python2.7/site-packages/pip/_vendor/pkg_resources/__init__.py", line 36, in <module>
    import plistlib
  File "/usr/lib/python2.7/plistlib.py", line 62, in <module>
    import datetime
ImportError: No module named datetime
Makefile:33: recipe for target '../deps/build/julia-env/bin/sphinx-build' failed
make: *** [../deps/build/julia-env/bin/sphinx-build] Error 1
make: Leaving directory '/home/tim/src/julia-0.5/doc'
tim@diva:~/src/julia-0.5$ locate datetime.py
/home/tim/.julia/v0.4/Conda/deps/usr/lib/python2.7/site-packages/numpy/core/tests/test_datetime.py
/home/tim/.julia/v0.4/Conda/deps/usr/lib/python2.7/site-packages/numpy/core/tests/test_datetime.pyc
/home/tim/.julia/v0.4/Conda/deps/usr/pkgs/numpy-1.10.2-py27_0/lib/python2.7/site-packages/numpy/core/tests/test_datetime.py
/home/tim/.julia/v0.4/Conda/deps/usr/pkgs/numpy-1.10.2-py27_0/lib/python2.7/site-packages/numpy/core/tests/test_datetime.pyc
/home/tim/.julia/v0.4/Conda/deps/usr/pkgs/numpy-1.10.4-py27_0/lib/python2.7/site-packages/numpy/core/tests/test_datetime.py
/home/tim/.julia/v0.4/Conda/deps/usr/pkgs/numpy-1.10.4-py27_0/lib/python2.7/site-packages/numpy/core/tests/test_datetime.pyc
/home/tim/.julia/v0.4/Conda/deps/usr/pkgs/numpy-1.10.4-py27_1/lib/python2.7/site-packages/numpy/core/tests/test_datetime.py
/home/tim/.julia/v0.4/Conda/deps/usr/pkgs/numpy-1.10.4-py27_1/lib/python2.7/site-packages/numpy/core/tests/test_datetime.pyc
/home/tim/.julia/v0.4/Conda/deps/usr/pkgs/numpy-1.11.0-py27_0/lib/python2.7/site-packages/numpy/core/tests/test_datetime.py
/home/tim/.julia/v0.4/Conda/deps/usr/pkgs/numpy-1.11.0-py27_0/lib/python2.7/site-packages/numpy/core/tests/test_datetime.pyc
/home/tim/.julia/v0.4/Conda/deps/usr/pkgs/numpy-1.11.0-py27_1/lib/python2.7/site-packages/numpy/core/tests/test_datetime.py
/home/tim/.julia/v0.4/Conda/deps/usr/pkgs/numpy-1.11.0-py27_1/lib/python2.7/site-packages/numpy/core/tests/test_datetime.pyc
/usr/lib/python2.7/dist-packages/numpy/core/tests/test_datetime.py
/usr/lib/python2.7/dist-packages/numpy/core/tests/test_datetime.pyc
/usr/lib/python2.7/dist-packages/zope/interface/common/idatetime.py
/usr/lib/python2.7/dist-packages/zope/interface/common/idatetime.pyc
/usr/lib/python2.7/dist-packages/zope/interface/common/tests/test_idatetime.py
/usr/lib/python2.7/dist-packages/zope/interface/common/tests/test_idatetime.pyc
/usr/lib/python3.5/datetime.py

If the solution is not obvious, perhaps someone else can build the docs for me and push it to this branch?

@tkelman
Copy link
Contributor

tkelman commented Jul 3, 2016

make -C deps distclean-virtualenv maybe? I was only able to get half of them to populate... the signature matching is picky, fix incoming

@timholy
Copy link
Sponsor Member Author

timholy commented Jul 4, 2016

Thanks, @tkelman! I made a few more doc tweaks, and decided to make @safeindices extensible to user-functions.

Before merging, we should check performance: @nanosoldier runbenchmarks(ALL, vs=":master").


trailingsize(A, n) = trailingsize(UnsafeIndices(), A, n)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are you able to build the docs? this may make these 2 separate code blocks?

Copy link
Sponsor Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't (but thanks for noticing that I forgot make julia-genstdlib).

@timholy
Copy link
Sponsor Member Author

timholy commented Jul 4, 2016

@nanosoldier runbenchmarks(ALL, vs = ":master")

@timholy
Copy link
Sponsor Member Author

timholy commented Jul 4, 2016

Nanosoldier seems to be taking a well-deserved holiday.

@jrevels
Copy link
Member

jrevels commented Jul 5, 2016

Looks like there was a network issue while I was away, since the error in the server logs is getaddrinfo callback: system error (EAI_SYSTEM). I'm about to hop on a plane, but I'll investigate further when I land. In the meantime, I checked the relevant connections and things seem to be fine now, so I've restarted the server. Perhaps it'll work now:

@nanosoldier runbenchmarks(ALL, vs = ":master")

@jrevels
Copy link
Member

jrevels commented Jul 6, 2016

Ugh, master broke Nanosoldier's v0.5 version of JLD, which is why the build is failing. I'll update it and see if that helps. ref JuliaCI/BenchmarkTools.jl#15

@tkelman
Copy link
Contributor

tkelman commented Jul 7, 2016

@nanosoldier runbenchmarks(ALL, vs = ":master")

@nanosoldier
Copy link
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @jrevels


# newindexer(shape, A) generates `keep` (for use by `newindex` above)
# for a particular array `A`, given the broadcast_shape `shape`
# Equivalent to map(==, indices(A), shape) (but see #17126)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also equivalent to indices(A) .== shape or?

Copy link
Sponsor Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't support that syntax for tuples (but conceptually, yes)

@timholy
Copy link
Sponsor Member Author

timholy commented Jul 18, 2016

All review comments addressed.

_maybe_reshape(::LinearFast, A::AbstractArray, i) = A
_maybe_reshape(::LinearSlow, A::AbstractVector, i) = A
_maybe_reshape(::LinearSlow, A::AbstractArray, i) = _maybe_reshape(LinearSlow(), index_ndims(i), A)
_maybe_reshape{N}(::LinearIndexing, ::NTuple{N}, A) = _= reshape(A, Val{N})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

= _= typo?

@timholy
Copy link
Sponsor Member Author

timholy commented Jul 18, 2016

Set again (I hope!)

@tkelman
Copy link
Contributor

tkelman commented Jul 19, 2016

I don't entirely get what these various maybe_* functions are supposed to be doing or why they're needed.

@timholy
Copy link
Sponsor Member Author

timholy commented Jul 19, 2016

_maybe_reshape

This is extended from an older function from 51b4bc1, _maybe_linearize. The idea is that you might need/want to reshape the parent array before extracting values. Why and when? This is in support of partial linear indexing (which is still waiting for a volunteer to deprecate---you have other things to do 😉, it should be someone else). If you try to index a LinearSlow array with fewer indices than it has dimensions, it's going to have to call div which is always a performance nightmare. If you are just asking for a single value from the array, there's no better alternative to going ahead and doing it---and that's intercepted by this method, so this path doesn't get called in such cases. However, this path is used if you're asking for multiple values from the array (at least one of the indices is a vector or Colon). Because of #15357 and #15449 it will usually be worth it to first reshape the parent array to match the dimensionality of indices, so that you get the benefit of the fast multiplicative inverses when calculating div.

Since types are involved, it's important to do all this in a type-stable manner. So basically this implements the following logic in a way the compiler can follow: IF this is a LinearSlow array, AND the dimensionality of the indices doesn't match the dimensionality of the array, reshape the array to match the dimensionality of the indices. The computation of the dimensionality of the indices is actually a little tricky because of CartesianIndex, hence index_ndims.

maybe_oneto

This is to handle the opposite case from partial linear indexing, when the user supplies an extra index (e.g., indexing a vector as v[i,1]), and helps do the right thing when you've "run out of indices" in inds. I could have dispatched on Tuple{}, but since we're also specializing on the 2nd argument of decolon and index_shape_dim that gets to be annoying pretty quickly. Since IteratorsMD.split already handles that case well, I decided to use that instead, and repackage the empty tuple as an explicit index. But hmm, explaining this makes me realize this should just return 1 and not OneTo(1). Will fix. EDIT: nvm, I was right the first time, because the index is Colon() which needs to be converted to a Range.

@timholy
Copy link
Sponsor Member Author

timholy commented Jul 19, 2016

Just an FYI that from my perspective this is merge-worthy and hence closes a 0.5.0 milestone. Alternatively, I can await more reviews and perhaps pile on my promise in https://github.com/JuliaLang/julia/pull/17283/files/c68a2cd9919cdf4a23243b313670ad8dc41a9c27#r71298118.

@tkelman
Copy link
Contributor

tkelman commented Jul 19, 2016

let's do @nanosoldier runbenchmarks(ALL, vs = ":master") again just in case

@StefanKarpinski
Copy link
Sponsor Member

Assuming the nanosoldier run looks good, I think you should go ahead and merge, @timholy.

@nanosoldier
Copy link
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @jrevels

@timholy
Copy link
Sponsor Member Author

timholy commented Jul 19, 2016

More than good.

@timholy timholy merged commit c2e3494 into master Jul 19, 2016
@timholy timholy deleted the teh/safe_size_length branch July 19, 2016 21:47
@JeffBezanson
Copy link
Sponsor Member

Thanks @timholy, this approach looks good. Kudos especially for the excellent documentation.
(Just got back from vacation, which is why I haven't been following along for the last couple days.)

@timholy
Copy link
Sponsor Member Author

timholy commented Jul 20, 2016

Hope you had a great vacation!

And thanks for the guidance---as usual, your taste is exquisite, and I just wasn't brave enough to throw a basic function like size under a bus. Thanks for helping me get to the right answer.

mbauman added a commit to mbauman/DistributedArrays.jl that referenced this pull request Aug 3, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Safe non-traditional array indexing
8 participants