Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Preserve ranges in indexing with IdentityUnitRange(::Base.OneTo) #211

Merged
merged 21 commits into from
Apr 29, 2021

Conversation

jishnub
Copy link
Member

@jishnub jishnub commented Feb 25, 2021

This PR changes the type in indexing AbstractRanges with IdentityUnitRange(::Base.OneTo). This is technically breaking so I've separated this from #210 . This may go into a minor release.

The reason this PR is necessary is that currently OffsetArrays pirates the Base behavior and changes the return type (although not the values/axes).

After #210:

julia> r = 1:2:10
1:2:9

julia> s = Base.IdentityUnitRange(axes(r,1));

julia> r[s]
1:2:9

julia> r[s] |> typeof
StepRange{Int64,Int64}

julia> using OffsetArrays

julia> r[s]
1:2:9 with indices 1:5

julia> r[s] |> typeof
OffsetArray{Int64,1,StepRange{Int64,Int64}}

After this PR:

julia> using OffsetArrays

julia> r[s]
1:2:9

julia> r[s] |> typeof
StepRange{Int64,Int64}

Despite the piracy, the indexing operation returns the same type as Base. Importantly this returns an AbstractRange as opposed to an OffsetVector.

@codecov
Copy link

codecov bot commented Feb 25, 2021

Codecov Report

Merging #211 (bd0c993) into master (b8753a5) will increase coverage by 0.11%.
The diff coverage is 98.03%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #211      +/-   ##
==========================================
+ Coverage   94.57%   94.69%   +0.11%     
==========================================
  Files           5        5              
  Lines         332      358      +26     
==========================================
+ Hits          314      339      +25     
- Misses         18       19       +1     
Impacted Files Coverage Δ
src/utils.jl 96.96% <90.00%> (-3.04%) ⬇️
src/OffsetArrays.jl 98.34% <100.00%> (+0.13%) ⬆️
src/axes.jl 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b8753a5...bd0c993. Read the comment docs.

@timholy
Copy link
Member

timholy commented Feb 26, 2021

Just to clarify, this is breaking only in the sense that different types are returned, right? As you know, I'm generally in favor of thinking of types as an implementation detail so that kind of "breakage" is OK if it doesn't really change behavior.

@jishnub
Copy link
Member Author

jishnub commented Feb 26, 2021

Yes this only changes the type of the returned object, there is no change in the axes or the values. While this is not breaking in that sense, it might cause some overly-specialized functions to falter. Perhaps it's better to be conservative and make this a part of 1.7 while #210 can go into 1.6.x?

In general it seems like a good strategy to push changes in types to minor releases. This way a compat entry that restricts the minor version would receive (mostly) non-breaking patches. There might be some permissible breakage in minor releases due to changes that are not considered a part of the API, whereas API-breaking changes go to major releases. This avoids patch releases breaking something as in #163

@jishnub jishnub force-pushed the retainrangesindexing branch 2 times, most recently from 08c17f3 to fcbfbba Compare February 27, 2021 05:51
@jishnub
Copy link
Member Author

jishnub commented Feb 27, 2021

The latest change in da226f7 is another step in preserving AbstractUnitRanges in vector indexing operations involving OffsetUnitRanges, as these may be used further in indexing, and indexing is more efficient with AbstractUnitRanges than with OffsetUnitRanges.

As a consequence, before:

julia> a = ones(200);

julia> r = OffsetVector(2:500, 2);

julia> s = OffsetVector(3:150, 3);

julia> @btime $a[$r[$s]];
  639.861 ns (1 allocation: 1.33 KiB)

Now:

julia> @btime $a[$r[$s]];
  356.967 ns (1 allocation: 1.33 KiB)

Some of the tests had to be made a bit less strict, as we need to test for axes and values and not necessarily for types.

Edit: well this isn't working anymore, now a[r[s]] is much slower than a[(r[s])]. Not altogether sure why it was working earlier.

@jishnub
Copy link
Member Author

jishnub commented Feb 27, 2021

Main change introduced in c3ef299: to_indices specialized for OffsetUnitRange{<:Integers}s by replacing them with IdOffsetRanges in an indexing operation.

Previously:

julia> a = ones(200);

julia> s  = OffsetVector(3:150, 3);

julia> @btime $a[$(parent(s))];
  205.697 ns (1 allocation: 1.33 KiB)

julia> @btime $a[$s];
  638.503 ns (1 allocation: 1.33 KiB)

After this

julia> @btime $a[$s];
  303.116 ns (1 allocation: 1.33 KiB)

The indexing performance becomes comparable to that of the parent

@jishnub
Copy link
Member Author

jishnub commented Feb 27, 2021

We're back to

julia> @btime $a[$r[$s]];
  358.557 ns (1 allocation: 1.33 KiB)

julia> @btime $a[$(r[s])];
  346.858 ns (1 allocation: 1.33 KiB)

The benchmark for a[s] listed earlier included this bounds checking as well. Now that we have checkindex defined for OffsetRanges, here are fresh benchmarks:

Without checkindex and to_indices:

julia> a = ones(2000);

julia> s = OffsetVector(axes(a,1), 0);

julia> @btime $a[$s];
  6.659 μs (1 allocation: 15.75 KiB)

Without to_indices but with checkindex:

julia> @btime $a[$s];
  3.735 μs (1 allocation: 15.75 KiB)

With both:

julia> @btime $a[$s];
  2.848 μs (1 allocation: 15.75 KiB)

The difference after adding to_indices is not huge, as primarily it was bounds-checking that was expensive. However there is still a 20% difference, so perhaps adding to_indices is worth it.

vectorindexing(a, s) = a[s]
nestedvectorindexing(a, r, s) = a[r[s]]

macro showbenchmark(ex)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is added to avoid cluttering the display with line numbers


#= Integer UnitRanges may return an appropriate AbstractUnitRange{<:Integer}, as the result may be used in indexing, and
indexing is faster with ranges =#
@eval @propagate_inbounds function Base.getindex(r::UnitRange{<:Integer}, s::$OR)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not necessary anymore as _maybewrapoffset already handles this case

@@ -72,10 +72,10 @@ function _checkindices(N::Integer, indices, label)
N == length(indices) || throw_argumenterror(N, indices, label)
end

_maybewrapaxes(A::AbstractVector, ::Base.OneTo) = no_offset_view(A)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_maybewrapaxes is effectively equivalent to _maybewrapoffset, so there's no need for this

@test axes(r1) == (IdentityUnitRange(-5:-4),)
@test parent(r1) === 8:9
@test axes(r1) == (-5:-4,)
@test no_offset_view(r1) == 8:9
Copy link
Member Author

@jishnub jishnub Feb 28, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This indexing operation returns an IdOffsetRange now, so the parent might not be equal to the expected range. We check the values instead using no_offset_view and the axes separately

@jishnub
Copy link
Member Author

jishnub commented Feb 28, 2021

There is still some performance hit in Cartesian indexing:

julia> a = ones(2000, 2000);

julia> s = OffsetVector(axes(a,1), 0);

julia> @btime $a[$(parent(s)), $(parent(s))]; # no offset, so this is fast
  7.548 ms (2 allocations: 30.52 MiB)

julia> @btime $a[$(parent(s)), $s]; # not sure why this is fast but the others aren't?
  7.586 ms (2 allocations: 30.52 MiB)

julia> @btime $a[$s, $(parent(s))];
  7.841 ms (2 allocations: 30.52 MiB)

julia> @btime $a[$s, $s];
  7.875 ms (5 allocations: 30.52 MiB)

julia> sior = IdOffsetRange(parent(s));

julia> @btime $a[$sior, $sior]; # this is fast again
  7.480 ms (2 allocations: 30.52 MiB)

@jishnub
Copy link
Member Author

jishnub commented Feb 28, 2021

Inlining to_indices got rid of the Cartesian indexing performance hit.

Now

julia> @btime $a[$(parent(s)), $(parent(s))];
  6.755 ms (2 allocations: 30.52 MiB)

julia> @btime $a[$s, $s];
  6.718 ms (5 allocations: 30.52 MiB)

However, oddly it seems nested linear indexing is faster if ::OffsetUnitRange[::OffsetUnitRange] returns an OffsetRange rather than an IdOffsetRange. I have reverted the indexing behavior to return OffsetRanges for now.

julia> a = ones(2000);

julia> r = OffsetVector(1:2000, 0);

julia> s = OffsetVector(axes(a,1), 0);

julia> @btime $a[$r[$s]]; # r[s] Returns an OffsetRange
  3.465 μs (1 allocation: 15.75 KiB)

julia> @btime $a[$r[$s]]; # r[s] Returns an IdOffsetRange
  4.331 μs (1 allocation: 15.75 KiB)

I'm not sure what causes this hit, but in any case r[s] may return an OffsetRange, now that to_indices does the conversion to an IdOffsetRange internally. Will have to check some of the other return types and their performance impacts.

@jishnub
Copy link
Member Author

jishnub commented Feb 28, 2021

I think we're almost there after 72a5eaf. Now

julia> a = ones(2000);

julia> r = OffsetVector(1:2000, 0);

julia> s = OffsetVector(axes(a,1), 0);

julia> @btime $a[$r[$s]];
  2.103 μs (1 allocation: 15.75 KiB)

julia> @btime $a[$s];
  2.067 μs (1 allocation: 15.75 KiB)

To summarize (timings on Julia 1.5.3):

Using

julia> a = ones(2000); b = ones(1:2000); c = ones(2000, 2000);

julia> r = OffsetVector(1:2000, 0);

julia> s = OffsetVector(axes(a,1), 0);
Master This PR
a[s] 6.634 μs 2.090 μs
a[r[s]] 6.863 μs 2.074 μs
a[(r[s])] 7.184 μs 2.106 μs
b[axes(b,1)] 3.352 μs 2.194 μs
a[axes(b,1)] 3.747 μs 2.151 μs
c[s, s] 10.491 ms 6.850 ms

@jishnub
Copy link
Member Author

jishnub commented Mar 8, 2021

Worth waiting for JuliaLang/julia#39896 to restrict some of these methods to older versions of Julia

const IIUR = IdentityUnitRange{S} where S<:AbstractUnitRange{T} where T<:Integer

Base.step(a::OffsetRange) = step(parent(a))

@propagate_inbounds function Base.getindex(a::OffsetRange, r::OffsetRange)
OffsetArray(a.parent[r.parent .- a.offsets[1]], axes(r))
Base.checkindex(::Type{Bool}, inds::AbstractUnitRange, or::OffsetRange) = Base.checkindex(Bool, inds, parent(or))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem correct:

julia> ax = OffsetArrays.IdOffsetRange(Base.OneTo(5), -3)
OffsetArrays.IdOffsetRange(values=-2:2, indices=-2:2)

julia> axes(ax) == axes(ax.parent)
false

Copy link
Member Author

@jishnub jishnub Mar 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I understand. checkindex does not use the axes, it checks that the first and last values are contained in inds, and the first and last values of an OffsetRange should be the same as those of its parent. It's an OffsetRange btw (which is an OffsetArray wrapped around an AbstractRange), and not an IdOffsetRange.

Perhaps I'm missing something, could you expand on that example to illustrate the problem?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, sorry!

Copy link
Member

@timholy timholy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay, LGTM. I like limiting the number of methods where we can, so making it version-dependent seems like a great idea.

const IIUR = IdentityUnitRange{S} where S<:AbstractUnitRange{T} where T<:Integer

Base.step(a::OffsetRange) = step(parent(a))

@propagate_inbounds function Base.getindex(a::OffsetRange, r::OffsetRange)
OffsetArray(a.parent[r.parent .- a.offsets[1]], axes(r))
Base.checkindex(::Type{Bool}, inds::AbstractUnitRange, or::OffsetRange) = Base.checkindex(Bool, inds, parent(or))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, sorry!

@jishnub jishnub merged commit cca160c into JuliaArrays:master Apr 29, 2021
@jishnub jishnub deleted the retainrangesindexing branch April 29, 2021 17:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants