faster circshift! for SparseMatrixCSC #30317

Merged Dec 25, 2018 (10 commits)

abraunst (Contributor) commented Dec 8, 2018

This is another (much faster) version of the `circshift!` implementation for `SparseMatrixCSC`. Unfortunately, the code is much less legible than the two-liner of #30300, but it

  • avoids allocations almost completely
  • has exactly two `mod` calls

In its current state, for `SparseMatrixCSC` matrices, it seems comparable to the dense version for small dense matrices, faster for larger ones, and much faster for matrices that are both sparse and large.
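For readers who want to try the method described above, here is a minimal usage sketch. The matrix, the shift amounts, and the variable names are illustrative assumptions; the only thing taken from this PR is that `circshift!` accepts a `SparseMatrixCSC` output and input with a 2-tuple of shifts.

```julia
using SparseArrays

# Illustrative 3x4 sparse matrix with three stored entries.
X = sparse([1, 2, 3], [1, 2, 3], [10.0, 20.0, 30.0], 3, 4)
O = similar(X)   # preallocated sparse output with the same size

# Shift rows down by 1 and columns right by 2, wrapping around.
circshift!(O, X, (1, 2))

# The sparse result should agree with the dense reference.
matches_dense = Matrix(O) == circshift(Matrix(X), (1, 2))
```

Comparing against `circshift` on the dense copy is the same kind of check the tests in this PR rely on.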

Question 1: is it possible to replace the `O .= similar(X)` in the first line and just check/assert that the allocated memory is compatible (i.e. `colptr`, `rowval` and `nzval` have the same lengths and `n`, `m` are equal)? I suppose not, because it would be a breaking change. This would avoid a double allocation in the call from `circshift`. Otherwise, I suppose I could move everything to a helper function and implement `circshift` directly instead of relying on the generic one. If this is the way to go, I would replace `O .= similar(X)` by just allocating the memory (no need to copy `colptr` and `rowval`).

EDIT: I realized that this question is kind of stupid... I just `resize!` `colptr`, `rowval` and `nzval`; this does not involve allocation if they are already of the right size. It also makes things work when the output has a different type than the input, which didn't work before. I added some tests that would have caught it.
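The `resize!` idea can be sketched on plain vectors. The helper name `ensure_lengths!` and its arguments are hypothetical (they are not taken from the PR); the sketch only relies on `resize!` leaving a vector untouched when it already has the requested length.

```julia
# Hypothetical helper (not from the PR): make three CSC-style buffers the right
# length for `ncols` columns and `nstored` stored entries.
function ensure_lengths!(colptr::Vector, rowval::Vector, nzval::Vector,
                         ncols::Integer, nstored::Integer)
    resize!(colptr, ncols + 1)
    resize!(rowval, nstored)
    resize!(nzval, nstored)
    return nothing
end

colptr = [1, 2, 3]
rowval = [1, 2]
nzval = Float32[1, 2]       # the element type may differ from the input's

p = pointer(nzval)
ensure_lengths!(colptr, rowval, nzval, 2, 2)   # lengths already match
same_buffer = pointer(nzval) == p              # buffer was left in place

ensure_lengths!(colptr, rowval, nzval, 2, 5)   # now the buffers must grow
```

Because only the lengths are adjusted, nothing here depends on the element type of the output matching that of the input.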

Question 2: even without `O .= similar(X)`, `@benchmark` still reports a very small allocation. I don't know where it comes from (but I may be misinterpreting).

I believe this is a good improvement over the current situation. The implementation is reasonably straightforward. Thoughts?

Updated benchmarks:

Summary of mean times, raw data here

| x | MASTER | BRANCH |
|---|---|---|
| sprand(10,10,1.0) | 1.211 μs | 1.218 μs |
| sprand(10,10,0.1) | 789.932 ns | 765.545 ns |
| sprand(1000,1000,1.0) | 71.192 ms | 3.904 ms |
| sprand(1000,1000,0.1) | 37.105 ms | 196.435 μs |
| sprand(1000,1000,0.01) | 11.689 ms | 26.761 μs |

Updated benchmarks with some @inbounds, raw data here

| x | MASTER | BRANCH |
|---|---|---|
| sprand(10,10,1.0) | 1.211 μs | 1.172 μs |
| sprand(10,10,0.1) | 789.932 ns | 725.866 ns |
| sprand(1000,1000,1.0) | 71.192 ms | 3.656 ms |
| sprand(1000,1000,0.1) | 37.105 ms | 167.782 μs |
| sprand(1000,1000,0.01) | 11.689 ms | 20.948 μs |

EDIT3: Found the solution to Question 2 above: I think it is the splat in the call to (dense) `circshift!` (in multidimensional.jl):

julia> x=rand(10); y=similar(x); @btime circshift!($y,$x,1);
  307.100 ns (3 allocations: 144 bytes)

julia> x=rand(10); y=similar(x); @btime circshift!($y,$x,(1,));
  35.272 ns (0 allocations: 0 bytes)

julia> @btime (1...,);
  263.556 ns (3 allocations: 144 bytes)

I thought that the temporary tuple would have been optimized out... can this be solved there?

For the moment, I will just solve it locally. This gives a huge improvement for small matrices!
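One way to picture a local workaround of this kind is to build the 1-tuple directly instead of splatting the shift. The two wrapper names below are hypothetical and only illustrate the two call styles from the timings above; whether the splat actually allocates depends on the Julia version, so no allocation numbers are claimed here.

```julia
# Hypothetical wrappers (not from the PR) showing the two ways to build the tuple.
shift_via_splat(dest, src, s) = circshift!(dest, src, (s...,))          # tuple built by splatting
shift_via_tuple(dest, src, s::Integer) = circshift!(dest, src, (s,))    # 1-tuple built directly

x = collect(1.0:10.0)
y1 = similar(x)
y2 = similar(x)

shift_via_splat(y1, x, 1)
shift_via_tuple(y2, x, 1)

# Both styles produce the same shifted vector; only the call overhead may differ.
same_result = y1 == y2 == circshift(x, 1)
```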

Updated benchmarks, raw data here

| x | MASTER | BRANCH |
|---|---|---|
| sprand(10,10,1.0) | 1.211 μs | 463.476 ns |
| sprand(10,10,0.1) | 789.932 ns | 110.585 ns |
| sprand(1000,1000,1.0) | 71.192 ms | 3.669 ms |
| sprand(1000,1000,0.1) | 37.105 ms | 171.386 μs |
| sprand(1000,1000,0.01) | 11.689 ms | 20.705 μs |

mauro3 (Contributor) commented Dec 8, 2018

(Always quote code, in particular macros. Above you just pinged the github user "benchmark"!)

abraunst (Contributor, Author) commented Dec 8, 2018

> (Always quote code, in particular macros. Above you just pinged the github user "benchmark"!)

dang, I'm sorry about that. Just corrected it.

@abraunst abraunst changed the title implement circshift! for SparseMatrixCSC [WIP] implement circshift! for SparseMatrixCSC Dec 9, 2018
@abraunst abraunst changed the title [WIP] implement circshift! for SparseMatrixCSC Implement circshift! for SparseMatrixCSC Dec 9, 2018
@abraunst abraunst force-pushed the circshift2 branch 2 times, most recently from 30cbcab to 0fa85f1 Compare December 12, 2018 17:08
@kshyatt kshyatt added the sparse Sparse arrays label Dec 12, 2018
@abraunst abraunst changed the title Implement circshift! for SparseMatrixCSC faster circshift! for SparseMatrixCSC Dec 14, 2018
r = mod(r, X.m)
@inbounds for i=1:O.n
    subvector_shifter!(O.rowval, O.nzval, O.colptr[i], O.colptr[i+1]-1, O.m, r)
end
Member:

Skip this loop if `iszero(r)`. Similarly, the code above can be replaced with a copy if `iszero(c)`.

abraunst (Contributor, Author):

Thank you @stevengj. I implemented the suggestions; let me know if I interpreted them correctly. I also moved `subvector_shifter!` to sparsevector.jl, which seemed more appropriate (should I prefix the name with `_` given that it's a helper, or not, since it is also used by sparsematrix.jl?).
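To make the per-column row shift and the `iszero(r)` shortcut concrete, here is an out-of-place sketch on plain vectors. It is not the PR's `subvector_shifter!` (which works in place on a slice of `rowval` and `nzval`); the function name `shift_column` and its signature are assumptions made for illustration.

```julia
# Sketch: circularly shift the sorted row indices of one sparse column by `r`
# in a column of height `m`, keeping the indices sorted and the values aligned.
function shift_column(rows::Vector{Int}, vals::Vector{T}, m::Int, r::Integer) where {T}
    s = mod(r, m)                                # normalize to 0 <= s < m
    iszero(s) && return copy(rows), copy(vals)   # nothing to shift: plain copy
    # Entries with row > m - s wrap around to the top after the shift.
    split = searchsortedfirst(rows, m - s + 1)
    newrows = vcat(rows[split:end] .+ (s - m), rows[1:split-1] .+ s)
    newvals = vcat(vals[split:end], vals[1:split-1])
    return newrows, newvals
end

# Column of height 5 with entries in rows 1, 3 and 5, shifted down by 2:
# row 1 -> 3, row 3 -> 5, row 5 -> 2 (wraps around).
rows, vals = shift_column([1, 3, 5], [1.0, 3.0, 5.0], 5, 2)
```

Since the wrapped entries all land above the non-wrapped ones, moving the tail of the column to the front keeps the row indices sorted without a separate sort.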

for i=1:20
    m,n = 17,15
    A = sprand(m, n, rand())
    shifts = rand(-m:m), rand(-n:n)
stevengj (Member) commented Dec 17, 2018

I would make this deterministic to make sure we always exercise the corner cases.

    for rshift in (-1, 0, 1, 10), cshift in (-1, 0, 1, 10)
        shifts = (rshift, cshift)

and have a separate loop for the sparse vector case.

abraunst (Contributor, Author):

Thanks @stevengj. I've done it. I moved the sparse vector tests to test/sparsevector.jl (which already existed).
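A deterministic test loop along the lines suggested above might look like the following sketch. The test-set name and the fixed density of 0.5 are assumptions; the actual tests merged with this PR may differ.

```julia
using SparseArrays, Test

m, n = 17, 15
A = sprand(m, n, 0.5)   # random entries, but the shift amounts below are fixed

@testset "sparse circshift against dense reference (sketch)" begin
    for rshift in (-1, 0, 1, 10), cshift in (-1, 0, 1, 10)
        shifts = (rshift, cshift)
        expected = circshift(Matrix(A), shifts)
        # Out-of-place version.
        @test circshift(A, shifts) == expected
        # In-place version with a preallocated sparse output.
        O = similar(A)
        circshift!(O, A, shifts)
        @test O == expected
    end
end
```

The fixed shift values cover the negative, zero, and positive cases in both dimensions on every run.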

@@ -2008,7 +2008,7 @@ end


 function circshift!(O::SparseVector, X::SparseVector, (r,)::Base.DimsInteger{1})
-    copy!(O, X)
+    O .= X
stevengj (Member) commented Dec 21, 2018

Is there a bug in `copy!` for this case?

abraunst (Contributor, Author):

Yes, I opened a separate issue, #30443.
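As a small illustration of the broadcast assignment used in the diff above, the sketch below copies one sparse vector into another with `O .= X` and then applies the sparse-vector `circshift!` with a 1-tuple shift. The vectors are made up for the example; nothing here reproduces the `copy!` problem tracked in the separate issue.

```julia
using SparseArrays

X = sparsevec([1, 4], [1.0, 4.0], 5)   # stored entries at positions 1 and 4
O = spzeros(5)                          # output with a different stored pattern

O .= X                                  # broadcast assignment copies X into O
copied = O == X

circshift!(O, X, (2,))                  # position 1 -> 3, position 4 -> 1 (wraps)
shifted_ok = Vector(O) == circshift(Vector(X), 2)
```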

ViralBShah (Member):

@stevengj Can you merge it when you think it is ready?

stevengj (Member):

Are the 32-bit appveyor failures unrelated?

abraunst (Contributor, Author):

> Are the 32-bit appveyor failures unrelated?

They seem unrelated to me. OTOH, I have no idea what they are :-)

stevengj (Member):

Maybe rebase and force-push to see if the appveyor problem is something that was recently fixed.

abraunst (Contributor, Author) commented Dec 24, 2018

> Maybe rebase and force-push to see if the appveyor problem is something that was recently fixed.

Sure. Should I squash everything into a single commit while I am at it?

@stevengj stevengj merged commit 94993e9 into JuliaLang:master Dec 25, 2018
stevengj (Member):

No need to squash; GitHub allows us to squash when merging. Thanks!

@abraunst abraunst deleted the circshift2 branch December 25, 2018 15:43
staticfloat pushed a commit that referenced this pull request Dec 30, 2018
* implement circshift! for SparseMatrixCSC

* factor helper function shifter!, implement efficient circshift! for SparseVector

* add some @inbounds for improved performance

* remove allocations completely, giving a large improvement for small matrices

* some renaming to avoid polluting the module namespace

* remove useless reallocation and fix bug with different in/out types, better tests

* avoid action if iszero(r) and/or iszero(c), move sparse vector shifting helpers to sparsevector.jl

* Make shift amounts deterministic in tests, move sparse vector tests into sparsevector.jl

* comment fix

* for some reason, copy!(a::SparseVector, b::SparseVector) does not work
staticfloat pushed a commit that referenced this pull request Jan 4, 2019
@KristofferC KristofferC mentioned this pull request Jan 11, 2019
53 tasks
KristofferC pushed a commit that referenced this pull request Jan 11, 2019
(cherry picked from commit 94993e9)
@StefanKarpinski added the triage and backport 1.0 labels and removed the triage label Jan 31, 2019
@JeffBezanson removed the backport 1.0 and triage labels Jan 31, 2019
Labels: performance (Must go faster), sparse (Sparse arrays)
7 participants