Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix sparse getindex regression #7162

Merged
merged 1 commit into from
Jun 10, 2014
Merged

Conversation

tanmaykm
Copy link
Member

@tanmaykm tanmaykm commented Jun 7, 2014

This PR aims to fix broken getindex for sparse matrices with index type other than Int and adds some related tests.
ref #7131, #7047

@ViralBShah
Copy link
Member

Is it possible to avoid having a new binarysearch implementation in the sparse matrix code? Cc @mauro3 @kmsquire

@ViralBShah
Copy link
Member

We should also update NEWS.md with the sparse indexing improvements.

@tanmaykm
Copy link
Member Author

tanmaykm commented Jun 7, 2014

Here are some results from a comparison between binarysearch and searchsorted: https://gist.github.com/tanmaykm/7a1e3d56ff7445f0aa03

We can probably move binarysearch into sort.jl as searchuniquesorted or something similar, though the difference is not that significant for this use case.

@mauro3
Copy link
Contributor

mauro3 commented Jun 7, 2014

Testing for equality after searchsortedfirst should be almost the same as binarysearch. However, I tested it with @tanmaykm script and it's slowest. No idea why. Have a look at my comment in his gist:
https://gist.github.com/tanmaykm/7a1e3d56ff7445f0aa03

I don't have time to investigate just now though.

@ViralBShah
Copy link
Member

Cc: @kmsquire

@ViralBShah
Copy link
Member

We should probably also test for small inputs. It is quite likely that for columns with less than 10 elements, a linear search may be faster.

@ViralBShah ViralBShah added this to the 0.3 milestone Jun 7, 2014
@mauro3
Copy link
Contributor

mauro3 commented Jun 7, 2014

@ViralBShah I did a comparison when I wrote the indexing and found no advantage of using linear search. But that was wrong as enumerate seems to kill performance, so here updated:
https://gist.github.com/mauro3/246432bd4f2943e3dfd5

Binary and linear search are about equally fast for a haystack of size 15

  • For n=1 to 5, linear search takes about 50% of binary search
  • For n=5 to 10, linear search takes about 75% of binary search

(But this is not what is found in http://schani.wordpress.com/2010/04/30/linear-vs-binary-search/ for C. Where linear search is better up to arrays length 200)

@ViralBShah
Copy link
Member

It would be nice to bake the linear search into the various searchsorted methods.

@ViralBShah
Copy link
Member

That blog post is good. I would be curious to see how close we can get, since a bunch of work on SIMD is going on now.

@mauro3
Copy link
Contributor

mauro3 commented Jun 8, 2014

Made PR with performance tests for sparse indexing: #7177.
It also contains tests for uint32 indices which need to be un-commented once this is fixed.

@tanmaykm
Copy link
Member Author

tanmaykm commented Jun 9, 2014

Updated and optimized the tests at https://gist.github.com/tanmaykm/7a1e3d56ff7445f0aa03 further. Now times reported by alll three methods (binarysearch, searchsorted, searchsortedfirst) are very close, though searchsorted beats the other two consistently by a narrow margin.

The bulk of the difference in test result was because of importing Base.Order instead of using Base.Order.Forward. Using Base.Order.Forward I think prevented some loop unrolling optimizations which code_llvm shows. I am not too knowledgeable about interpreting llvm bytecode and would appreciate if someone else also has a look at it.

I think we can use searchsorted instead of binarysearch. I shall update this PR.

@mauro3
Copy link
Contributor

mauro3 commented Jun 9, 2014

@tanmaykm good detective work but I'm still puzzled why searchsortefirst should be slower. Maybe something to ask on the mailing list.

There are also a searchsorted* functions for ranges, maybe the rangesort in sparsematrix.jl could be replaced as well?

@tanmaykm
Copy link
Member Author

tanmaykm commented Jun 9, 2014

Updated.

@mauro3 yes, the slower searchsortedfirst is puzzling. I didn't replace rangesearch as it performs better (https://gist.github.com/tanmaykm/b287d6a8c98f4be3dfe3).

@ViralBShah
Copy link
Member

Is this good to merge now?

ind = binarysearch(A.rowval, i0, A.colptr[i1], A.colptr[i1+1]-1)
ind > -1 ? A.nzval[ind] : zero(T)
r1 = int(A.colptr[i1])
r2 = A.colptr[i1+1]-1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this also be converted to int?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

r2 is an int because of the subtraction

@mauro3
Copy link
Contributor

mauro3 commented Jun 9, 2014

Thanks for the clarifications and sorry to bother with two more things!

  • returning 0 instead of -1 to indicate "not found" might be more natural?
  • I had another look at rangesearch yesterday and found that this is 20% faster (but give me a minute to double check):
function rangesearch(haystack::Range, needle)
    di = (needle - first(haystack))/ step(haystack)
    i = ifloor(di)
    (di-i==0 && 1<=i+1<=length(haystack)) ? i+1 : -1
end

@mauro3
Copy link
Contributor

mauro3 commented Jun 9, 2014

Sorry, no the second suggestion is wrong, ignore it.

@kmsquire
Copy link
Member

kmsquire commented Jun 9, 2014

On my machine, searchsorted actually beats the other two significantly:

# last run only
elapsed time: 0.06766474 seconds (0 bytes allocated)
elapsed time: 0.032284962 seconds (0 bytes allocated)
elapsed time: 0.070093474 seconds (0 bytes allocated)

I actually find that somewhat surprising as well. I'm also not well versed at code_llvm, although I try to read it from time to time.

@kmsquire
Copy link
Member

kmsquire commented Jun 9, 2014

@tanmaykm good detective work but I'm still puzzled why searchsortefirst should be slower.

One possible reason has nothing to do with the search function: the timing does include an extra 10^6 array lookups that aren't in the searchsorted call.

@mauro3
Copy link
Contributor

mauro3 commented Jun 9, 2014

More detective work:
https://gist.github.com/mauro3/2da5d92f1da8a56d40f0

This shows that for searchsortedfirst is faster than searchsorted when looping over a disordered array of "needles". In particular for smaller haystacks, which is probably what is encountered more often for sparse arrays. For instance with N length of haystack, NN number of needles to loop over:

N=1000, NN=100000
randperm(NN) input:
bin      elapsed time: 0.002756478 seconds (0 bytes allocated)
sort     elapsed time: 0.005649829 seconds (0 bytes allocated)
sortlast elapsed time: 0.003916286 seconds (0 bytes allocated)
[1:NN] input:
bin      elapsed time: 0.002726484 seconds (0 bytes allocated)
sort     elapsed time: 0.001809206 seconds (0 bytes allocated)
sortlast elapsed time: 0.00395091 seconds (0 bytes allocated)

This suggests that the difference is caused by caching(?). Trying slightly different tests, it didn't seen the extra array lookups make much of a difference.

also replaced binarysearch with methods from sort.jl
ref JuliaLang#7131, JuliaLang#7047
@tanmaykm
Copy link
Member Author

tanmaykm commented Jun 9, 2014

Caching does seem to be the likely explanation. Changed rangesearch to return 0 instead of -1 as suggested and rebased.

@mauro3
Copy link
Contributor

mauro3 commented Jun 9, 2014

I think this is good to merge.

tanmaykm added a commit that referenced this pull request Jun 10, 2014
fix sparse getindex regression
@tanmaykm tanmaykm merged commit 664cab5 into JuliaLang:master Jun 10, 2014
@tanmaykm
Copy link
Member Author

Thanks. Merged.

@ViralBShah
Copy link
Member

I now see:

exception on 5: ERROR: test failed: S[end] == S[1] + S[2,2]
 in error at error.jl:21
 in default_handler at test.jl:19
 in do_test at test.jl:39
 in anonymous at no file:296
 in runtests at /Users/viral/julia/test/testdefs.jl:5
 in anonymous at multi.jl:847
 in run_work_thunk at multi.jl:613
 in anonymous at task.jl:847
while loading sparse.jl, in expression starting on line 292
ERROR: test failed: S[end] == S[1] + S[2,2]
 in anonymous at task.jl:1350
while loading sparse.jl, in expression starting on line 292
while loading /Users/viral/julia/test/runtests.jl, in expression starting on line 46

@tanmaykm
Copy link
Member Author

Yes. Raised an issue #7197 for that.

@mauro3
Copy link
Contributor

mauro3 commented Jun 10, 2014

Work-around for #7197 and a minor bug fix for this PR in #7201

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sparse Sparse arrays
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants