Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use same code path for count(pred, itr) as sum(pred, itr) #20663

Closed
dpsanders opened this issue Feb 18, 2017 · 5 comments
Closed

Use same code path for count(pred, itr) as sum(pred, itr) #20663

dpsanders opened this issue Feb 18, 2017 · 5 comments
Labels
collections Data structures holding multiple items, e.g. sets maths Mathematical functions performance Must go faster

Comments

@dpsanders
Copy link
Contributor

dpsanders commented Feb 18, 2017

count(pred, itr) (currently) has the same functionality as sum(pred, itr), only slower:

julia> function bench(N)
       r = randn(N)
       @time count(i->i>1.96, r)
       @time sum(i->i>1.96, r)
       end
bench (generic function with 1 method)

After warmup on latest master:

julia> bench(10^8)
 0.108136 seconds
  0.075002 seconds

We could just define count(pred, itr) = sum(pred, itr) but the count function just seems to be redundant (EDITED according to @ararslan's comment below and #20404 ).

This came up in
https://discourse.julialang.org/t/compute-number-of-values-in-an-array-above-a-threshold/2167

@tkelman
Copy link
Contributor

tkelman commented Feb 18, 2017

see #20404

@ararslan
Copy link
Member

ararslan commented Feb 18, 2017

I very much disagree with this. To reiterate what I said in #20404 (comment), should we make Bool not a subtype of Number (see #19168), it makes more sense to count non-numbers than to sum them, even if we do still define arithmetic on Bools. The functionality of sum and count overlaps in some cases, but I think in general they serve different purposes.

@ararslan ararslan added the deprecation This change introduces or involves a deprecation label Feb 18, 2017
@dpsanders
Copy link
Contributor Author

dpsanders commented Feb 18, 2017

Yes, I see what you mean; I hadn't seen #20404 before I posted.

In any case, count should not be 30% slower than sum in my example.

@dpsanders dpsanders changed the title Deprecate count(pred, itr) Use same code path for count(pred, itr) as sum(pred, itr) Feb 18, 2017
@ararslan ararslan added collections Data structures holding multiple items, e.g. sets maths Mathematical functions and removed deprecation This change introduces or involves a deprecation labels Feb 19, 2017
@JeffBezanson JeffBezanson added the performance Must go faster label Feb 20, 2017
@dpsanders
Copy link
Contributor Author

dpsanders commented May 27, 2017

On current master, count is nearly twice as slow on my machine:

julia> bench(10^8)
  0.104584 seconds
  0.059525 seconds

julia> 0.104584 / 0.059525
1.7569760604787903

@dpsanders
Copy link
Contributor Author

This seems to be fixed on latest master:

julia> r = randn(10^8);

julia> @btime count(i->i>1.96, $r)
  57.402 ms (0 allocations: 0 bytes)
2499971

julia> @btime sum(i->i>1.96, $r)
  56.915 ms (0 allocations: 0 bytes)
2499971

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
collections Data structures holding multiple items, e.g. sets maths Mathematical functions performance Must go faster
Projects
None yet
Development

No branches or pull requests

4 participants