count(f, x) should not be equivalent to sum(f, x) #20404

stevengj · 2017-02-02T18:40:10Z

I took a look at the count implementation for #20403 (cc @cossio), and was surprised to discover that the current definition is nearly identical to sum (albeit with a type instability):

function count(pred, itr)
    n = 0
    for x in itr
        n += pred(x)
    end
    return n
end

For example, count(sqrt, [1,2,3,5]) returns 6.382332347441762.
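A minimal sketch of that equivalence, assuming the definition above is copied under the hypothetical name `old_count` so that it does not overwrite `Base.count`:

```julia
# Hypothetical copy of the definition quoted above, renamed so that it
# does not clash with Base.count.
function old_count(pred, itr)
    n = 0
    for x in itr
        n += pred(x)   # adds whatever pred returns, Bool or not
    end
    return n
end

xs = [1, 2, 3, 5]
old_count(sqrt, xs) ≈ sum(sqrt, xs)   # same accumulation as sum
old_count(iseven, xs) == 1            # the Bool case still counts matches
```

Because `n` starts as an `Int` and becomes a `Float64` after the first `sqrt` result, this also illustrates the type instability mentioned above.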

We could disallow this case entirely, but one possibility would be to make count(f, x) equivalent to countnz for non-boolean data. Indeed, I'm not sure why we have both countnz and count — couldn't we merge the two functions into a single count function?


stevengj · 2017-02-02T18:43:59Z

i.e. define

iszero(b::Bool) = !b # maybe more efficient than the fallback b == zero(b) definition
function count{F}(pred::F, itr)
    n = 0
    for x in itr
        n += !iszero(pred(x))
    end
    return n
end
count(itr) = count(identity, itr)
@deprecate countnz(itr) count(itr)
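For readers on a current Julia release, the same idea might be sketched with `where` syntax in place of the older `count{F}` form. This is only an illustration: the name `count_nonzero` is hypothetical (chosen to avoid redefining `Base.count`), and it relies on the `iszero` that ships in Base rather than the definition above.

```julia
# Hypothetical name; counts the elements for which pred(x) is nonzero.
function count_nonzero(pred::F, itr) where {F}
    n = 0
    for x in itr
        n += !iszero(pred(x))   # Bool is added to the Int counter
    end
    return n
end

# One-argument form: count the nonzero (or true) elements themselves.
count_nonzero(itr) = count_nonzero(identity, itr)

count_nonzero([0, 2, 0, 5])         # nonzero entries
count_nonzero(sqrt, [1, 2, 3, 5])   # every sqrt result here is nonzero
count_nonzero(iseven, 1:10)         # Bool predicates behave as before
```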

nalimilan · 2017-02-02T18:51:53Z

+1, that would be more consistent with find.


ararslan · 2017-02-02T19:00:25Z

I think I'd prefer something like

function count(f, itr)
    n = 0
    for x in itr
        if x
            n += 1
        end
    end
    return n
end
count(itr) = count(identity, itr)
@deprecate countnz(itr) count(!iszero, itr)

This has the benefit of throwing an error for non-boolean data when using count directly.

stevengj · 2017-02-02T19:03:50Z

@ararslan, that might actually be slower because it forces a branch. One could do n += f(x)::Bool, I suppose.

However, I'd prefer to have one function that does more rather than restrict the functionality to Bool here… I'm not sure that throwing an error for non-boolean data is a feature. And, as @nalimilan says, this behavior is consistent with the find functions.
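The type-assertion variant mentioned above could look roughly like this. The name `strict_count` is hypothetical, and this is a sketch of the idea rather than the code that was eventually merged.

```julia
# Hypothetical name; branch-free loop that rejects non-Bool predicate results.
function strict_count(f, itr)
    n = 0
    for x in itr
        n += f(x)::Bool   # throws a TypeError if f(x) is not a Bool
    end
    return n
end

strict_count(iseven, 1:10)        # counts the even numbers
# strict_count(sqrt, [1, 2, 3])   # would throw a TypeError
```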


ararslan · 2017-02-02T19:10:43Z

> I'm not sure that throwing an error for non-boolean data is a feature

I disagree because without a predicate for non-boolean data, the name count doesn't mean much. What is it counting? I have similar issues with find, and IIRC there's a proposal somewhere to improve the consistency and clarity of find and related functions.

Having f(x)::Bool seems fine to me, especially if a branch is costly, though I'd be curious to see whether the branch makes an appreciable difference in benchmarks.

stevengj · 2017-02-02T19:42:51Z

@ararslan, your version with the branch (corrected to call if f(x)) is more than 2x slower than Base.count for @btime count(iseven, 1:10^6) evals=1.

ararslan · 2017-02-02T19:44:15Z

Dang. f(x)::Bool it is, then, in my proposal.

nalimilan · 2017-02-02T20:26:38Z

> I disagree because without a predicate for non-boolean data, the name count doesn't mean much. What is it counting? I have similar issues with find, and IIRC there's a proposal somewhere to improve the consistency and clarity of find and related functions.

That's https://github.com/JuliaLang/Juleps/blob/master/Find.md, but it doesn't really address the question of whether find should return non-zero entries or true entries. I agree the former behavior can be surprising at first, but it's also more general and potentially useful, so why not keep it?

ararslan · 2017-02-02T20:33:09Z

Because it's unclear. I think being more explicit and saying count(!iszero, x) or find(!iszero, x) makes the intention completely obvious. IMO the one-argument versions should stick to boolean data, where there's no room for confusion, and the two-argument version gives you the ability to supply any kind of data with an appropriate predicate. I think we should be striving for clarity here rather than terseness.

nalimilan · 2017-02-02T20:36:38Z

Now that we can do !iszero, I'd tend to agree with you.

stevengj · 2017-02-02T21:38:07Z

Probably should continue to have a countnz function if the alternative is to require everyone to do count(!iszero, x). e.g. we want specialized versions for sparse arrays, and I don't know if that is possible with !iszero, because the ! will generate a specialized anonymous function for each call and hence defining a specialized count(::typeof(!iszero), x) method won't work. Of course, we could define a specialized !(::typeof(iszero)) = notiszero too, with notiszero(x) = !iszero(x), I guess, but that seems to be getting messy (and is harder for external packages to overload).

ararslan · 2017-02-02T21:48:24Z

For sparse there's nnz, which should arguably be renamed, though that's somewhat tangential. I did a bit of playing around and it seems that you actually can specialize on typeof(!iszero), which is super cool. So count(::typeof(!iszero), x) is a valid overloadable method.
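A small sketch of why this works: `!iszero` produces a value of the same type every time it is evaluated, so a method signature can name that type. The function `which_method` is hypothetical and exists only to show which method dispatch selects.

```julia
# Hypothetical function used only to reveal the selected method.
which_method(pred, itr) = :generic
which_method(::typeof(!iszero), itr) = :specialized

which_method(!iszero, [0, 1, 2])   # dispatches to the specialized method
which_method(iseven, [0, 1, 2])    # falls back to the generic method
```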

tkelman · 2017-02-02T21:57:56Z

There's an important difference between nnz and countnz, though: the former is a structural property (the number of stored entries), whereas the latter tests each value and does not count stored zeros. I can't remember if there's a good reason countnz and count are separate, though.
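The distinction can be seen with a matrix that stores an explicit zero. This sketch assumes a current Julia release, where sparse matrices live in the `SparseArrays` standard library and `count(!iszero, A)` plays the role of `countnz`.

```julia
using SparseArrays

# sparse(I, J, V) keeps explicitly supplied zeros as stored entries.
A = sparse([1, 2], [1, 2], [1.0, 0.0])

nnz(A)              # number of stored entries, including the stored zero
count(!iszero, A)   # number of entries whose value is actually nonzero
```

Here the two results differ by one, because the entry at position (2, 2) is stored but numerically zero.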


bramtayl · 2017-02-03T00:31:10Z

I'd imagine most uses of count would be for the boolean case, in which case sum works just fine. If someone wants the number of non-zero values, why not just sum a generator: sum(!iszero(x) for x in X)?
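As a quick check of the generator form, with made-up sample data and assuming a current Julia release where `count(!iszero, X)` is available for comparison:

```julia
X = [0, 3, 0, -1, 7]   # illustrative data only

# Summing Bools yields an Int, so this counts the nonzero entries.
sum(!iszero(x) for x in X) == count(!iszero, X)
```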

nalimilan · 2017-02-03T09:42:17Z

@ararslan Why do you think having count is a good idea even if it does the same thing as sum, but only for booleans? For safety?

cossio · 2017-02-03T15:15:14Z

@nalimilan Good point. I think count only makes sense if it does something different. (And the difference cannot be that count throws an error with non-booleans while sum doesn't).

ararslan · 2017-02-03T18:43:33Z

@nalimilan I think it's a good idea because it provides both clarity and safety in terms of knowing exactly what you're getting. I think counting nonzeros is a rather odd and surprising default. I assume the reason that we don't currently have a one-argument count method is that it's equivalent to sum in that case, but should we make Bool not a subtype of Number (see #19168), it makes more sense to count non-numbers than to sum them, even if we do still define arithmetic on Bools.

fp4code · 2017-02-19T16:01:20Z

To solve the speed point, just add @simd in front of the loop:

function fast_count(pred, itr)
    n = 0
    @simd for x in itr
        n += pred(x)::Bool
    end
    return n
end

using BenchmarkTools  # provides @benchmark

a = randn(1000000)
@benchmark fast_count(x->x>1.96, a) # median time:      535.944 μs (0.00% GC)
@benchmark sum(x->x>1.96, a)        # median time:      565.832 μs (0.00% GC)
@benchmark count(x->x>1.96, a)      # median time:        1.370 ms (0.00% GC)

nalimilan · 2017-02-19T17:53:50Z

I wouldn't have expected @simd to make a difference here. I thought it mainly allowed floating point optimizations?

KristofferC · 2017-02-19T17:57:01Z

There are many SIMD instructions for integers.

nalimilan · 2017-02-19T18:06:59Z

Yes, but I would expect them to be enabled by default. Do they change the behavior of the code?

KristofferC · 2017-02-19T18:59:13Z

Perhaps some aliasing checks are turned off. I'm not sure, I don't get SIMD without the macro at least.

stevengj added the collections Data structures holding multiple items, e.g. sets label Feb 2, 2017

stevengj added a commit that referenced this issue Feb 2, 2017

make count(f,itr) count the number of nonzero values (fixes #20404), …

05280cf

…elim countnz in favor of count(itr) (fixes #20403)

stevengj mentioned this issue Feb 2, 2017

make count(f,itr) count the number of nonzero; countnz(itr) -> count(itr) #20405

Closed

stevengj added a commit that referenced this issue Feb 2, 2017

make count(f,itr) count the number of nonzero values (fixes #20404), …

b51ecbd

…elim countnz in favor of count(itr) (fixes #20403)

cossio mentioned this issue Feb 2, 2017

count(itr) to count trues #20403

Closed

stevengj added a commit to stevengj/julia that referenced this issue Feb 3, 2017

add count(itr) (fixes JuliaLang#20403) and throw and error in count i…

6d7bc98

…f non-boolean values are encountered (fixes JuliaLang#20404)

stevengj mentioned this issue Feb 3, 2017

add count(itr) and throw and error in count if non-boolean values are encountered #20421

Merged

ararslan closed this as completed in ec832e5 Feb 3, 2017

tkelman mentioned this issue Feb 18, 2017

Use same code path for count(pred, itr) as sum(pred, itr) #20663

Closed
