Fix maximum(abs/abs2, T[]) #28535

alyst · 2018-08-09T06:04:06Z

At the moment

julia> maximum(Float64[])
ERROR: ArgumentError: reducing over an empty collection is not allowed
Stacktrace:
 [1] reduce_empty(::Function, ::Type) at ./reduce.jl:224
 [2] mapreduce_empty(::typeof(identity), ::Function, ::Type) at ./reduce.jl:249
 [3] _mapreduce(::typeof(identity), ::typeof(max), ::IndexLinear, ::Array{Float64,1}) at ./reduce.jl:303
 [4] _mapreduce_dim at ./reducedim.jl:305 [inlined]
 [5] #mapreduce#542 at ./reducedim.jl:301 [inlined]
 [6] mapreduce at ./reducedim.jl:301 [inlined]
 [7] _maximum at ./reducedim.jl:650 [inlined]
 [8] _maximum at ./reducedim.jl:649 [inlined]
 [9] #maximum#548 at ./reducedim.jl:645 [inlined]
 [10] maximum(::Array{Float64,1}) at ./reducedim.jl:645
 [11] top-level scope at none:0

julia> maximum(abs, Float64[])
0.0

julia> minimum(abs, Float64[])
ERROR: ArgumentError: reducing over an empty collection is not allowed
Stacktrace:
 [1] reduce_empty(::Function, ::Type) at ./reduce.jl:224
 [2] mapreduce_empty(::typeof(abs), ::Function, ::Type) at ./reduce.jl:250
 [3] _mapreduce(::typeof(abs), ::typeof(min), ::IndexLinear, ::Array{Float64,1}) at ./reduce.jl:303
 [4] _mapreduce_dim(::Function, ::Function, ::NamedTuple{(),Tuple{}}, ::Array{Float64,1}, ::Colon) at ./reducedim.jl:305
 [5] #mapreduce#542 at ./reducedim.jl:301 [inlined]
 [6] mapreduce at ./reducedim.jl:301 [inlined]
 [7] _minimum at ./reducedim.jl:650 [inlined]
 [8] #minimum#551 at ./reducedim.jl:646 [inlined]
 [9] minimum(::Function, ::Array{Float64,1}) at ./reducedim.jl:646
 [10] top-level scope at none:0

It doesn't seem right to me. I suppose maximum() over an empty collection should throw in all cases.

The second commit improves

julia> sum(Bool[])
0

julia> sum(isequal(2), Int[])
ERROR: ArgumentError: reducing over an empty collection is not allowed
Stacktrace:
 [1] mapreduce_empty(::Function, ::Function, ::Type) at ./reduce.jl:248
 [2] _mapreduce(::Base.Fix2{typeof(isequal),Int64}, ::typeof(Base.add_sum), ::IndexLinear, ::Array{Any,1}) at ./reduce.jl:303
 [3] _mapreduce_dim(::Function, ::Function, ::NamedTuple{(),Tuple{}}, ::Array{Any,1}, ::Colon) at ./reducedim.jl:305
 [4] #mapreduce#542 at ./reducedim.jl:301 [inlined]
 [5] mapreduce at ./reducedim.jl:301 [inlined]
 [6] _sum at ./reducedim.jl:650 [inlined]
 [7] #sum#545 at ./reducedim.jl:646 [inlined]
 [8] sum(::Function, ::Array{Any,1}) at ./reducedim.jl:646
 [9] top-level scope at none:0

It seems valid to return 0 in the second case. Of course, isequal() is just one specific case of bool-valued f, OTOH it's quite a common one.

StefanKarpinski · 2018-08-09T11:00:47Z

I agree that this is inconsistent. We can't break code anymore so we can't break maximum(abs, Float64[]), however, now that it works (post-1.0) and all that. We could go in the opposite direction and make maximum(Float64[]) return Inf. If so, we should really audit all base types for which this makes sense to do and implement all of them. We could also consider going through all the single-argument numerical functions in Base and seeing if we could specialize maximum and minimum for them for certain types. Doing this in a non-piecemeal fashion makes this a fairly large undertaking.

alyst · 2018-08-09T12:06:07Z

I'd rather leave maximum/minimum([f,] v = T[]) throwing. It looks consistent to me, also I guess quite a lot of code depends on that behaviour. -/+Inf might be inconvenient, given that non-empty v may contain Infs, NaNs or missing values; so in many use-cases an explicit isempty(v) check would be required (but many users would tend to forget about it). Another alternative is to return nothing, but it would require fixing a slew of type stability issues.

maximum(f, T[]) == zero(T) looks more like a bug. My guess is it was there because it was used by maximum(abs/abs2, ::AbstractSparseVector{T}) (notably defined by the same @eval as sum(abs/abs2, ::AbstractSparseVector{T})).

P.S. Sorry for abusing CI, I had no julia dev environment set up and I was too optimistic of my coding skills.

alyst · 2018-08-09T20:14:14Z

It looks like it's possible to implement sum/maximum/minimum(f, v::AbstractSparseVector) for arbitrary f.

jebej · 2018-08-15T15:10:23Z

I agree that this is inconsistent. We can't break code anymore so we can't break maximum(abs, Float64[])

maximum(abs, Float64[]) = 0 seems like a bug though, no one should rely on this.

alyst · 2018-08-23T12:54:42Z

I can remove the support for sum(isequal(::Any), T[]) from this PR if it's the only bottleneck for making a decision regarding its fate.

fix maximum(abs/abs2, ::AbstractSparseVector)

alyst · 2018-10-30T17:26:54Z

I've excluded the support for sum(isequal, ...) and rebased.
If not for 1.0.x, could it be considered for 1.1? It's really awkward that maximum(abs, []) == 0 bug couldn't be resolved for so long.

cc @mbauman (for AbstractSparseVector changes)

mbauman · 2018-10-30T18:27:24Z

Could you split apart the changes here? It looks like there is a performance fix (2nd commit) mixed in with a breaking change. It'll be easier to review & merge if we detangle the two.

alyst · 2018-10-30T18:57:09Z

@mbauman The 2nd commit depends on the first one. So I removed it from this PR, but, to reduce the noise, I'd rather submit the new PR if/when this one gets merged.

mbauman · 2018-10-30T20:48:28Z

That change was largely back-portable, though, no? You'd have to just preserve the weird maximum of empty case.

This change needs to pass PkgEval for it to be included in 1.1. Given that support was explicitly added for this case, I find it hard to blithely call it a bug.

alyst · 2018-10-30T23:29:37Z

That change was largely back-portable, though, no? You'd have to just preserve the weird maximum of empty case.

I can do that, though it looks more like a new feature. Then it would have to be merged first to avoid conflicts resolution.

Given that support was explicitly added for this case, I find it hard to blithely call it a bug.

Interesting question. :) This behaviour dates back to #7038 when maxabs() was introduced. Unfortunately, this PR doesn't contain justification for this logic nor maxabs() calls that rely on it.

simonbyrne · 2018-11-02T17:19:22Z

Sorry, late to this issue. What's wrong with maximum(abs, Float64[]) == 0? How is that any different than defining sum(abs, Float64[]) == 0?

The underlying rationale is that

reduce(f, op, X) == op(reduce(f, op, X1), reduce(f, op, X2))

where (X1,X2) is a partition of X (or approximate equality in the case of floats). If X1 or X2 are empty, then there may not be any consistent way to do that, so we throw an error. But in the cases where we can make a consistent definition, we might as well define it, hence the the whole point of reduce_empty and mapreduce_empty.

FWIW, I think we could also define

maximum(Float64[]) == -Inf
minimum(Float64[]) == +Inf
minimum(abs, Float64[]) == +Inf

StefanKarpinski · 2018-11-02T18:38:10Z

Agree, I don't see why we wouldn't give these kinds of definitions when there is a value that's correct. It's conventional across mathematics that an empty sum is zero, an empty product is one, an empty maximum or reals is negative infinity and an empty minimum of reals is positive infinity. It seems just as reasonable to say that an empty maximum of absolute values is zero.

alyst · 2018-11-02T18:52:49Z

an empty maximum or reals is negative infinity and an empty minimum of reals is positive infinity

Should it generalize to typemax/typemin(FT) in the generic case, FT being the return type of f (including Bool and Int)?

alyst · 2018-11-02T19:23:34Z

What's wrong with maximum(abs, Float64[]) == 0?

For one, it's not consistent with minimum(abs, Float64[]). The other problem is -- how to generalize it to any f? For consistency it has to be inf/sup_x f(x), but I'm not sure it's desirable/feasible. typemin/typemax(FT) is more generic, but also much more breaking than what this PR does.

simonbyrne · 2018-11-02T20:48:04Z

For one, it's not consistent with minimum(abs, Float64[]).

I agree, that's why I suggested adding that as well.

The other problem is -- how to generalize it to any f? For consistency it has to be inf/sup_x f(x), but I'm not sure it's desirable/feasible. typemin/typemax(FT) is more generic, but also much more breaking than what this PR does.

the criteria, roughly put, is "return a value when well-defined, throw an error when not" (e.g. for maximum(BigInt[]) or maximum(Any[]).

StefanKarpinski · 2018-11-02T21:34:11Z

We could also just consistently make maximum([f,] T[]) and minimum([f,] T[]) return typemin(T) and typemax(T) respectively. That's certainly simpler and it's not correct in the sense that it's valid answer such that max(maximum([f,] T[]), x::T) == x and min(minimum([f,] T[]), x::T) == x.

StefanKarpinski · 2018-11-02T21:36:25Z

Of course it does seem somewhat more informative to give a more precise answer...

nalimilan · 2018-11-02T21:45:25Z

Returning typemax when it's not Inf would be really weird. I see the reasoning and consistency behind it, but in practice it's more likely to confuse people than being actually useful IMHO.

alyst · 2018-11-02T21:47:15Z

the criteria, roughly put, is "return a value when well-defined, throw an error when not" (e.g. for maximum(BigInt[]) or maximum(Any[]).

That would make the behaviour of maximum(f, []) pretty much undefined. What could be written in the docs? Users would be confused: why maximum(abs, []) returns 0, but maximum(x -> abs(x) + 1, []) throws. The code that accepts generic f and calls maximum(f, x) would be more prone to errors. Also, in many application the input values will have some constraints, so returning min/max for the whole (-Inf;+Inf) wouldn't be so useful/precise.

simonbyrne · 2018-11-02T21:57:44Z

Returning typemax when it's not Inf would be really weird. I see the reasoning and consistency behind it, but in practice it's more likely to confuse people than being actually useful IMHO.

Yeah, I'm slightly hesitant about this as well, since it breaks my above criterion if X1 and X2 are arrays of different types, i.e.

max(maximum(UInt[]), maximum(Int[-1])) == max(UInt(0), Int(-1)) == 0

whereas Inf is a maximum across all types (well, all types in Base).

StefanKarpinski · 2018-11-02T22:19:33Z

Right, I didn't really mean to suggest that for types that don't have ±Inf values (that is what I wrote though). I really meant more to suggest that it would be correct if not precise for maximum(abs, Float64[]), for example. Then I guess the general behavior would be more like this:

maximum([f,] T[]) => convert(T, -Inf)
minimum([f,] T[]) => convert(T, +Inf)

Of course, we don't have to implement it that way since an inexact error would be a weird way for an empty minimum or maximum operation to present itself, but you get the idea.

simonbyrne · 2018-11-02T22:36:27Z

Only if f is monotonic, which is why we special-case identity, abs, abs2, etc. I guess we could add in log, exp and a few others as well.

We could also have, e.g. maximum(sin, T[]) = -one(T).

simonbyrne · 2018-11-02T23:23:34Z

The logic for reduce(op, T[]):

does op have an identity v of type T, i.e. is there a v::T such that op(v, x) == x for all valid x (not necessarily x::T).

For mapreduce(f, op, T[]):

does op have an identity v of the return type of f(::T), for all values from f. i.e. is there a v::typeof(f(::T)) such that op(v, f(x)) == f(x) for all valid x.

I realise that is still not exactly mathematically precise, but that's the best I can think of for now. Note that this still doesn't uniquely characterise all such values, e.g. any v <= 0 is an identity for mapreduce(abs, max, X), so we need some criteria to choose the "best" such identity.

StefanKarpinski · 2018-11-02T23:35:45Z

Only if f is montononic

If f produces Float64 then -Inf is a safe maximum. Of course, if we're already specializing on knowing that some function produces Float64 then why not specialize more?

simonbyrne · 2018-11-02T23:40:59Z

Can we specialize on f producing a Float64? If so, that does make sense as a fallback.

I would argue for keeping the special cases where possible: it is useful that maximum(abs, X) is guaranteed to be non-negative.

StefanKarpinski · 2018-11-02T23:48:31Z

Can we specialize on f producing a Float64?

I mean that we know that abs and abs2 and sin and cos and so on are type-preserving for floating-point types.

simonbyrne · 2018-11-02T23:57:40Z

Sure, but for those cases we can give "better" values, so we should do that. The more interesting question is whether we could do it for an arbitrary anonymous function? I guess we could do something similar to collect?

StefanKarpinski · 2018-11-03T00:18:38Z

I don't think we can in general since we can't know that the function always produces a given type.

alyst · 2018-11-06T10:05:52Z

What's wrong with maximum(abs, Float64[]) == 0? How is that any different than defining sum(abs, Float64[]) == 0?

sum(f, []) == 0 doesn't depend on f (only on its return type). maximum/minimum() do depend on f definition.

For mapreduce(f, op, T[]):

does op have an identity v of the return type of f(::T), for all values from f. i.e. is there a v::typeof(f(::T)) such that op(v, f(x)) == f(x) for all valid x.

So for op == sum v = 0 satisfies the above condition for any f. For op == maximum it's v = typemin(typeof(f)).

Also, for maximum(abs, Float64[]) == 0 we have a situation, which looks weird to me:

max(maximum(abs, Float64[]), maximum([-1.0])) == max(0.0, -1.0) == 0.0

While I agree that typemin/typemax approach may give unintuitive results, for the original counter-example

max(maximum(UInt[]), maximum(Int[-1])) == max(UInt(0), Int(-1)) == 0

on v1.0.1 I get

julia> max(maximum([-1]), maximum(abs, UInt[]))
ERROR: InexactError: check_top_bit(Int64, -1)
Stacktrace:
 [1] check_top_bit at ./boot.jl:581 [inlined]
 [2] toUInt64 at ./boot.jl:692 [inlined]
 [3] Type at ./boot.jl:722 [inlined]
 [4] convert at ./number.jl:7 [inlined]
 [5] _promote at ./promotion.jl:261 [inlined]
 [6] promote at ./promotion.jl:284 [inlined]
 [7] max(::Int64, ::UInt64) at ./promotion.jl:363
 [8] top-level scope at none:0

StefanKarpinski · 2018-11-06T16:48:11Z

Let me approach this from the other direction? Are there situations where it's really useful that maximum(abs, Float64[]) give 0.0 instead of -Inf?

As it stands, I'm concerned that the current approach will potentially require special-casing every Float64 to Flaot64 function in Base Julia and standard libraries and beyond. However, we're already kind of in that situation since maximum(f, Float64[]) can't work correctly for general f since we can't know if f has return type Float64 in general. But putting each function in a list of "functions that are known to be T --> T" is an easier, less fiddly problem than recording the minimum and maximum values in the domain of each function. Or, if we want to do that then we should do it in a more generic way like defining maximum(f, T).

simonbyrne · 2018-11-06T17:14:00Z

Also, for maximum(abs, Float64[]) == 0 we have a situation, which looks weird to me:
max(maximum(abs, Float64[]), maximum([-1.0])) == max(0.0, -1.0) == 0.0

But my criterion wouldn't apply to that example. It would however apply to:

max(maximum(abs, Float64[]), maximum(abs, [-1.0])) == max(0.0, 0.0) == 0.0

which IMHO makes perfect sense.

on v1.0.1 I get

Yes, it was meant to be a hypothetical if we were to change it to use typemax/typemin (which I'm not advocating).

simonbyrne · 2018-11-06T17:18:02Z

Let me approach this from the other direction? Are there situations where it's really useful that maximum(abs, Float64[]) give 0.0 instead of -Inf?

It allows you to write sqrt(maximum(abs, X)) or log(maximum(abs, X)) without getting a DomainError. Though I don't know if there is any code in the wild which does this.

Or, if we want to do that then we should do it in a more generic way like defining maximum(f, T).

I like that!

alyst · 2018-11-06T17:41:08Z

But my criterion wouldn't apply to that example.

Sure. But isn't max(maximum(f1, x1), maximum(f2, x2)) supposed to give the same result as maximum([f1.(x1); f2.(x2)])?

JeffBezanson · 2018-11-06T17:51:11Z

But putting each function in a list of "functions that are known to be T --> T" is an easier, less fiddly problem than recording the minimum and maximum values in the domain of each function.

That's a good point --- maximum(f, A) could be implemented as maximum(map(f, A)), which would give -Inf for maximum(abs, Float64[]). However, avoiding that might be the reason a maxabs function existed in the first place.

I think the first thing we should do is add methods to reduce_empty and mapreduce_empty for min and max, calling typemax and typemin. Possibly only for floating point types, but I don't see the harm in allowing it for all Real types. We can then later decide to refactor that into maximum(f, T) if we want.

simonbyrne · 2018-11-06T17:54:18Z

I think the first thing we should do is add methods to reduce_empty and mapreduce_empty for min and max, calling typemax and typemin. Possibly only for floating point types, but I don't see the harm in allowing it for all Real types. We can then later decide to refactor that into maximum(f, T) if we want.

See #29919.

vtjnash · 2021-04-09T22:49:56Z

Seems like this has been superseded by #29919

From discussion in #28535

alyst force-pushed the fix_mapreduce branch 2 times, most recently from e8f8404 to e1506d1 Compare August 9, 2018 10:05

alyst force-pushed the fix_mapreduce branch 3 times, most recently from 6290354 to fd51bef Compare August 9, 2018 16:25

maximum(abs, T[]) should throw an error

abc56ac

fix maximum(abs/abs2, ::AbstractSparseVector)

alyst force-pushed the fix_mapreduce branch from fd51bef to 9afa65a Compare October 30, 2018 13:22

alyst force-pushed the fix_mapreduce branch from 9afa65a to abc56ac Compare October 30, 2018 18:49

mbauman added the needs pkgeval Tests for all registered packages should be run with this change label Oct 30, 2018

mbauman changed the title ~~Fix maximum(abs/abs2, T[]), allow sum(isequal(::Any), T[])~~ Fix maximum(abs/abs2, T[]) Oct 30, 2018

alyst mentioned this pull request Nov 1, 2018

Allow any f for sum/minimum/maximum(f, v::AbstractSparseVector) #29884

Merged

simonbyrne mentioned this pull request Nov 3, 2018

add more methods and tests for reductions over empty arrays #29919

Merged

simonbyrne mentioned this pull request Nov 9, 2018

negative p-norm of an empty collection #29985

Open

simonbyrne mentioned this pull request Apr 26, 2019

mapreduce(abs, max, Int[]) returns 0 #28028

Closed

vtjnash closed this Apr 9, 2021

vtjnash pushed a commit that referenced this pull request Nov 2, 2023

add more defaults for reductions over empty arrays (#29919)

09617ac

From discussion in #28535

Fix maximum(abs/abs2, T[]) #28535

Fix maximum(abs/abs2, T[]) #28535

Conversation

alyst commented Aug 9, 2018

StefanKarpinski commented Aug 9, 2018

alyst commented Aug 9, 2018 • edited Loading

alyst commented Aug 9, 2018

jebej commented Aug 15, 2018

alyst commented Aug 23, 2018

alyst commented Oct 30, 2018

mbauman commented Oct 30, 2018

alyst commented Oct 30, 2018

mbauman commented Oct 30, 2018

alyst commented Oct 30, 2018

simonbyrne commented Nov 2, 2018 • edited Loading

StefanKarpinski commented Nov 2, 2018 • edited Loading

alyst commented Nov 2, 2018

alyst commented Nov 2, 2018 • edited Loading

simonbyrne commented Nov 2, 2018 • edited Loading

StefanKarpinski commented Nov 2, 2018

StefanKarpinski commented Nov 2, 2018

nalimilan commented Nov 2, 2018

alyst commented Nov 2, 2018

simonbyrne commented Nov 2, 2018

StefanKarpinski commented Nov 2, 2018 • edited Loading

simonbyrne commented Nov 2, 2018 • edited Loading

simonbyrne commented Nov 2, 2018

StefanKarpinski commented Nov 2, 2018

simonbyrne commented Nov 2, 2018

StefanKarpinski commented Nov 2, 2018

simonbyrne commented Nov 2, 2018

StefanKarpinski commented Nov 3, 2018

alyst commented Nov 6, 2018

StefanKarpinski commented Nov 6, 2018 • edited Loading

simonbyrne commented Nov 6, 2018 • edited Loading

simonbyrne commented Nov 6, 2018

alyst commented Nov 6, 2018

JeffBezanson commented Nov 6, 2018

simonbyrne commented Nov 6, 2018

vtjnash commented Apr 9, 2021

alyst commented Aug 9, 2018 •

edited

Loading

simonbyrne commented Nov 2, 2018 •

edited

Loading

StefanKarpinski commented Nov 2, 2018 •

edited

Loading

alyst commented Nov 2, 2018 •

edited

Loading

simonbyrne commented Nov 2, 2018 •

edited

Loading

StefanKarpinski commented Nov 2, 2018 •

edited

Loading

simonbyrne commented Nov 2, 2018 •

edited

Loading

StefanKarpinski commented Nov 6, 2018 •

edited

Loading

simonbyrne commented Nov 6, 2018 •

edited

Loading