Introduce `UInt1` type to replace misuse of `Bool` #18367

TotalVerb · 2016-09-05T23:21:44Z

This will solve a whole bunch of problems.

UInt1 is a lot more intuitive than Bool as a identity element for promote_type.
true + true === 2 won't be inconsistent with every other unsigned type; we can have UInt1(1) + UInt1(1) === UInt1(0) as expected.
Bool can be made non-numeric. (cc @StefanKarpinski)

I think the current system is really not ideal; it's worse than a method pun: it's punning a type for two very different things.

The text was updated successfully, but these errors were encountered:

TotalVerb · 2016-09-05T23:40:02Z

If we decide to commit to UInt1 as an identity element for promote_type, then a lot of advantages fall out. For instance, sum([]) and prod([]) could return a usable result.

stevengj · 2016-09-06T00:42:43Z

sum([]) still can't return a usable result, because UInt1 is only an usable as an identity element for promotion of numeric types.

TotalVerb · 2016-09-06T03:52:41Z

sum([]) should return UInt1(0), and prod([]) should return UInt1(1). Because most Julia code is type-stable, it is easy to forget that some code cannot be made type-stable.

Here's an example: map(length, []) returns Any[], because methods of length can return anything. So prod(map(length, [])) currently fails. This is not the most efficient code (mapreduce works better) but demonstrates a case where the current situation of bailing out because there's no one(Any) is just not appropriate.

ararslan · 2016-09-06T04:40:42Z

Because most Julia code is type-stable, it is easy to forget that some code cannot be made type-stable.

Then in the interest of maintaining as much type stability as possible, wouldn't it make more sense to continue to error on sum([]) and prod([]) rather than returning something of a different type?

TotalVerb · 2016-09-06T04:43:09Z

sum(::Vector{Any}) is already type-unstable, and it couldn't be any way else. The only time where it fails is when the vector is empty, which is a very annoying inconsistency.

oxinabox · 2016-09-06T04:43:22Z

Am I understand this right?
This would be a large change to the behaviour of addition of Bools.

I suspect making true + true == false will break a lot of code.
For example
I often calculate the accuracy of a binary classifier by:

mean(output .== ground_truth)

However this is probably bad.
It doesn't work in most languages.

I am in-favor of breaking lots of code, to get the language right, prior to hitting v1.0

TotalVerb · 2016-09-06T04:47:22Z

@ararslan I cannot stress enough that Vector{Any} is an important case that should be dealt with properly, because this is commonly what is produced whenever code can't be made perfectly type-stable. As it stands, to find the sum of elements in a Vector{Any}, the code must be isempty(xs) ? 1 : sum(xs). This is not just a theoretical concern; it comes up in practice in extremely common areas.

TotalVerb · 2016-09-06T04:50:18Z

@oxinabox I propose that Bool be made non-numeric, and that UInt1 be introduced to model F₂. Note that the common notational meaning of + and * on Bool is | and & respectively; these already have symbols in Julia and so defining + and * on Bool is not necessary and can be removed.

To calculate the accuracy of a binary classifier, you can just convert the Bools to Float64 first: mean(Float64(x) for x in output .== ground_truth).

ararslan · 2016-09-06T04:52:24Z

For @oxinabox's specific case there, you can pass a function/type constructor as the first argument to mean (same goes for sum and prod), so you can do mean(Float64, output .== ground_truth).

andyferris · 2016-09-06T07:15:33Z

+1 for UInt1 - it sounds really useful to have Z2. What will a bit in a BitArray be, though? Will we need BoolArray?

I also kind-of like having true + true == 2 but I agree it is kind-of disgusting at the same time.

As it stands, to find the sum of elements in a Vector{Any}, the code must be isempty(xs) ? 1 : sum(xs)

I assume that is meant to be a zero, not a one. We could specialize sum(::AbstractVector{Any}) etc to do exactly this. Or even better, why not just define a method for zero(Any) and one(Any)? For instance, if zero(::Type{Any}) = 1 (or 1.0, maybe) then zero(Any)::Any so it's all very consistent.

(As an aside, in other contexts I've also been wanting a singleton One() and Zero(), which might be useful here? For a simple example. consider Float64(Zero) (meaning roughly convert(Float64, Zero())) instead of zero(Float64). We don't have pi(Float64) but we do have Float64(pi) - because it makes sense that way!).

stevengj · 2016-09-06T11:21:56Z

sum([]) should return UInt1(0)

@TotalVerb, the problem is that UInt1(0) is the additive identity only for numbers, whereas sum (by design) works for anything with a + operation (and ideally zero too). There is simply no satisfactory way to define sum for Any[], because there is no additive identity that works for any conceivable type defining +.

This is why zero(Any) is undefined.

andyferris · 2016-09-06T12:00:30Z

because there is no additive identity that works for any conceivable type defining +.

Maybe this plays nicer with my suggestion of Zero() such that MyType(::Zero) should be defined along with +. We can have Any(::Zero) = Zero() and sum([]) = Zero() and if you try to add that result to anything else that supports + then it would all work out (we could have +(x, ::Zero) = x, etc, already defined in Base). It won't always be type stable, but it might always be "logically stable".

stevengj · 2016-09-06T12:05:14Z

@andyferris, honestly, it seems like an opaque Zero() will confuse a typical user, it won't work with lots of methods (imagine a user doing sin(sum(A)) and passing A=[]), and it adds one more obscure method to the list of things a type needs to implement. Anyone who understands Zero() can surely use T[] instead of [].

TotalVerb · 2016-09-06T15:56:42Z

How often does one take the sum of non-numbers? Nobody is actually writing sum([]). This is about handling the empty case sanely, in 90% of cases at least, which is a strict improvement over an error, which handles 0% of cases.

ararslan · 2016-09-06T16:52:36Z

How often does one take the sum of non-numbers?

I would venture a guess that it's rather common amongst people coming to Julia from R. That said, I don't think sum(::Vector{Bool}), akin to Lyndon's example, is what should actually be used in Julia; that's why we have count.

I think another important question we should ask is: how often does one take the sum of Array{Any}? IMHO it might even make more sense to have all reduction operations fail on Array{Any}, not just the empty case, perhaps saying something like "Cannot reduce an array of type Any. Convert the elements of the array to a common type first." I think that may be annoying in some cases, but folks will end up with type-stable code as a result.

If UInt1 becomes a thing, I think we'll have to think carefully about deprecation so as not to suddenly and mysteriously break things that relied on these kinds of operations on Bools that we didn't foresee.

stevengj · 2016-09-06T17:22:29Z

@tkelman, it's not that uncommon to sum arrays of arrays of numbers, for example. And note that we throw an error for sum(Vector{Int}[]) too; it would be weird if loosening the array type made it "work" (but probably not give what you want).

Giving something that is definitely wrong 5% of the time is worse than giving an error 100% of the time, in my opinion.

@ararslan, Julia is a still a dynamic language. In many other contexts, we allow objects of type Any and determine method applicability at runtime. I don't see why sum of non-empty Any[...] arrays shouldn't be the same — if + works, it should work.

stevengj · 2016-09-06T17:23:53Z

Anyway, this discussion is a bit derailed. Whether a UInt1 type makes sense is independent of whether zero(::Type{Any}) = UInt1(0), and the empty case of sum can be (and has been) discussed elsewhere.

stevengj · 2016-09-06T17:24:55Z

See also e.g. #8092, #12461, #6994, #6069 for discussions of sum and the empty-array case.

TotalVerb · 2016-09-06T17:29:29Z

@stevengj It is precisely because all those are closed that I'm discussing this here. Furthermore, I'm reminded of

julia> map(() -> 2)
2

which is a generalization to the empty case that has a similar objection—it is an incorrect generalization in some cases; perhaps I would have expected [2] or [2, 2, 2]. But yet in this case, we do not throw an error, because returning the simplest possible result is sane in many situations.

TotalVerb · 2016-09-06T17:33:58Z

Also, I disagree that this discussion is irrelevant to UInt1. The existence of a zero or a one that can promote upward to any numeric type is a strong endorsement of returning those zeros or ones as a degenerate base case for certain operations. The current situation with Bool is unfortunate, and it is understandable why sum([]) should not return false, for a variety of reasons including user bewilderment.

TotalVerb · 2016-09-06T17:43:13Z

A naïve implementation of the Flatten iterator type demonstrates why a sane default is really not such a bad thing.

immutable Flatten
    xs
end
length(itr::Flatten) = sum(length, itr.xs)  # seems reasonable enough?

Then length(Flatten([])) breaks down.

stevengj · 2016-09-06T18:46:23Z

The map case is very different, because it is a question of dispatch on the argument signature, not on the values.

TotalVerb · 2016-09-06T18:51:56Z

I disagree that it is very different. It is just as conceivable to imagine map(f, xs...) as it is to imagine sum(xs).

StefanKarpinski · 2016-09-06T20:40:59Z

I'm not convinced that we need or want UInt1. If people want Z2 modular arithmetic, it seems like a modular arithmetic package would be a better way to get it.

TotalVerb · 2016-09-06T21:01:49Z

But we need a type that can be promoted upward to any other type. Currently we're using Bool for this, but UInt1 makes more sense.

StefanKarpinski · 2016-09-06T21:41:36Z

What do we need it for? There's not a lot of use cases that aren't inherently type unstable.

TotalVerb · 2016-09-06T22:45:16Z

For instance, im = Complex(false, true), ε = DualNumber(false, true), etc.

StefanKarpinski · 2016-09-06T22:57:18Z

Yeah, I'm not entirely sure those are a good idea anymore. Having an Imaginary type may be better.

TotalVerb · 2016-09-06T23:10:44Z

I disagree. Should we have three Quaternion types, one for each axis?

TotalVerb · 2016-09-07T03:53:20Z

Also, even with an Imaginary type, and even if we set im = Imaginary(one), we still need either one or UInt1(1).

vchuravy · 2016-09-08T23:35:14Z

I spend the last day writing a function in LLVM IR that operates on 2bits (packed in 64bits). This meant working with <32 x i2>. LLVM supports arbitrary bit lengths for integer and it would be nice to have support for them in Julia too. (Even of only through bitstype 2 UInt2). Maybe we could relax the restriction on the size in bitstype N UIntNthat it needs to be a multiply of 8? These types make little sense standalone, but as part of vectors they simplify working with packed data.

vchuravy · 2016-09-10T03:57:48Z

I am experimenting right now with relaxing the requirement on bitstype so that we might support odd types.

eschnett · 2016-09-10T13:57:22Z

It's not necessary to change how bitstypes work. You can instead implement n-bit integers by storing them in m bits, where m>=n, rounded up to the next byte. I've done this in SmallInts.

My motivation there was to test algorithms that use integers while being able to test the algorithms for all possible inputs. Using integers with fewer bits makes it easier (faster) to iterate over all of them, especially if an algorithm expects multiple arguments.

eschnett · 2016-09-10T13:58:24Z

(Apologies for not remembering that I already implemented a Uint1 type there.)

benninkrs · 2016-09-20T18:53:08Z

Notwithstanding the technical points that have been raised, I think the big conceptual issue remains: Is a Bool (or an element of BitArray for that matter) meant to represent a number or a logical value?

If a Bool is a number then its two values should be 0 and 1 (not true and false) and these should have the usual meaning and behavior of 0 and 1 in arithmetic contexts, e.g. 1+1 promotes to 2. In this interpretation the logical operations (|, &, $, ~) are foreign. On the other hand if a Bool represents a logical value, then the two values should be true and false and the logical operations are the appropriate ones to implement; in this case the arithmetic operators are puns with ambiguous interpretation.

I think it would be preferable to have numeric types and logical types treated rather distinctly, e.g. to deprecate support of arithmetic operators (+,*, etc.) and arithmetic functions (e.g. sum, sin) for logical values in favor of logical operations. In those cases where the user wants to treat an instance of one type as an instance of the other, I don't think it will be awkward to require explicit conversions. It will certainly be better for code clarity. And, as @StefanKarpinski suggested, if people really want to do algebra over GF(2), that can be achieved by importing a package that implements +,* for Bool (or UInt1 or some other type yet to be proposed.) Putting such behavior in a package rather than in Base will at least force people to think about what behavior they want, instead of silently surprising them with a default behavior they may not expect.

TotalVerb · 2016-09-20T19:22:31Z

I agree with most of your points. But |, &, $, and ~ are supported on all integer types, so I would argue they are not foreign to UInt1.

Also, I really think GF(2) is too important to leave out of Base. It's the most natural definition of im and similar.

benninkrs · 2016-09-20T20:14:51Z

@TotalVerb Yes, |, &, $, and ~ are supported for integer types, but they are foreign in the sense that they are imparting an additional structure on integers, namely a base-2 representation.

I agree that GF(2) is very important and should be provided in the standard distribution. But the question is, should this be the default behavior of Bools? I think both true + true == 2 and true + true == false are reasonable interprations, and I think many people will prefer one behavior will many will prefer the other.

An approach I think would be satisfactory for most people would be the following: Have the standard distribution come with two modules, with descriptive names such as BoolAsInt and BoolAsGF2, providing the two different behaviors for arithmetic operations on Bools. Neither module would be used by default, and an expression such as true + true would issue an error along with a suggestion to either convert the bool values to numbers, or to use one of the two modules to implement the desired behavior. With this approach, anybody can easily obtain either desired behavior (even different behaviors in different parts of a project) and there is no danger of unexpected behavior.

TotalVerb · 2016-09-20T20:37:20Z

I oppose the idea of having two different modules. Base should provide a sensible default, not excessive customizability.

benninkrs · 2016-09-20T21:08:04Z

@TotalVerb I agree that having a sensible default would be preferable, provided it doesn't cause confusion or make it too awkward to achieve the other behavior when desired. Especially in this case, since both behaviors seem about equally reasonable and useful. As someone fairly new to Julia, I can think of only four ways to support both behaviors:

Provide a definitive behavior of +, *, etc. for Bool, and provide functions with different names to implement the other behavior(s).
Provide a default behavior of +, *, etc. for Bool along with a module that can be imported to override the default behavior (likely to cause all sorts of confusion)
Provide two Bool-like types, on which + and * behave differently (also confusing and awkward)
Provide no default behavior for Bool arithmetic, and have the user choose the desired behavior (if needed) on a per-namespace basis by using the appropriate module

Each option has its advantages and disadvantages, but 1 and 4 seem to me to be the least undesirable.

TotalVerb · 2016-09-20T21:53:24Z

My opinion is that any time you have two sensible behaviours for a type, and are tempted to allow the user to choose behaviour, then that type is a type-pun: it's serving the purpose of two types. Hence I'm more comfortable with option 3.

benninkrs · 2016-09-21T02:17:49Z

My opinion is that any time you have two sensible behaviours for a type, and are tempted to allow the user to choose behaviour, then that type is a type-pun: it's serving the purpose of two types

That's a fair point. So, suppose there is a boolean type for GF(2) arithmetic. Then would it be useful to have a second type for "regular" arithmetic, if it just ends up getting promoting in most operations? Or even a third type with logical (as opposed to algebraic) semantics?

dlfivefifty · 2016-11-02T21:39:32Z

What about

zero(::Type{Any}) = UInt1(0)*I
one(::Type{Any}) = UInt1(1)*I

That is return a UniformScaling, which should behave like zero/one for all types.

stevengj · 2016-11-02T22:11:54Z

@dlfivefifty, UniformScaling is a multiplicative identity. This doesn't make sense if a type implements a group under + and not *.

dlfivefifty · 2016-11-02T22:33:04Z

Right, but it will support the most common case of Number and Array. Then any other group (I'm not aware of any in Base) would through an error that +(::MyGroup,::UniformScaling) is not defined.

dlfivefifty · 2016-11-02T22:34:00Z

I guess only square Matrices as rand(10,3) + 0*I errors out.... nevermind

vtjnash · 2018-07-08T02:27:02Z

I think it's safe to say we decided not to do this for v1.0

TotalVerb mentioned this issue Sep 5, 2016

cumsum fixes (fixes #18363 and #18336) #18364

Merged

ararslan mentioned this issue Sep 9, 2016

Provide method for sum that accepts a neutral element #18423

Closed

StefanKarpinski added help wanted Indicates that a maintainer wants help on an issue or pull request needs decision A decision on this change is needed labels Oct 31, 2016

StefanKarpinski mentioned this issue Oct 31, 2016

make Bool <: Unsigned (i.e. UInt1) #19168

Closed

m-j-w mentioned this issue Aug 28, 2017

Feature Request: Bool <: Boolean #23488

Closed

vtjnash closed this as completed Jul 8, 2018

goretkin mentioned this issue Apr 25, 2020

Defining zero() seriously #34003

Open

benninkrs mentioned this issue Jun 5, 2020

Change one to onemult #36103

Open

nsajko closed this as not planned Won't fix, can't repro, duplicate, stale Aug 17, 2024

nsajko added maths Mathematical functions types and dispatch Types, subtyping and method dispatch and removed needs decision A decision on this change is needed help wanted Indicates that a maintainer wants help on an issue or pull request labels Aug 17, 2024

Introduce UInt1 type to replace misuse of Bool #18367

Introduce UInt1 type to replace misuse of Bool #18367

Comments

TotalVerb commented Sep 5, 2016 • edited Loading

TotalVerb commented Sep 5, 2016

stevengj commented Sep 6, 2016

TotalVerb commented Sep 6, 2016 • edited Loading

ararslan commented Sep 6, 2016 • edited Loading

TotalVerb commented Sep 6, 2016

oxinabox commented Sep 6, 2016 • edited Loading

TotalVerb commented Sep 6, 2016

TotalVerb commented Sep 6, 2016

ararslan commented Sep 6, 2016

andyferris commented Sep 6, 2016

stevengj commented Sep 6, 2016 • edited Loading

andyferris commented Sep 6, 2016

stevengj commented Sep 6, 2016

TotalVerb commented Sep 6, 2016 • edited Loading

ararslan commented Sep 6, 2016 • edited Loading

stevengj commented Sep 6, 2016 • edited Loading

stevengj commented Sep 6, 2016 • edited Loading

stevengj commented Sep 6, 2016 • edited Loading

TotalVerb commented Sep 6, 2016

TotalVerb commented Sep 6, 2016 • edited Loading

TotalVerb commented Sep 6, 2016

stevengj commented Sep 6, 2016

TotalVerb commented Sep 6, 2016

StefanKarpinski commented Sep 6, 2016

TotalVerb commented Sep 6, 2016

StefanKarpinski commented Sep 6, 2016

TotalVerb commented Sep 6, 2016 • edited Loading

StefanKarpinski commented Sep 6, 2016

TotalVerb commented Sep 6, 2016

TotalVerb commented Sep 7, 2016

vchuravy commented Sep 8, 2016

vchuravy commented Sep 10, 2016

eschnett commented Sep 10, 2016

eschnett commented Sep 10, 2016

benninkrs commented Sep 20, 2016

TotalVerb commented Sep 20, 2016 • edited Loading

benninkrs commented Sep 20, 2016

TotalVerb commented Sep 20, 2016

benninkrs commented Sep 20, 2016 • edited Loading

TotalVerb commented Sep 20, 2016

benninkrs commented Sep 21, 2016 • edited Loading

dlfivefifty commented Nov 2, 2016

stevengj commented Nov 2, 2016

dlfivefifty commented Nov 2, 2016

dlfivefifty commented Nov 2, 2016

vtjnash commented Jul 8, 2018

Introduce `UInt1` type to replace misuse of `Bool` #18367

Introduce `UInt1` type to replace misuse of `Bool` #18367

TotalVerb commented Sep 5, 2016 •

edited

Loading

TotalVerb commented Sep 6, 2016 •

edited

Loading

ararslan commented Sep 6, 2016 •

edited

Loading

oxinabox commented Sep 6, 2016 •

edited

Loading

stevengj commented Sep 6, 2016 •

edited

Loading

TotalVerb commented Sep 6, 2016 •

edited

Loading

ararslan commented Sep 6, 2016 •

edited

Loading

stevengj commented Sep 6, 2016 •

edited

Loading

stevengj commented Sep 6, 2016 •

edited

Loading

stevengj commented Sep 6, 2016 •

edited

Loading

TotalVerb commented Sep 6, 2016 •

edited

Loading

TotalVerb commented Sep 6, 2016 •

edited

Loading

TotalVerb commented Sep 20, 2016 •

edited

Loading

benninkrs commented Sep 20, 2016 •

edited

Loading

benninkrs commented Sep 21, 2016 •

edited

Loading