Skip to content
This repository has been archived by the owner on Jan 26, 2022. It is now read-only.

Consider using boolean vector as result of comparison operations #179

Closed
johnmccutchan opened this issue May 27, 2015 · 25 comments
Closed

Comments

@johnmccutchan
Copy link
Collaborator

No description provided.

@johnmccutchan
Copy link
Collaborator Author

@johnmccutchan
Copy link
Collaborator Author

@PeterJensen

@johnmccutchan
Copy link
Collaborator Author

Immediate benefit:

  • Do not need to introduce Int64x2 for the comparison result of Float64x2.

Issues to consider:

  • What do we do when Loading / Storing BoolxN?
  • Efficient conversion between BoolxN and IntMxN
  • Individual lane access?
  • Are implementations expected to store this as an i1 mask or can it be opaque?

@littledan
Copy link
Member

Quick suggestion for these:

  • Don't allow loading and storing BoolxN. Also don't define arithmetic operations on them; only things like and, or, not, xor, allTrue, anyTrue, select (am I missing any?)
  • The conversion functions defined are SIMD.BoolxN.fromIntxN and SIMD.IntxN.fromBoolxN. On platforms like NEON and SSE, these will just change the type tag and give the same contents. Maybe the compiler can optimize them away, since the type tag won't exist after unboxing.
  • Individual lane access could be provided by extractLaneAsBool, with extractLane being undefined. (We probably also need replaceLaneAsBool).
  • It should definitely be opaque so that on NEON and SSE it can be represented as it is right now. Leave out bitwise casts to other types to and from boolean vectors.

@jfbastien
Copy link

As discussed offline I strongly support using bool vectors to represent the results of comparisons (i1xN where N is the compared vector's number of elements). @littledan summarized our discussion appropriately.

I'm not sure I'd expose conversion functions: you can use a select operation instead to convert to/from bool vector.

Rational from our discussion:

Some architectures either do all-zero/all-one full-width registers, sign-bit of full-width registers, or a separate mask register (aside: it would be useful to have a survey of different architectures' approaches).

Using bool vector is what PNaCl and LLVM do, but PNaCl only allow i1 vectors as intermediate SSA values (can't be stored, can't be passed to functions or returned). Sign extension is sufficient to model the other ISA approaches (and to load/store), whereas the current SIMD.js approach doesn't map well onto architectures which use a mask register.

This proposal isn't directly related to Intel's move mask instruction which grabs the sign bits and move them into a GPR as an i1 mask (which I didn't like in SIMD.js).

i1xN:
Is more general that mask vectors (can still be used for bit selection as well as control flow).
Can be better optimized (because it's obviously the result of a comparison, and doesn't support load/store/general arithmetic)
Is more forward-looking than overlapping with integer SIMD types (for architectures which don't overlap SIMD comparisons results with all-1/all-0 vectors).

@sunfishcode
Copy link
Member

The biggest issue here is what to do for platforms with mask registers. In #180, I have proposed a different perspective on such platforms, which I'd like us to think about in discussions about the best way to represent mask registers.

@littledan
Copy link
Member

@sunfishcode I think we can pursue both in parallel--future-proof the API for platforms mask registers through boolean vectors, and also figure out the long vector API.

@ghost
Copy link

ghost commented May 28, 2015

Is there any chance that the dedicated type could let a compiler assume that a vector register contains boolean values in a canonical form (all ones/all zeroes, as in the result of a compare) so that it doesn't have to re-normalise it when implementing select() using selectBits() (NEON has only the latter)? Or can the compiler remember when this is true in all reasonable cases without extra help?

Allowing bitcast conversion between int and bool would spoil that, of course.

@littledan, not sure if this counts as "missing any", but there were some other boolean reductions raised in #81 which weren't added to the spec: popCount, countLeadingFalse, countTrailingFalse, and operations trivially derived from those. There wasn't much call for them at the time. Their implementations might benefit from the same assumption.

@sunfishcode
Copy link
Member

Renormalization is always at most an arithmetic shift right by immediate 31, and engines can elide even that in the common case where the input is the result of a compare instruction.

@littledan: If we choose to take a long-SIMD approach for mask registers, then it's not clear what other problem we're being asking us to solve here. The committee today expressed no objection to int64x2.

I can see an aesthetic concern about how booleans should be represented inside SIMD lanes. On the other hand I'd have concerns about limitations like value types that can't be passed as an argument or returned as a return value. And regarding future-proofing, vectors of booleans will make it more awkward if we choose to add explicit 256-bit SIMD in the future, because it means that on non-mask-register platforms, we'll have two different in-register representations for the same type, depending on context.

@littledan
Copy link
Member

@simhos01 I think you're right here. I don't think we need to define bitwise cast operations to or from bool types if it makes anything harder like maintaining representations in canonical form, if that's useful. If some things might be faster on boolean vectors because canonical representations can be ensured, then that's a great argument for them, but I don't know enough about those operations to understand whether it'd be faster this way.

@sunfishcode Waldemar raised the issue that the Int64x2 vector isn't well-specified at all. The spec machinery just doesn't work with 64-bit integers yet, and there's a lot about Int64x2 that makes it an exception, in particular all the methods that are missing and make it not orthogonal to the other integer types. I am not sure how to fix the spec in a clean way; unless you have some more ideas, then the way I was thinking of writing it, Int64x2 will look like a wart in the spec and have a bunch of language devoted to explaining it as an exception. Admittedly, a wart in the spec could be OK if we have a good reason for it, which could be if there's a lot of implementation complexity or especially if it has performance implications.

About the need for platforms to have two representations for the same type, depending on context: I'd imagine there'd be a boxed form, which could either handle both representations or normalize it to one, as it chooses, and a compiler would unbox when possible. The unboxed versions would probably want to avoid the normalization. But all of this stuff would be optional optimizations, and the optimizations could still be done if we use integer vectors for select--it would just take place after the compiler rebuilds the abstract flow based on the lower-level optimizations.

@littledan
Copy link
Member

See #178 for the current issues with the Int64x2 type.

@littledan
Copy link
Member

@simhos01 About people in the group not calling for certain features: I think the review there was more about how SIMD fits into ECMAScript, and not about whether the draft spec or current polyfill includes all the operations that anyone would need, especially since I mentioned that the draft spec didn't include all operations.

@sunfishcode
Copy link
Member

@littledan My impression from that discussion was that the issues with Int64x2 were just spec text issues. Int64x2 fully works in the polyfill using only ES5 features, and it's pretty simple, so I assume we can spec it in a similar manner. And in the long term, I expect ES will eventually get a true int64 type.

When I'm talking about two representations, is an Int1x4 lowered to 1 Int32x4 register or 2 Int64x2 registers (if we add 256-bit types) on a non-predicated architecture? We can always make things work, but this creates some awkward special cases that implementations will have to deal with. One of the places this bubbles up is as the reason why they can't be passed as arguments or returned as return values.

I don't see the limited nature of Int64x2 in the spec as a major wart. There are already inconsistencies between what's available for Int32x4 and Int16x8 for example, and those also come from looking at the expected use cases for each type.

@jfbastien
Copy link

I'm not sure I understand part of the discussion: do we agree that if we had bool vectors then developers wouldn't be able to observe how they get materialized by the ISA-specific implementation?

This is the main reason I like bool vectors! It's not observable, and maps better on all the architectures. The compiler is free to materialize the comparison result however it wants, but developers can't observe this.

If developers do want to materialize the vector then they can do something like select(boolvector, alltrue, allfalse).

@sunfishcode
Copy link
Member

Developers can't observe ISA-specific implementation details in the current spec either.

@littledan
Copy link
Member

The current spec forces greaterThan, select, etc to use a particular binary format in representing their boolean vectors. This doesn't constitute seeing ISA-specific implementation details, but it can only be implemented as efficiently on certain architectures (e.g., not AVX apparently, even if that isn't an issue for 128-bit vectors). On the other hand, because you can do even less with boolean vectors, an implementation would not have to do extra work to materialize the result of greaterThan, etc.

@sunfishcode
Copy link
Member

Here's a new idea, a possible hybrid approach:

Instead of Int1x4, what if we introduced new vector-of-boolean types that were explicit about what SIMD element size they are intended to be used with? For example, Vec32x4Bool could be a boolean vector type suitable for holding Float32x4 and Int32x4 compare results. Then if we ever add Float64x4, its compare result type could be Vec64x4Bool, so it'd be a different type. So in the current 128-bit-only proposal, we'd add Vec8x16Bool, Vec16x8Bool, Vec32x4Bool, and Vec64x2Bool.

These Bool types would support things that make sense for boolean vector types, such as and, or, xor, allTrue, anyTrue, but they wouldn't be fully general integer types supporting add, mul, and so on.

This would fix the multiple-representations concern I have, and it would clear the way for these values to be passed as arguments and returned as return values. And it would still work for mask-register architectures.

If we wanted to make these Bool types storable in typed arrays, we could probably make up a stable serialization that wouldn't necessarily be amazing, but wouldn't be horrible either, on any platform.

There are a few downsides. It would introduce yet more types, and we already have quite a few types. But maybe that's ok. And if we some day have Vec64x4Bool, we'd need operators to convert between that and Vec32x4Bool, which under the plain vector-of-bool approach would have been implicit. But while this operation could be an actual no-op on mask-register platforms, it takes some work to do on non-mask-register platforms, so it might benefit from being explicit in the API anyway.

I'm totally open to better naming schemes.

Thoughts?

@littledan
Copy link
Member

I like this idea. I have to say I didn't understand your concern before, but now it seems obvious. I think the growth in the number of types would be worth it. Do you think we'd still need select, etc defined on Intnxm, or would we only want them for the bool new vectors?

Bikeshedding on the name: Bool32x4?

@jfbastien
Copy link

Strong +1 on @sunfishcode's proposal (as we discussed on IRC). It keeps the
type information around, and resolves the weirdly-non-storable issue.

Bikeshed names:

  • Vec32x4Pred
  • Vec32x4Cond
  • Vec32x4_p
  • Vec32x4?
  • Vec32x4‽

@ghost
Copy link

ghost commented May 29, 2015

So we're specifically trying to distinguish between a four-lane boolean for a 128-bit type and four-lane boolean for a 256-bit type, while still hiding the underlying fact that on (some) architectures this is implemented as an all-zero/all-one integer of the corresponding size?

I don't know JavaScript well enough to understand the benefit. I'd have assumed that the compiler implementation could retain but hide the distinction as it saw fit.

On the bikeshed front, I feel slightly confused about the integers on left and right of the x. Vec16 and Bool16 for me have always been 16-lane types, but that's what's required for continuity with int16xN.

@sunfishcode
Copy link
Member

I like Bool32x4. I sympathize with @simhos01; it took me a moment to get past my instinctive interpretation of what that name looks like it means to see how it even makes sense, but it does, and it's consistent with the rest of the API.

The benefit is at the implementation level. Given the type Bool32x4 on a particular platform, the JIT can statically know the in-register representation, without even having to consider context. This simplifies a bunch of things. It's admittedly a low-level implementation concern, but SIMD.js is a low-level API.

@ghost
Copy link

ghost commented May 30, 2015

I suppose that separate types emphasise to the developer that there may be work in converting between Bool16x8 and Bool32x8, if the latter should ever exist. This sort of thing can happen in NEON code when using the widening/narrowing operations (with 256-wide arithmetic emulated by repeating operations on extra registers), and aligning the booleans would be extra work there.

I don't know how the different masks in AVX work in that respect.

@sunfishcode
Copy link
Member

#189 has an implementation of this idea.

@littledan
Copy link
Member

We discussed this idea in the call today and decided to go with boolean vectors, as implemented in #189 .

@sunfishcode
Copy link
Member

#189 is now merged, so this is fixed.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants