Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

64-bit pointers used with 32-bit only instructions (if, br_if, select, call_indirect) #10

Closed
aardappel opened this issue Oct 1, 2020 · 11 comments
Labels

Comments

@aardappel
Copy link
Contributor

There are a number of operations in Wasm that are i32 only, mainly because they are deemed booleans, primarily among them if, br_if and select. Because pointers were 32-bit, and testing a null-pointer is a very frequent operation, these could be used with pointers.

But so far in this proposal we do not have this functionality. 32-bit code like:

local.get $ptr
if

Now has to be emitted as:

local.get $ptr
i64.eqz
i32.eqz
if

Or (1 more byte):

local.get $ptr
i64.const 0
i64.ne
if

Though I'm sure this has no consequences in most engines, it is extra code size, and if we want Memory64 to be on equal footing, I think it be worth considering 64-bit versions of these instructions (or making them polymorphic).

More minor concerns being that it requires special casing in producers, and is just less readable.

It may not be super frequent in C/C++/Rust code, but imagine higher level languages that emit null checks on each memory access.

We already have code in Binaryen that emits extra truncates for checking alignment, for example.

Another instruction to consider is call_indirect that always takes an i32 index. Thing is, that index is what many producers will consider a function pointer. For example, we already have code in LLVM to emit truncates here, because we store these indices in an i64 for uniformity with other kinds of pointers in the wasm64 target. This is less likely to be performance sensitive than the above example though, but could be considered out of uniformity.

Question is, when?

  1. It could be added to the current proposal because it is a minor change.
  2. It should be a follow-up proposal because it would endanger Memory64 to be delayed.
  3. It's not worth it and is not needed at all.

Opinions?

@rossberg
Copy link
Member

rossberg commented Oct 5, 2020

Interesting points.

To me, the former rather seems to suggest introducing a iNN.nez instruction instead of duplicating every bool-consuming instruction. In fact, I remember earlier complaints that Wasm does not have such an instruction. That seems enough of a no-brainer.

The latter seems to suggest that wasm64 should include i64-indexed table types as well -- otherwise a call_indirect with i64 operand would be rather incoherent, and moreover, it might not be the only table instruction one might want to use with i64 index, once that type is used for table indices.

That probably would easy to design is well, and in principle it'd nicer to ship a coherent wasm64 package. But it is somewhat harder to motivate at this point and probably a significant implementation effort.

@aardappel
Copy link
Contributor Author

@rossberg
iNN.nez would indeed be welcome, as that double eqz sequence is very odd.. and would be easier to ensure for even simpler engines that it becomes a no-op (or close to).

Yet, at the same time, replacing 2 ops with 1, and still making wasm64 fundamentally less equal than wasm32 seems inelegant to me.
You say "duplicating every bool-consuming instruction".. there's just 3 of them. Would you say duplicating is likely better than polymorphizing the existing ones?

Certainly, adding new control flow instructions that have their own encoding is a lot bigger churn on tools & engine code, but I could imagine a polymorphic instruction has consequences for the validator.. then again we just made all memory ops polymorphic already.

@rossberg
Copy link
Member

rossberg commented Oct 6, 2020

It wouldn't be polymorphic (= work for all types of a certain shape), though, it would be overloaded (= work for a special finite set of types). Wasm has intentionally tried to hold the line and avoid overloading.

And if you duplicate instructions, will there only be 3 such instructions forever? Will it be only two integer/pointer types forever? Is it consistent to have i64-bool consuming functions but not i64-bool producing functions (ref.is_null etc)? This would be a fairly odd design that scales poorly. For better or worse, Wasm uses i32 as its bool type. (Originally we even discussed introducing a separate bool type, in which case such shortcuts wouldn't be possible either.)

@lars-t-hansen
Copy link

Additionally, br_table privileges i32. In a system with tagged i64 addresses, br_table (ptr & MASK) may be a thing for overloaded operations.

@aardappel
Copy link
Contributor Author

@rossberg

Will it be only two integer/pointer types forever?

Linear memory pointer types, yes :)
Are you seriously arguing that we shouldn't extend support for 64-bit pointers because one day we may have 128-bit pointers??

but not i64-bool producing functions (ref.is_null etc)?

ref.is_null doesn't produce a pointer, so is fine to have its output consumed by the existing 32-bit version.

For better or worse, Wasm uses i32 as its bool type.

Wasm doesn't have a bool type, it has bool instructions, that can operate on anything i32 sized, including pointers.

@lars-t-hansen

br_table privileges i32

But it doesn't operate on pointers directly. The fact that its input can possibly originate from bits inside a pointer doesn't sound quite as directly applicable as the null-checks in my original post.

@rossberg
Copy link
Member

rossberg commented Oct 7, 2020

Are you seriously arguing that we shouldn't extend support for 64-bit pointers because one day we may have 128-bit pointers??

I am arguing that an instruction set preferably is regular. If some instruction is available for more than one int type, then I'd like it to work for all available int types.

(As an aside, I also wouldn't be totally surprised if we get i16 (or even wasm16) some day, despite all intentions to the contrary.)

ref.is_null doesn't produce a pointer, so is fine to have its output consumed by the existing 32-bit version.

A compiler may have good reasons to use "word size" for bools in various circumstances. In wasm64 that would mean 64 bit bools. By the same argument you bring up above, it would hence take 64 bit versions of bool-producing instructions to bring wasm64 "on equal footing". So, if that's the goal, then we ought to consistently support other sizes for all bool ops.

@aardappel
Copy link
Contributor Author

I also wouldn't be totally surprised if we get i16 (or even wasm16) some day, despite all intentions to the contrary.

If that ever happens (which I find exceedingly unlikely) then the cost to make similar changes as we are making here for wasm64 don't seem that onerous to me. I think we can agree that 64-bit pointers are endlessly more important for this world than 16 or 128-bit ones at this point?

A compiler may have good reasons to use "word size" for bools in various circumstances. In wasm64 that would mean 64 bit bools. By the same argument you bring up above, it would hence take 64 bit versions of bool-producing instructions to bring wasm64 "on equal footing". So, if that's the goal, then we ought to consistently support other sizes for all bool ops.

I would say that the concern of being able to represent a bool type as an i64 is completely separate from the question of wether we should be able to test an existing i64 conveniently as a bool. Like I said, for ref.is_null, if it is tested directly, there is no issue. The only issue might be if you want to store the result of ref.is_null in a local or in memory as an i64 because the language wants to interpret all bools as i64 and wants to operate on this value before it is tested. I find that a lot more far fetched than the simple need to test pointers for null.

More generally, you seem to take an all or nothing approach around "what is a bool?" and want to disallow very frequently useful instruction sequences if we don't do it for all possible such sequences that relate to bools. This seems counter to the idea that Wasm is an implementation substrate, not a language. How to represent a bool or a pointer is a concern of the language mapping to Wasm, and we should simply give them the tools to do so. We are already very biased in what concepts we allow a language to map directly, so being pragmatic and enabling just a few common sequences for 64-bit pointers doesn't seem that crazy to me.

@binji
Copy link
Member

binji commented Oct 26, 2020

I think iNN.nez is pretty uncontroversial, so we should consider that separately. It may be worth including in the memory64 proposal, but we may want to include it with some other integer instructions.

As for the rest (overloading br_if, if, select), I'd be interested to see some data -- how often this occurs, and how many extra bytes we're talking about. Code size is important, but so is producer and consumer complexity. So going back to your initial options:

  1. It could be added to the current proposal because it is a minor change.
  2. It should be a follow-up proposal because it would endanger Memory64 to be delayed.
  3. It's not worth it and is not needed at all.

I'd go with 2, unless we can show value now.

@aardappel
Copy link
Contributor Author

I'd be interested to see some data -- how often this occurs

It's hardly possible to produce fair data about this at this point. Maybe some sample binaries we have right now don't use it at all, and some future language that depends on null-pointer checks has them absolutely everywhere.

I am going more by principle, that something as simple as a null-pointer check should not need 1/2 extra instructions compared to the wasm32 version, just because we designed Wasm around 32-bit everything and we don't care to make 64-bit a symmetric extension.

But sure, we can wait until the MVP lands, and someone actually needs this. It's more important that the basics of this proposal land first, the problem in this issue will not hold anyone back fundamentally.

@Zackaryjacob

This comment was marked as off-topic.

@sbc100
Copy link
Member

sbc100 commented Oct 22, 2024

We ended up adding a 64-bit call_indirect as part of adding 64-bit table support.

For the other instructions discussed here I think they can be considered in future proposals. Feel free to re-open if you disagree.

@sbc100 sbc100 closed this as completed Oct 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

6 participants