Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RLS for LHS of shift operands #16489

Open
mlugg opened this issue Jul 22, 2023 · 11 comments
Open

RLS for LHS of shift operands #16489

mlugg opened this issue Jul 22, 2023 · 11 comments
Labels
proposal This issue suggests modifications. If it also has the "accepted" label then it is planned.
Milestone

Comments

@mlugg
Copy link
Member

mlugg commented Jul 22, 2023

Proposal

In status quo, it is common to have to write code like this:

@as(u64, 1) << n  // where n is runtime-known

This is because the LHS of a shift operand never has any known result type, so often must be explicitly provided, which (for a constant LHS) probably means using @as.

The type returned by a shift operation is (in status quo) always the same as the type of its LHS. That means we could change the language to apply RLS.

The rule here is simple: if a << b or a >> b has a result type T, forward that result type onto a. If there is no result type, work as normal. This would make, for instance, the following work:

var n: u6 = something();
const x: u64 = 1 << n;

Breaking

This is almost a non-breaking change. There are two ways I can see for existing code to break.

The first is related to comptime integer coercions. Consider this code:

const x: u8 = @as(u64, 256) >> 1;

Here, the type of the shift expression is a u64, but it has value 128, so (since it is comptime-known) can coerce safely down to a u8.

Under the new semantics, the result type of u8 will be forwarded to the LHS, resulting in this code emitting a compile error since 256 cannot coerce to a u8. The failure mode for this breaking change is a compile error.

The second way code could break is the other way around: upcasts. Consider:

const x: u64 = @as(u8, 255) << 1;

In status quo, the value stored in x is 254, because the shifted top bit is discarded (due to the shift being on a u8). Under the new semantics, the LHS is coerced up to a u64, making the result value instead 510. This change could cause incorrect behavior in a program. However, I anticipate this pattern of depending on truncation of a shift but immediately coercing it up afterwards is rare. Indeed, when presented differently, it looks more like this proposal is solving a footgun:

const x: u8 = foo();
// [a bunch of stuff]
const y: u64 = x << 2;

It's not great that you currently have to look back at the type of x to understand this behavior: under the proposal it can instead be gained through local information, which is probably the intent in this snippet. Only if type inference is used for y does the type of x matter.

@marler8997
Copy link
Contributor

marler8997 commented Jul 22, 2023

There's another idea I think we should consider which may also solve the underlying issue; supporting bitshift on comptime_int. I think you can come up with reasonable set of well-defined semantics for bitshifting comptime_int if you think of it as having an infinite number of bits available going to the left (not going to the right). The difference between positive/negative values is that this infinite set of digits are either all 0s or 1s respectively.

With this solution you get to keep a little more information as you're evaluating an expression. Issues where a developer uses the wrong type become compile-errors instead of runtime errors because zig's comptime-analyzer will know whether a comptime_int value will fit into the needed type.

P.S. I just noticed your first example said n (the shift amount) is runtime known, which comptime_int wouldn't solve...

@andrewrk andrewrk added the proposal This issue suggests modifications. If it also has the "accepted" label then it is planned. label Jul 22, 2023
@andrewrk andrewrk added this to the 0.12.0 milestone Jul 22, 2023
@dweiller
Copy link
Contributor

The second way code could break is the other way around: upcasts. Consider:

const x: u64 = @as(u8, 255) << 1;

Could the truncation before the upcast/coercion be achieved with a one-liner like this

const x: u64 = @as(u8, 255 << 1);

or would one need an intermediate result:

const x: u64 = x: {
    const intermediate: u8 = 255 << 1;
    break :x intermediate
};

@dweiller
Copy link
Contributor

dweiller commented Jul 23, 2023

The first is related to comptime integer coercions. Consider this code:

const x: u8 = @as(u64, 256) << 1;

Here, the type of the shift expression is a u64, but it has value 128, so (since it is comptime-known) can coerce safely down to a u8.

Under the new semantics, the result type of u8 will be forwarded to the LHS, resulting in this code emitting a compile error since 256 cannot coerce to a u8. The failure mode for this breaking change is a compile error.

Did you mean to have a >> here rather than <<? Assuming you did, I assume the fix would just be to wrap the RHS expression in an @intCast() like so: const x: u8 = @intCast(@as(u64, 256) >> 1);?

@dweiller
Copy link
Contributor

dweiller commented Jul 23, 2023

After looking through some of my bitshifting code, I seem to be using expressions like this at the moment:

return @intCast(last_special_size + i * step_size_base * 1 << size_shift);

where size_shift: u6, i, last_special_size and step_size_base should all be usize and the result type is u32. I'm not actually sure why the compiler isn't complaining about the shift at the moment - is there something about having the shift as a subexpression of more arithmetic that changes something or am I exploiting a bug/inconsistency in the current compiler that a solution to this issue would probably fix along the way?

@ghost
Copy link

ghost commented Jul 23, 2023

The second way code could break is the other way around: upcasts. Consider:

const x: u64 = @as(u8, 255) << 1;

... However, I anticipate this pattern of depending on truncation of a shift but immediately coercing it up afterwards is rare.

Rare or not, asking specifically for one type and getting another is really, realy not good, in my opinion.

@mlugg
Copy link
Member Author

mlugg commented Jul 25, 2023

@marler8997

There's another idea I think we should consider which may also solve the underlying issue; supporting bitshift on comptime_int.

Shifting comptime_int at comptime is already allowed, hence why (as you noticed) my example was about doing the shift at runtime.

@dweiller

Could the truncation before the upcast/coercion be achieved with a one-liner like this:

Yes.

Did you mean to have a >> here rather than <<?

Oops, yeah I did, good catch.

Assuming you did, I assume the fix would just be to wrap the RHS expression in an @intCast() like so:

Yeah, that's how I'd do it. To be honest, I'm kind of skeptical about comptime-int-downcasts-via-coercion as a feature, but that's a separate issue.

After looking through some of my bitshifting code [...] I'm not actually sure why the compiler isn't complaining about the shift at the moment

Is size_shift comptime known? It's allowed to shift comptime_int if the RHS (and hence result) is comptime known (it just gives back another comptime_int).

@zzyxyzz

Rare or not, asking specifically for one type and getting another is really, realy not good, in my opinion.

Sure, but I really don't think this behavior can be considered "asking specifically for one type" in any reasonable way. In the cases where this could trigger a bug, the relevant line of code is either mentioning a single type, which is the type which will now be used, or it's mentioning two types, and we're just changing which should be used. i.e:

const x: u64 = @as(u8, 255) << 2; // why should this be considered "asking for" u8 rather than u64?

const y: u8 = something;
// ...
const z: u64 = y << 2;
// this one's even worse - the definition of y could be 100 lines earlier
// meanwhile the line of the shift explicitly says the expression results in a u64

@ghost
Copy link

ghost commented Jul 25, 2023

const x: u64 = @as(u8, 255) << 2; // why should this be considered "asking for" u8 rather than u64?

Because of the explicit @as? We are doing representation-sensitive arithmetic here, so this code is actually not entirely unreasonable. Similar things could happen if you have a saturating or wrapping addition to perform, but then store the result in a wider type for some reason. I think this could actually happen in bit-twiddling code and would then be a total pain to debug.

I can see your point as well, but still, the proposed solution breaks the least surprise principle pretty badly here.

If this proposal moves ahead, I'd recommend to at least make the use of @as in a context where the type is immediately overridden by inference a compile error.

@dweiller
Copy link
Contributor

dweiller commented Jul 26, 2023

Is size_shift comptime known? It's allowed to shift comptime_int if the RHS (and hence result) is comptime known (it just gives back another comptime_int).

size_shift is not comptime known

@mlugg
Copy link
Member Author

mlugg commented Jul 26, 2023

@dweiller Oh wait, I just realised - arithmetic has a higher precedence than shifts. That means your code parses as (last_special_size + i * step_size_base * 1) << size_shift, so the LHS of the shift is properly typed. I'm assuming that's not intentional lol (related: #114)

@dweiller
Copy link
Contributor

If this proposal moves ahead, I'd recommend to at least make the use of @as in a context where the type is immediately overridden by inference a compile error.

I think this makes sense for readability. Without this, unless the reader is aware of/remembers precisely how RLS and coercion work for various operations I agree that shifts left shifts and wrapping/saturating operations. I'm sure there is a sensible/consistent general rule to achieve this though.

@dweiller
Copy link
Contributor

dweiller commented Jul 26, 2023

@dweiller Oh wait, I just realised - arithmetic has a higher precedence than shifts. That means your code parses as (last_special_size + i * step_size_base * 1) << size_shift, so the LHS of the shift is properly typed. I'm assuming that's not intentional lol (related: #114)

Hmm, I'll have to check - the code passes tests but if it is being parsed that way something does seem off.

Edit: I should have thought about it for a few moments longer, multiplication and << b are order-independent: (a * b) << c == a * b * 2 ^ c == a * (b * 2 ^ c) == a * (b << c)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
proposal This issue suggests modifications. If it also has the "accepted" label then it is planned.
Projects
None yet
Development

No branches or pull requests

4 participants