Space-optimize `Option<T>` for integral enum `T` #14540

ghost · 2014-05-30T10:06:10Z

_Summary_
I propose a space optimization for variables of type `Option<E>` when `E` is a nullary, integral enum type.

_Motivation_
There's no need to waste memory on a separate tag in values of type `Option<E>` if `E` is an integral enum type whose valid values do not cover all possible bit patterns. Any bit pattern (of the size of `E`) that doesn't represent a valid value of type `E` could be used by the compiler to represent the `None` value of type `Option<E>`.

_Details_
Given a nullary, integral enum type `E`, the compiler should check whether some bit pattern exists that does not represent a valid value of type `E` (the only valid values are the discriminants of `E`'s nullary variants). If such "invalid" bit patterns exist, the compiler should use one of them to represent the `None` value of type `Option<E>` and omit the tag from values of type `Option<E>`. If more than one such pattern exists, a language-defined rule should deterministically select the one used for `None`. I think the bit pattern of `None` should be language-defined rather than implementation-defined, so that `Option<E>` values serialized to disk stay stable across different compilers and compiler versions.

To determine whether a value of such a space-optimized `Option<E>` is `None`, the generated code should simply compare its binary representation with that of the language-defined "invalid" value.
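
A minimal sketch of the proposed scheme, modelling the enum's discriminants as raw `u8` values. The names `choose_none_pattern`, `encode` and `decode` are hypothetical, and the "smallest unused value" rule is only one possible deterministic choice; neither the proposal nor rustc specifies it.

```rust
/// Valid discriminants of a fieldless enum stored in a `u8`, e.g.
/// `#[repr(u8)] enum Value { Min = 1, Max = 3 }` has [1, 3].
/// Returns the bit pattern reserved for `None`, or `None` if every
/// `u8` value is a valid discriminant (no spare pattern exists).
/// Assumed rule (hypothetical): pick the smallest non-discriminant value.
fn choose_none_pattern(valid: &[u8]) -> Option<u8> {
    (0..=u8::MAX).find(|b| !valid.contains(b))
}

/// Store an optional discriminant in one byte, with no separate tag.
fn encode(value: Option<u8>, none_pattern: u8) -> u8 {
    match value {
        Some(d) => {
            debug_assert_ne!(d, none_pattern, "a valid discriminant must not equal the None pattern");
            d
        }
        None => none_pattern,
    }
}

/// The `None` check is a single comparison against the reserved pattern.
fn decode(byte: u8, none_pattern: u8) -> Option<u8> {
    if byte == none_pattern { None } else { Some(byte) }
}

fn main() {
    let valid = [1u8, 3];
    let none_pattern = choose_none_pattern(&valid).expect("no spare bit pattern");
    assert_eq!(none_pattern, 0);
    for v in [Some(1u8), Some(3), None] {
        let byte = encode(v, none_pattern);
        assert_eq!(decode(byte, none_pattern), v);
    }
    // An enum using all 256 values leaves no pattern for `None`.
    let all: Vec<u8> = (0..=u8::MAX).collect();
    assert_eq!(choose_none_pattern(&all), None);
    println!("None is stored as {none_pattern:#010b}");
}
```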

erickt · 2014-06-03T14:51:29Z

I whipped up a dummy example to see how this would optimize `Option<Result<T, E>>` types, and it shows a pretty nice 8% speedup: https://gist.github.com/erickt/8a6be5c8a2542eaf0c45. This would be especially helpful for my libserialize rewrite RFC.

ghost · 2014-06-04T05:36:33Z

I don't want to say too much about which language-defined rule should pick the one "invalid" bit pattern among all the possible ones. But I do believe that if at least one "invalid" bit pattern (interpreted as the enum's underlying integral type) exists that is either larger than the largest enum variant or smaller than the smallest one, then one of those should be guaranteed to be chosen. This lets users create C-like enums that act as integers bounded to a contiguous range of values (by providing their own unsafe iterators for such an enum type). For example, given the following enum:

```rust
#[repr(u8)]
enum Value {
    Min = 1,
    Max = 3
}
```

The language should guarantee that the "invalid" bit pattern used to represent the `None` value of type `Option<Value>` is either `0b_0000_0000` (less than `Value::Min`) or in the range `[0b_0000_0100, 0b_1111_1111]` (greater than `Value::Max`), given that the underlying integral type of `Value` is `u8`.

[Edit]
Ignore this whole comment. When I wrote it, I didn't realize that it's undefined behaviour for an enum to hold a discriminant value that isn't included in the enum's definition. An iterator that steps from `Value::Min` to `Value::Max` in increments of 1 would have to store the nonexistent discriminant value 2 in the enum along the way, which would be UB. Therefore it would be fine to use 2 as the underlying representation of `Option::None::<Value>`. That said, the rule that determines which value internally represents `None` should perhaps favor 0 the most, because comparison against 0 may be faster than comparison against other values on some processors.
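
The UB problem described in this edit can be avoided by never storing an undeclared discriminant in the enum: iterate over the underlying `u8` and convert each step with a checked conversion, which simply skips 2. This is a sketch; the `TryFrom<u8>` impl and the `values_in_range` helper are illustrative and not part of the thread.

```rust
use std::convert::TryFrom;

#[repr(u8)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Value {
    Min = 1,
    Max = 3,
}

impl TryFrom<u8> for Value {
    type Error = u8;

    /// Checked conversion: only declared discriminants produce a `Value`,
    /// so no invalid bit pattern is ever stored in a `Value`.
    fn try_from(raw: u8) -> Result<Self, Self::Error> {
        match raw {
            1 => Ok(Value::Min),
            3 => Ok(Value::Max),
            other => Err(other),
        }
    }
}

/// Walks the integers from `Min` to `Max`, yielding only declared variants.
fn values_in_range() -> impl Iterator<Item = Value> {
    ((Value::Min as u8)..=(Value::Max as u8)).filter_map(|raw| Value::try_from(raw).ok())
}

fn main() {
    let all: Vec<Value> = values_in_range().collect();
    assert_eq!(all, [Value::Min, Value::Max]);
    // 2 is not a discriminant, so the checked conversion rejects it.
    assert_eq!(Value::try_from(2), Err(2));
    println!("{all:?}");
}
```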

pczarn · 2014-06-05T14:23:06Z

Don't you think it could be done transitively for all tagged unions and integral enum types?

I have three optimizations on my mind:

- use free patterns for zero-length variant representations in non-integral enums, including checks for nullable pointers and all of the above
- allow `#[repr(int type)]` on non-integral enums
- use `#[packed]` on enums to force all possible optimizations

I think the bit pattern of a variant should be as close to 0 as possible, in order of declaration, just like in integral enums. It could stay undefined or become implementation-defined.
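
One reading of this rule for a single byte, as a sketch: each data-less variant that needs its own bit pattern gets the next unused value, counting up from 0 in declaration order. `assign_patterns` is a hypothetical helper; the niche filling that rustc later adopted (rust-lang/rust#45225) reserves a range of niches, so the concrete patterns it picks may differ.

```rust
/// Hypothetical allocator for the "as close to 0 as possible" rule:
/// hand out unused `u8` values in ascending order, one per data-less
/// variant in declaration order. Returns `None` if the byte runs out.
fn assign_patterns(used: &[u8], dataless_variants: usize) -> Option<Vec<u8>> {
    let free: Vec<u8> = (0..=u8::MAX)
        .filter(|b| !used.contains(b))
        .take(dataless_variants)
        .collect();
    if free.len() == dataless_variants { Some(free) } else { None }
}

fn main() {
    // `Value { Min = 1, Max = 3 }` wrapped transitively: the inner `None`
    // of `Option<Option<Value>>` gets 0, the outer `None` gets 2.
    assert_eq!(assign_patterns(&[1, 3], 2), Some(vec![0, 2]));
    // `bool` already uses 0 and 1, so two extra variants get 2 and 3.
    assert_eq!(assign_patterns(&[0, 1], 2), Some(vec![2, 3]));
    // Only 254 patterns remain beside `bool`'s two, so 255 do not fit.
    assert_eq!(assign_patterns(&[0, 1], 255), None);
    println!("patterns assigned");
}
```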

huonw · 2014-06-05T14:25:29Z

> use `#[packed]` on enums to force all possible optimizations

Why wouldn't this be the default? (That is, why wouldn't all optimisations be applied by default?)

ghost · 2014-06-07T06:36:08Z

> Don't you think it could be done transitively for all tagged unions and integral enum types?

I don't really understand the question. I'm sure that there are plenty of other possible optimizations, but they can be implemented irrespective of this proposed optimization and I think they should get their own dedicated github-issues.

This proposed optimization should happen automatically, just like the non-null-based space optimization of `Option<&T>` and `Option<~T>` values.
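
The non-null optimization referred to here is today documented in the `std::option` module for types such as `&T`, `Box<T>` (the successor of `~T`), `NonNull<T>` and the `NonZero*` integers: `Option<T>` is guaranteed to have the same size as `T`. A small check of that guarantee, contrasted with `u32`, which has no spare bit pattern:

```rust
use std::mem::size_of;
use std::num::NonZeroU32;
use std::ptr::NonNull;

fn main() {
    // For these types the all-zero pattern is invalid, so it encodes `None`
    // and no separate tag is needed.
    assert_eq!(size_of::<Option<&u64>>(), size_of::<&u64>());
    assert_eq!(size_of::<Option<Box<u64>>>(), size_of::<Box<u64>>());
    assert_eq!(size_of::<Option<NonZeroU32>>(), size_of::<u32>());
    assert_eq!(size_of::<Option<NonNull<u8>>>(), size_of::<*mut u8>());
    // Every `u32` bit pattern is valid, so `Option<u32>` needs a tag.
    assert!(size_of::<Option<u32>>() > size_of::<u32>());
    println!("non-null optimization holds");
}
```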

pczarn · 2014-06-13T10:05:19Z

@huonw, because it's a space-time tradeoff. Ideally, all possible values of an ADT could be represented within a minimal number of bits. However, values can be moved out of enums, so they should exist somewhere with proper alignment. Matching complex packed enums could still get expensive without simple discriminants:

```rust
enum Value {
  A = 20,
  B = 21,
}

// Option<Result<Value, IoErrorKind>>
match maybe_result {
  // matches on {22, _}
  None => a,
  // matches on {18, 0}
  Some(Err(ShortWrite(0))) => b,
  // matches on {x, _} if 20 <= x && x < 22
  Some(Ok(_)) => c,
}
```

ghost · 2014-10-01T13:51:23Z

@pczarn But there is no space-time tradeoff with the space-optimization that is being proposed here.

pczarn · 2014-10-01T14:17:41Z

Of course, sorry, I was referring to something else.

steveklabnik · 2016-05-24T21:04:39Z

Triage: no changes I'm aware of.

Mark-Simulacrum · 2017-05-19T16:32:42Z

I'm going to close this in favor of rust-lang/rfcs#1230 since there's a lot of potential layout optimizations involving enums but this one specific issue isn't key enough that it should be tracked separately.

Refactor type memory layouts and ABIs, to be more general and easier to optimize. To combat combinatorial explosion, type layouts are now described through 3 orthogonal properties:

- `Variants` describes the plurality of sum types (where applicable)
  - `Single` is for one inhabited/active variant, including all C `struct`s and `union`s
  - `Tagged` has its variants discriminated by an integer tag, including C `enum`s
  - `NicheFilling` uses otherwise-invalid values ("niches") for all but one of its inhabited variants
- `FieldPlacement` describes the number and memory offsets of fields (if any)
  - `Union` has all its fields at offset `0`
  - `Array` has offsets that are a multiple of its `stride`; guarantees all fields have one type
  - `Arbitrary` records all the field offsets, which can be out-of-order
- `Abi` describes how values of the type should be passed around, including for FFI
  - `Uninhabited` corresponds to no values, associated with unreachable control-flow
  - `Scalar` is ABI-identical to its only integer/floating-point/pointer "scalar component"
  - `ScalarPair` has two "scalar components", but only applies to the Rust ABI
  - `Vector` is for SIMD vectors, typically `#[repr(simd)]` `struct`s in Rust
  - `Aggregate` has arbitrary contents, including all non-transparent C `struct`s and `union`s

Size optimizations implemented so far:

- ignoring uninhabited variants (i.e. containing uninhabited fields), e.g.:
  - `Option<!>` is 0 bytes
  - `Result<T, !>` has the same size as `T`
- using arbitrary niches, not just `0`, to represent a data-less variant, e.g.:
  - `Option<bool>`, `Option<Option<bool>>`, `Option<Ordering>` are all 1 byte
  - `Option<char>` is 4 bytes
- using a range of niches to represent *multiple* data-less variants, e.g.:
  - `enum E { A(bool), B, C, D }` is 1 byte

Code generation now takes advantage of `Scalar` and `ScalarPair` to, in more cases, pass around scalar components as immediates instead of indirectly, through pointers into temporary memory, while avoiding LLVM's "first-class aggregates", and there's more untapped potential here.

Closes #44426, fixes #5977, fixes #14540, fixes #43278.
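
The sizes listed in this commit message can be checked with `std::mem::size_of` on a current compiler. Apart from the non-null cases, these are observed rustc layouts rather than documented language guarantees, so treat this as a sanity check, not something code should rely on. Since `!` is unstable, `std::convert::Infallible` (another uninhabited type) stands in for it; `Value` is the enum from earlier in the thread.

```rust
use std::cmp::Ordering;
use std::convert::Infallible;
use std::mem::size_of;

#[allow(dead_code)]
#[repr(u8)]
enum Value {
    Min = 1,
    Max = 3,
}

#[allow(dead_code)]
enum E {
    A(bool),
    B,
    C,
    D,
}

fn main() {
    // Uninhabited variants are ignored (`Infallible` standing in for `!`).
    assert_eq!(size_of::<Option<Infallible>>(), 0);
    assert_eq!(size_of::<Result<u32, Infallible>>(), size_of::<u32>());
    // Arbitrary niches, not just 0, encode a data-less variant.
    assert_eq!(size_of::<Option<bool>>(), 1);
    assert_eq!(size_of::<Option<Option<bool>>>(), 1);
    assert_eq!(size_of::<Option<Ordering>>(), 1);
    assert_eq!(size_of::<Option<char>>(), 4);
    // The case this issue asked for: a fieldless enum with unused patterns.
    assert_eq!(size_of::<Option<Value>>(), size_of::<Value>());
    // A range of niches covers several data-less variants.
    assert_eq!(size_of::<E>(), 1);
    println!("all layout checks passed");
}
```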

ghost mentioned this issue May 30, 2014

Space-optimize Option<T> for integral enum T rust-lang/rfcs#84

Closed

huonw added the I-slow label Jun 2, 2014

huonw mentioned this issue Jun 2, 2014

RFC: Add a partial_cmp method to PartialOrd rust-lang/rfcs#100

Merged

huonw mentioned this issue Jul 3, 2014

#[deriving(PartialOrd)] is O(N^2) code size for N enum variants #15375

Closed

huonw mentioned this issue Sep 9, 2014

Option<char> should be represented as just one 32 bit value #5977

Closed

huonw mentioned this issue Jan 11, 2015

Implement a discriminant_value intrinsic #20907

Closed

huonw mentioned this issue Aug 31, 2015

More Exotic Enum Layout Optimizations rust-lang/rfcs#1230

Closed

Mark-Simulacrum closed this as completed May 19, 2017

mrhota mentioned this issue Oct 13, 2017

Refactor type memory layouts and ABIs, to be more general and easier to optimize. #45225

Merged
