Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for limited non-'static casts #6

Merged
merged 10 commits into from
Apr 20, 2022
Merged

Support for limited non-'static casts #6

merged 10 commits into from
Apr 20, 2022

Conversation

sagebind
Copy link
Owner

@sagebind sagebind commented Apr 9, 2022

Allow casting to specific lifetime-free types from non-'static generic sources in specific contexts.

  • Figure out how to get autoderef under control
  • Figure out API
  • Determine if and how users might unsafely mark additional types as being lifetime-free
    • Punt on this for now

@NobodyXu
Copy link
Contributor

NobodyXu commented Apr 10, 2022

Is it possible to cast a &T to &i32 using the new API?

@sagebind
Copy link
Owner Author

Not currently but it is theoretically possible to support, yes. I suspect it will be supported by the time I am able to release this.

@sagebind
Copy link
Owner Author

As far as API goes, I've figured out how to implement this automagically without any API changes. So cast!(value, u8) will just work even if value is not 'static.

src/lifetime_free.rs Show resolved Hide resolved
@NobodyXu
Copy link
Contributor

Determine if and how users might unsafely mark additional types as being lifetime-free

I think it might be useful, but maybe we can expose this functionality in another release?

@sagebind sagebind marked this pull request as ready for review April 20, 2022 23:05
@sagebind sagebind merged commit bd8ea00 into master Apr 20, 2022
@sagebind sagebind deleted the non-static branch April 20, 2022 23:10
/// Determine if two static, generic types are equal to each other.
#[inline(always)]
pub(crate) fn type_eq<T: 'static, U: 'static>() -> bool {
// Reduce the chance of `TypeId` collisions causing a problem by also

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TypeId is by definition globally unique. To be sure, collisions can currently occur, but I still don’t think these additional checks are warranted, given the extreme rarity of the issue, that it’s acknowledged to be a soundness hole in Rust, that a fix (more or less) is in the pipeline, and that the additional checks here still don’t catch everything.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough, though none of these checks show up in a final binary since they are trivially optimized away so it doesn't really hurt. (As an aside, these checks were already present before this PR, they're just being rearranged here.)

// a function which references itself. This is something that LLVM cannot
// merge, since each monomorphized function has a reference to a different
// global alias.
type_id_of::<T>() == type_id_of::<U>()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I strongly suspect this invokes undefined behaviour that makes it unsound, and that it only works by accident at present.

It assumes that (a) unused function pointers are not optimised out of existence; and (b) monomorphisation doesn’t merge these functions. I’d be surprised if the compiler guaranteed either of these. They both really feel like places where it only works because we don’t have a sufficiently smart compiler yet. (I know that when llogiq made his crate momo there was widespread agreement that the compiler should demonomorphise lots of stuff.)

If the compiler does start merging these, then you have transmute in safe code, with the type_name check bearing all the load of memory safety, which is a worrying thought.

This ain’t my crate, and I have no plan on using it (just came across it incidentally), but I suggest reverting this.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding (a): The function pointers are not unused, since the function being invoked is itself the function that returns its own function pointer. When it is optimized away (at least in my inspections of what code is generated at various optimization levels) the expression itself is also optimized away, at which point the function pointers are no longer relevant (we're left with a folded constant true or false, which is all we really wanted). I have a hard time seeing the optimization step acting much differently for this code, though I suppose anything is possible.

Regarding (b): I walked through the source of LLVM's function merging implementation at several different versions to make my "hunch" a bit more objective, and at least today this is something that LLVM won't merge. However, you are right that it is possible that this will change in the future, and this says nothing of other rustc codegen backends or of optimization steps in rustc itself.

A better summary of the trick is: "These generic functions can't be merged unless they are already merged in order to have the same address space." My intuition could very well be wrong here, but it seems like merging such a function could actually result in incorrect observable behavior in some scenarios so it isn't clear to me that an optimization like this ever would be implemented intentionally, at least not without something smarter that understands the broader context of how the pointer is being used within the function.

To be sure, this is a hack, and not an ideal solution. Ideally, TypeId would have a way of being used without a 'static bound with the caveats that type comparisons ignore lifetimes, since we do our own independent checking of lifetime requirements with the LifetimeFree trait. However, it is unlikely that this will ever be supported since RFC 1849 was retracted.

This ain’t my crate, and I have no plan on using it (just came across it incidentally), but I suggest reverting this.

I did spend quite a bit of time weighing the pros and cons of this PR, but ultimately it did seem to me that at least currently the risk here is relatively low. Though to admit my bias, retracting this entire feature now, months after already being released, would be a pain.

@eddyb
Copy link

eddyb commented Jun 9, 2022

@sagebind replying out of line to #6 (comment) to avoid it being lost:

There is a way to compare types with lifetimes in it using TypeId, and it's really tricky to get lifetimes right (specifically types with more than one lifetime parameter - like, Foo<'a, 'static> and Foo<'static, 'a> must be detected as different) otherwise:

// This is emulating `type<'_>` HKTs and also could be simpler with GATs.
trait ApplyL<'a> {
    type Out;
}
trait LambdaL: for<'a> ApplyL<'a> {}
impl<X: for<'a> ApplyL<'a>> LambdaL for X {}
// `type<'_>` marker types (this can be automated with macros).
enum FooL1 {}
impl<'a> ApplyL<'a> for FooL1 {
    type Out = Foo<'a, 'static>;
}
enum FooL2 {}
impl<'a> ApplyL<'a> for FooL2 {
    type Out = Foo<'static, 'a>;
}
enum FooL1_2 {}
impl<'a> ApplyL<'a> for FooL1_2 {
    type Out = Foo<'a, 'a>;
}
// "Marked" type to use behind `dyn Any` to force downcasting to know
// the "marker" type `M`, even if it doesn't affect the data type.
struct Marked<M, T>(PhantomData<M>, T);

type MarkedForAnyL<X: LambdaL> = Marked<
    // Type that `TypeId::of` will effectively turn into a "structural" ID for
    // the shape of `for<'a> X<'a>` (i.e. multiple `ApplyL` implementors can
    // have the exact same `type Out = ...` and will be considered equal here).
    // The `&mut` allows `!Sized` `Out` types and reinforces invariant polymorphism.
    for<'a> fn(&'a mut <X as ApplyL<'a>>::Out),

    // "Storage" type that's `Any`-friendly (`'static` if `X` is).
    <X as ApplyL<'static>>::Out
>;

// Lifetime-invariant newtype of `dyn Any` (with custom ctors and downcasting).
pub struct AnyL<'a>(PhantomData<&'a mut &'a ()>, dyn Any);

impl<'a> AnyL<'a> {
    pub fn new_boxed<X: LambdaL + 'static>(
        x_a: <X as ApplyL<'a>>::Out,
    ) -> Box<AnyL<'a>> {
        // Erase `'a` (for `Any`-compatible storage).
        let x_static: <X as ApplyL<'static>>::Out
            = unsafe { transmute(x_a) };

        // Ensure that `for<'a> X<'a>`'s "shape" ends up in the `TypeId`.
        let marked_box: Box<MarkedForAnyL<X>>
            = Box::new(Marked(PhantomData, x_static));

        unsafe { transmute::<Box<dyn Any>, Box<AnyL<'a>>>(marked_box) }
    }

    pub fn downcast_ref<X: LambdaL + 'static>(&self) -> Option<&<X as ApplyL<'a>>::Out> {
        // Downcast to the type marked with `for<'a> X<'a>`'s "shape"
        let x_static: &<X as ApplyL<'static>>::Out
            = &self.1.downcast_ref::<MarkedForAnyL<X>>()?.1;

        // Reinject `'a` into the lifetime-erased "storage" type.
        Some(unsafe { transmute(x_static) })
    }
}

Sadly some of the shenanigans involved might still make the compiler struggle, and it's not going to be anywhere near pretty until we get GATs, but such an approach ensures FooL1, FooL2 and FooL1_2 result in distinct TypeIds, which would otherwise be impossible to distinguish (you'd have to either ban multiple-lifetime-position types or force the FooL1_2 version, and even then how do you guarantee it's used correctly?).


As for the function instantiation address taking thing: it simply looks broken/unsound - the only reason LLVM mergefunc would even consider them distinct is its implementation being too simple and nowhere near "graph isomorphism" (remember that we have unnamed_addr semantics for functions, which makes it valid to merge functions even if their addresses are observed).

cc @davidtwco can we make polymorphization treat T as unused in this snippet:

    fn unsound_type_id<T: ?Sized>() -> usize {
        unsound_type_id::<T> as usize
    }

It would be good to break anything like this before it gets used widely, as I can't find any theoretical justification for it (after all, a recursive self-call can't on its own be considered to use T, and observing the address must be strictly less meaningful than calling the function).

EDIT: I am being told by @lcnr (also working on polymorphization) that this will break at some point the future (assuming we get to turn that on by default), so this is not that theoretical.

Also, missed this the first time around:

This is something that LLVM cannot merge, since each monomorphized function has a reference to a different global alias.

At the risk of repeating myself, this isn't true, the "different global alias" can still have the same address thanks to unnamed_addr, and a more "bruteforce" search over the LLVM module should be able find the solution where merging the functions is possible, it would just be too expensive.

@NobodyXu
Copy link
Contributor

NobodyXu commented Jun 9, 2022

@eddyb Then we definitely need another way to calculate the type id.

As for the lifetime, this is actually a none-issue for castaway because it only supports casting to types obeys LifetimeFree:

  • without any lifetime annotation
  • has 'static lifetime
  • all fields of it also obeys LifetimeFree.

So the only question now is to find a new way of calculating typeid for any type.

@eddyb
Copy link

eddyb commented Jun 9, 2022

I don't understand LifetimeFree. When can T: LifetimeFree hold but T: 'static doesn't?

If a lifetime parameter exists but can be ignored (for whatever reason), you could have a trait that instead has an associated type Erased: 'static; and convert between the original and erased types.

(EDIT: @lcnr is explaining it to me, there was a thing I missed, but still it's very confusing how the important part isn't really explained, and the trait doesn't have the : 'static bound I would've expected)

@NobodyXu
Copy link
Contributor

NobodyXu commented Jun 9, 2022

I don't understand LifetimeFree. When can T: LifetimeFree hold but T: 'static doesn't?

Basically LifeTimeFree disallows struct S<'a> { PhantomData<&'a () } but allows struct S; to avoid any problem related to lifetime.

@eddyb
Copy link

eddyb commented Jun 9, 2022

I think part of the confusion here is that the trait and the cast are too far removed from the actual needs / reasoning behind this being possible at all.

IIUC, you want a "failible upcasting" of e.g. &T -> Option<&dyn Any>, which requires no bounds and succeeds when T happens to be a type that is "always 'static" pre-erasure, which AFAIK is described as "a type with no lifetime positions".

Or at least a version of TypeId::of that is -> Option<TypeId> with the same constraint.


In a different language, it could probably like if (T: 'static) { Some(TypeId::of::<T>()) } else { None }, but T: 'static here isn't really the same as the normal bound (it's more like a Never | Maybe | Always kind of thing, except Never doesn't make sense for T: 'static).


I can't think of any way to support this other than making it an official part of Any/TypeId.

However, there's an important distinction: this is still about T: 'static, just in a "worst-case specialization"/"dynamic" way, there's no way to use it to implement e.g. an unsound version of AnyL<'a>, as long as we add new functionality instead of relaxing the existing one.

So it might be way more amenable to getting added to Rust, compared to past proposals.

@NobodyXu
Copy link
Contributor

NobodyXu commented Jun 9, 2022

That's kind of what castaway is doing here:

We only permit upcasting to T: 'static, but we can accept any non-'static type.
Otherwise, cast! and match! in castaway would end up having 'static bound everywhere and it makes it kind of hard to use.

So it might be way more amenable to getting added to Rust, compared to past proposals.

A better way would be to stablise specialisation, this would remove all the hack used in castaway and also add the ability to specialise for types implementing specific traits, or forward &&T to &T.

@eddyb
Copy link

eddyb commented Jun 9, 2022

A better way would be to stablise specialisation

Specialization has no way to allow specializing specifically in the situations where TypeId::of_lifetimeless (to give a name to my suggestion above) would return Some, and allowing more than that is generally unsound.

E.g. impl<T> TypeEq<T> for T {} can't be allowed as a specialization because it would impose constraints on lifetimes if ever used and there's no built-in trait to limit TypeEq in a way that the language would understand as being sound (even then, it would likely have to be on both sides i.e. trait TypeEq<T: NoLifetimes>: NoLifetimes, which would be worse than 'static bounds).

(Or are you suggesting replacing all of the uses of castaway with specialization?)


Otherwise, cast! and match! in castaway would end up having 'static bound everywhere and it makes it kind of hard to use.

Just "hard" or "impossible"? If this used like specialization, isn't it impossible to add the 'static bound in the first place, to whatever is being specialized?

@NobodyXu
Copy link
Contributor

NobodyXu commented Jun 9, 2022

(Or are you suggesting replacing all of the uses of castaway with specialization?)

Yes, because castaway is essentially there to workaround lack of specialization.
If that get stablized, then we won't need this crate anymore.

Though it seems that the RFC is no where near stablization.

Just "hard" or "impossible"? If this used like specialization, isn't it impossible to add the 'static bound in the first place, to whatever is being specialized?

So castaway!($value, $type) basically returns a Ok if $value is of $type, or Err.

Without this PR, the type of $value must be 'static, making it impossible to use in many cases, including ToCompactString which I worked on and added specialization to it using castaway.

@eddyb
Copy link

eddyb commented Jul 24, 2022

Looks like @dtolnay is taking an interest in this kind of approach, might want to keep an eye on that in case it turns into an RFC (or join the discussion etc.).

sagebind pushed a commit that referenced this pull request Sep 19, 2023
According to #6 (comment) in reference to the `type_id_of::<T> as usize`-based implementation:

> As for the function instantiation address taking thing: it simply looks broken/unsound - the only reason LLVM mergefunc would even consider them distinct is its implementation being too simple and nowhere near "graph isomorphism" (remember that we have `unnamed_addr` semantics for functions, which makes it valid to merge functions _even if their addresses are observed_).

> I am being told by lcnr (also working on polymorphization) that this will break at some point the future (assuming we get to turn that on by default), so this is not that theoretical.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants