-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support for limited non-'static
casts
#6
Conversation
Work in progress.
Is it possible to cast a |
Not currently but it is theoretically possible to support, yes. I suspect it will be supported by the time I am able to release this. |
As far as API goes, I've figured out how to implement this automagically without any API changes. So |
I think it might be useful, but maybe we can expose this functionality in another release? |
/// Determine if two static, generic types are equal to each other. | ||
#[inline(always)] | ||
pub(crate) fn type_eq<T: 'static, U: 'static>() -> bool { | ||
// Reduce the chance of `TypeId` collisions causing a problem by also |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TypeId
is by definition globally unique. To be sure, collisions can currently occur, but I still don’t think these additional checks are warranted, given the extreme rarity of the issue, that it’s acknowledged to be a soundness hole in Rust, that a fix (more or less) is in the pipeline, and that the additional checks here still don’t catch everything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough, though none of these checks show up in a final binary since they are trivially optimized away so it doesn't really hurt. (As an aside, these checks were already present before this PR, they're just being rearranged here.)
// a function which references itself. This is something that LLVM cannot | ||
// merge, since each monomorphized function has a reference to a different | ||
// global alias. | ||
type_id_of::<T>() == type_id_of::<U>() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I strongly suspect this invokes undefined behaviour that makes it unsound, and that it only works by accident at present.
It assumes that (a) unused function pointers are not optimised out of existence; and (b) monomorphisation doesn’t merge these functions. I’d be surprised if the compiler guaranteed either of these. They both really feel like places where it only works because we don’t have a sufficiently smart compiler yet. (I know that when llogiq made his crate momo there was widespread agreement that the compiler should demonomorphise lots of stuff.)
If the compiler does start merging these, then you have transmute in safe code, with the type_name check bearing all the load of memory safety, which is a worrying thought.
This ain’t my crate, and I have no plan on using it (just came across it incidentally), but I suggest reverting this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding (a): The function pointers are not unused, since the function being invoked is itself the function that returns its own function pointer. When it is optimized away (at least in my inspections of what code is generated at various optimization levels) the expression itself is also optimized away, at which point the function pointers are no longer relevant (we're left with a folded constant true
or false
, which is all we really wanted). I have a hard time seeing the optimization step acting much differently for this code, though I suppose anything is possible.
Regarding (b): I walked through the source of LLVM's function merging implementation at several different versions to make my "hunch" a bit more objective, and at least today this is something that LLVM won't merge. However, you are right that it is possible that this will change in the future, and this says nothing of other rustc codegen backends or of optimization steps in rustc itself.
A better summary of the trick is: "These generic functions can't be merged unless they are already merged in order to have the same address space." My intuition could very well be wrong here, but it seems like merging such a function could actually result in incorrect observable behavior in some scenarios so it isn't clear to me that an optimization like this ever would be implemented intentionally, at least not without something smarter that understands the broader context of how the pointer is being used within the function.
To be sure, this is a hack, and not an ideal solution. Ideally, TypeId
would have a way of being used without a 'static
bound with the caveats that type comparisons ignore lifetimes, since we do our own independent checking of lifetime requirements with the LifetimeFree
trait. However, it is unlikely that this will ever be supported since RFC 1849 was retracted.
This ain’t my crate, and I have no plan on using it (just came across it incidentally), but I suggest reverting this.
I did spend quite a bit of time weighing the pros and cons of this PR, but ultimately it did seem to me that at least currently the risk here is relatively low. Though to admit my bias, retracting this entire feature now, months after already being released, would be a pain.
@sagebind replying out of line to #6 (comment) to avoid it being lost: There is a way to compare types with lifetimes in it using // This is emulating `type<'_>` HKTs and also could be simpler with GATs.
trait ApplyL<'a> {
type Out;
}
trait LambdaL: for<'a> ApplyL<'a> {}
impl<X: for<'a> ApplyL<'a>> LambdaL for X {} // `type<'_>` marker types (this can be automated with macros).
enum FooL1 {}
impl<'a> ApplyL<'a> for FooL1 {
type Out = Foo<'a, 'static>;
}
enum FooL2 {}
impl<'a> ApplyL<'a> for FooL2 {
type Out = Foo<'static, 'a>;
}
enum FooL1_2 {}
impl<'a> ApplyL<'a> for FooL1_2 {
type Out = Foo<'a, 'a>;
} // "Marked" type to use behind `dyn Any` to force downcasting to know
// the "marker" type `M`, even if it doesn't affect the data type.
struct Marked<M, T>(PhantomData<M>, T);
type MarkedForAnyL<X: LambdaL> = Marked<
// Type that `TypeId::of` will effectively turn into a "structural" ID for
// the shape of `for<'a> X<'a>` (i.e. multiple `ApplyL` implementors can
// have the exact same `type Out = ...` and will be considered equal here).
// The `&mut` allows `!Sized` `Out` types and reinforces invariant polymorphism.
for<'a> fn(&'a mut <X as ApplyL<'a>>::Out),
// "Storage" type that's `Any`-friendly (`'static` if `X` is).
<X as ApplyL<'static>>::Out
>;
// Lifetime-invariant newtype of `dyn Any` (with custom ctors and downcasting).
pub struct AnyL<'a>(PhantomData<&'a mut &'a ()>, dyn Any);
impl<'a> AnyL<'a> {
pub fn new_boxed<X: LambdaL + 'static>(
x_a: <X as ApplyL<'a>>::Out,
) -> Box<AnyL<'a>> {
// Erase `'a` (for `Any`-compatible storage).
let x_static: <X as ApplyL<'static>>::Out
= unsafe { transmute(x_a) };
// Ensure that `for<'a> X<'a>`'s "shape" ends up in the `TypeId`.
let marked_box: Box<MarkedForAnyL<X>>
= Box::new(Marked(PhantomData, x_static));
unsafe { transmute::<Box<dyn Any>, Box<AnyL<'a>>>(marked_box) }
}
pub fn downcast_ref<X: LambdaL + 'static>(&self) -> Option<&<X as ApplyL<'a>>::Out> {
// Downcast to the type marked with `for<'a> X<'a>`'s "shape"
let x_static: &<X as ApplyL<'static>>::Out
= &self.1.downcast_ref::<MarkedForAnyL<X>>()?.1;
// Reinject `'a` into the lifetime-erased "storage" type.
Some(unsafe { transmute(x_static) })
}
} Sadly some of the shenanigans involved might still make the compiler struggle, and it's not going to be anywhere near pretty until we get GATs, but such an approach ensures As for the function instantiation address taking thing: it simply looks broken/unsound - the only reason LLVM mergefunc would even consider them distinct is its implementation being too simple and nowhere near "graph isomorphism" (remember that we have cc @davidtwco can we make polymorphization treat fn unsound_type_id<T: ?Sized>() -> usize {
unsound_type_id::<T> as usize
} It would be good to break anything like this before it gets used widely, as I can't find any theoretical justification for it (after all, a recursive self-call can't on its own be considered to use EDIT: I am being told by @lcnr (also working on polymorphization) that this will break at some point the future (assuming we get to turn that on by default), so this is not that theoretical. Also, missed this the first time around:
At the risk of repeating myself, this isn't true, the "different global alias" can still have the same address thanks to |
@eddyb Then we definitely need another way to calculate the type id. As for the lifetime, this is actually a none-issue for
So the only question now is to find a new way of calculating |
I don't understand If a lifetime parameter exists but can be ignored (for whatever reason), you could have a trait that instead has an associated (EDIT: @lcnr is explaining it to me, there was a thing I missed, but still it's very confusing how the important part isn't really explained, and the trait doesn't have the |
Basically |
I think part of the confusion here is that the trait and the cast are too far removed from the actual needs / reasoning behind this being possible at all. IIUC, you want a "failible upcasting" of e.g. Or at least a version of In a different language, it could probably like I can't think of any way to support this other than making it an official part of However, there's an important distinction: this is still about So it might be way more amenable to getting added to Rust, compared to past proposals. |
That's kind of what We only permit upcasting to
A better way would be to stablise specialisation, this would remove all the hack used in |
Specialization has no way to allow specializing specifically in the situations where E.g. (Or are you suggesting replacing all of the uses of
Just "hard" or "impossible"? If this used like specialization, isn't it impossible to add the |
Yes, because Though it seems that the RFC is no where near stablization.
So Without this PR, the type of |
Looks like @dtolnay is taking an interest in this kind of approach, might want to keep an eye on that in case it turns into an RFC (or join the discussion etc.). |
According to #6 (comment) in reference to the `type_id_of::<T> as usize`-based implementation: > As for the function instantiation address taking thing: it simply looks broken/unsound - the only reason LLVM mergefunc would even consider them distinct is its implementation being too simple and nowhere near "graph isomorphism" (remember that we have `unnamed_addr` semantics for functions, which makes it valid to merge functions _even if their addresses are observed_). > I am being told by lcnr (also working on polymorphization) that this will break at some point the future (assuming we get to turn that on by default), so this is not that theoretical.
Allow casting to specific lifetime-free types from non-
'static
generic sources in specific contexts.