Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implicit binds in virtual calls cause borrow errors #338

Closed
lilizoey opened this issue Jul 11, 2023 · 18 comments · Fixed by #501
Closed

Implicit binds in virtual calls cause borrow errors #338

lilizoey opened this issue Jul 11, 2023 · 18 comments · Fixed by #501
Labels
bug c: engine Godot classes (nodes, resources, ...)

Comments

@lilizoey
Copy link
Member

lilizoey commented Jul 11, 2023

Edit bromeon -- title was "Cannot add_child in a &mut self function if Self overrides on_notification"

#[derive(GodotClass)]
#[class(init, base = Node)]
pub struct Bar {
    #[base]
    base: Base<Node>,
}

#[godot_api]
impl NodeVirtual for Bar {
    fn ready(&mut self) {
        if Engine::singleton().is_editor_hint() {
            return;
        }

        let foo = Node::new_alloc();
        self.add_child(foo.share().upcast());
    }

    fn on_notification(&mut self, what: NodeNotification) {}
}

add_child will trigger a notification (https://docs.godotengine.org/en/stable/classes/class_node.html#class-node-constant-notification-child-order-changed), and that will in turn trigger a call to on_notification, which will try to get a mutable borrow of self. This causes a crash.

It seems like this is a big issue tbh because it basically makes self.add_child almost unusable whenever on_notification is overridden.

I'm not sure what a good solution to this is, but it's related to the common issue of calling a signal that will trigger on self.

@lilizoey lilizoey added bug c: engine Godot classes (nodes, resources, ...) labels Jul 11, 2023
@Bromeon
Copy link
Member

Bromeon commented Jul 11, 2023

This causes a crash.

I guess you mean panic?

I'm not sure what a good solution to this is, but it's related to the common issue of calling a signal that will trigger on self.

Yep, exactly. It's very hard to know if the on_notification call comes "from Rust". I was thinking about marking certain re-entrant calls as safe, e.g. set a flag during an outbound Rust -> Godot call and allow reborrows.

But the big problem is that we lose information about local borrows. How would we prevent scenarios like this?

#[derive(GodotClass)]
#[class(init, base = Node)]
pub struct Bar {
    #[base]
    base: Base<Node>,

    vec: Vec<i32>,
}

#[godot_api]
impl NodeVirtual for Bar {
    fn ready(&mut self) {
        self.vec = vec![1, 2, 3];

        for _ in self.vec.iter() {
            self.base.add_child(Node::new_alloc());
        }
    }

    fn on_notification(&mut self, what: NodeNotification) {
        self.vec.clear(); // unsound
    }
}

Maybe it needs a completely different design, where such methods don't implicitly borrow self, but only Gd<Self>. They become much less useful, because one would then need a 3rd object (ouside of self) to track state changes...

The same problem applies to signals. I don't see how we can allow safe reborrows. And requiring unsafe for simple things like this is also quite ugly.

@Bromeon
Copy link
Member

Bromeon commented Jul 11, 2023

One spontaneous idea: explicit partial borrows.

#[derive(GodotClass)]
#[class(init, base = Node)]
pub struct Bar {
    #[base]
    base: Base<Node>,

    // Let's say we have 2 data fields:
    vec: Vec<i32>,
    notify_count: usize,
}

And now, the user would need to explicitly request access to certain fields:

#[godot_api]
impl NodeVirtual for Bar {
    fn ready(&mut self) {
        self.vec = vec![1, 2, 3];

        // Tell gdext that we're only accessing these two fields
        // (it should limit the scope if possible, e.g. through a local function).
        // Closure syntax is just an example for familiarity; up to bikeshedding and technical limits.
        partial_borrow!(self => |vec, mut base| {
            for _ in vec.iter() {
                base.add_child(Node::new_alloc());
            }
        });
    }

    fn on_notification(this: Gd<Self>, what: NodeNotification) {
        // This would panic:
        partial_borrow!(this => |mut vec| {
            vec.clear();
        });

        // However this would work:
        partial_borrow!(this => |mut notify_count| {
            *notify_count += 1;
        });
    }
}

It's of course clumsy, but Rust itself is very pedantic about not calling &mut methods while there are any borrows. I'd argue that's one of the trade-offs that someone accepts when using Rust, in return for explicit control and avoiding "clear while iterate" style bugs.

@kulkalkul
Copy link
Contributor

This is not related to the issue title, but commenting here as it is relevant to the rest of the comments.

The same problem applies to signals. I don't see how we can allow safe reborrows.

Same is also true for rpcs. Using rpcs right now infest the rust with "string method" calls. I tried limiting rpc calls like so:

    pub fn set_tool(&mut self, mut item: Item) {
        self.rpc("r_set_tool".into(), &[item.to_variant()]);
    }

But it doesn't work for the same reason. Only workaround that user can do is either using self.rpc("r_set_tool".into(), &[item.to_variant()]); directly or making it deferred. But I don't think either are good ways to solve this problem. I have no suggestion on the solution tho.

@gg-yb
Copy link
Contributor

gg-yb commented Jul 24, 2023

Also one important thing to note is that deferring is not really a workaround here: It explicitly changes behavior by introducing delays. For some applications that is unacceptable

@gg-yb
Copy link
Contributor

gg-yb commented Jul 24, 2023

One spontaneous idea: explicit partial borrows.

#[derive(GodotClass)]
#[class(init, base = Node)]
pub struct Bar {
    #[base]
    base: Base<Node>,

    // Let's say we have 2 data fields:
    vec: Vec<i32>,
    notify_count: usize,
}

And now, the user would need to explicitly request access to certain fields:

#[godot_api]
impl NodeVirtual for Bar {
    fn ready(&mut self) {
        self.vec = vec![1, 2, 3];

        // Tell gdext that we're only accessing these two fields
        // (it should limit the scope if possible, e.g. through a local function).
        // Closure syntax is just an example for familiarity; up to bikeshedding and technical limits.
        partial_borrow!(self => |vec, mut base| {
            for _ in vec.iter() {
                base.add_child(Node::new_alloc());
            }
        });
    }

    fn on_notification(this: Gd<Self>, what: NodeNotification) {
        // This would panic:
        partial_borrow!(this => |mut vec| {
            vec.clear();
        });

        // However this would work:
        partial_borrow!(this => |mut notify_count| {
            *notify_count += 1;
        });
    }
}

It's of course clumsy, but Rust itself is very pedantic about not calling &mut methods while there are any borrows. I'd argue that's one of the trade-offs that someone accepts when using Rust, in return for explicit control and avoiding "clear while iterate" style bugs.

Another option could be to only ever use a &self receiver, and force the user to do all state keeping using interior mutability e.g. through (Ref)Cell or Mutex. Not sure how that would play together with exports though.

@Bromeon
Copy link
Member

Bromeon commented Aug 15, 2023

@Soremwar mentioned a closely related issue in #383, which is a more general case of this: calling Rust -> GDScript -> Rust, which causes double borrows. That one is probably even harder to solve than on_notification, since we'd need a way to trace the borrow across FFI and back, with arbitrary GDScript code in the middle.

@lilizoey
Copy link
Member Author

Allowing Gd<Self> #359 as the receiver could be used to do a workaround for #383, though it'd be kinda awkward code imo:

#[func]
fn foo(_self: Gd<Self>) {
  {
    let _self = _self.bind_mut();
    // do stuff with self
  }
  // some function call that would potentially grab a mutable reference to self
  _self.call("some_func", ...);
  {
    let _self = _self.bind_mut();
    // do other stuff with self
  }
}

@bluenote10
Copy link
Contributor

This is an interesting challenge, and probably far more general than the specific example in the issue title. Working with notifications or signals in general has the potential to "backfire", leading to a panic.

Adding another data point: I encountered this issue while implementing a custom item list, which basically started as a port of item_list.cpp of Godot. The pattern used in that implementation doesn't work out-of-the box in Rust, because it involves backfiring signals, even in a relatively subtle way:

  • The item list uses a scroll back, and its "value_changed" signal gets connected to a callback that simply does queue_redraw. That makes sense, in general the item list wants to react to scroll events.
  • The main layouting actually happens inside NOTIFICATION_DRAW.
  • Under certain conditions this drawing logic actually interacts with the scroll bar, calling its set_max and set_page functions. What is not immediately obvious: Under certain conditions these setter actually internally delegate to the set_value setter, which in turn triggers the signal. In Rust this results in a panic, because the callback cannot be called while the drawing is happening. Interestingly there is a set_value_no_signal setter, that allows to set the scrollbar value without generating a signal. However, there are no set_max_no_signal or set_page_no_signal variations, which makes it tricky to realize that the code can backfire a signal. In fact it took me a while to encounter the bug in practice, because it requires the special condition of the scroll bar being scrolled all the way to the bottom, followed by resizing the window in a way that the height increases.
  • The work-around I'm using in this particular example is to temporarily disconnect the signal during the drawing logic that messes with the scrollbar and re-connect the signal after the interaction is done. It actually turns out that doing a queue_redraw is pointless in this case when the scrollbar changes its value triggered by the drawing logic itself, because the drawing is already happening. Triggering yet another redraw is actually a waste of resources. So while the bug was not so easy to find, it has actually revealed a code smell of the C++ version, and the resolution doesn't even feel so bad.

Some thoughts about more general solutions:

The idea in #359 looks quite helpful. For instance, instead of disconnecting and reconnecting the signal, the drawing logic could have set self.is_within_redraw = true (and at the end back to false)

#[func]
fn on_scroll_changed(_self: Gd<Self>) {
    // Only try to mutable bind & trigger a redraw if we are not already drawing...
    if self.is_within_redraw {
        let _self = self.bind_mut();
        _self.base.queue_redraw()
    }
}

But the manual maintenance of self.is_within_redraw feels a bit clumsy. This makes me wonder if it would be possible to offer a kind of try_bind_mut() which basically knows whether we are already bound or not? This would allow to write:

#[func]
fn on_scroll_changed(_self: Gd<Self>) {
    if let Some(_self) = _self.try_bind_mut() {
        // No other mut binding exists, so we know we have been triggered from "externally".
        // In this case we want to trigger a redraw.
        _self.base.queue_redraw()
    } else {
        // Apparently we cannot get a mut binding. This indicates that we have triggered ourselves!
        // And assuming this can only come from the drawing/layouting logic itself, we actually don't
        // have to do anything here in this particular example...
    }
}

The interesting question is what to do in the else branch in the general case. One possibility would be to use interior mutability to encode "what I would have liked to do, but couldn't". At the end of the other methods we could check whether such actions exist and execute them there. I'm not sure if it is possible/sensible to generalize this, but a crazy idea was to have a kind of DeferredQueue that uses interior mutability like this:

#[func]
fn on_scroll_changed(&self) { // due to interior mutability we don't need &mut
    // The idea of `run_or_enqueue` would be to either execute the code block synchronously
    // if a mutable binding is possible, or to push the FnOnce into a queue with interior mutability.
    // If desirable, a plain `.enqueue()` that is consistently lazy could be offered a well.
    self.deferred_queue.run_or_enqueue(|&mut this| {
        // here we have a &mut binding
    });
}

#[func]
fn process(&mut self, delta: f64) {
    // This executes all the accumulated actions. Since the `process` function runs once per frame, the amount
    // of delay can be okay-ish. To reduce the delay the user could also place the `run_all()` strategically in
    // certain other methods.
    self.deferred_queue.run_all();
}

Not sure if this makes sense, but I feel that offering a simple way to "defer" signals/notifications could help a lot to mitigate this issue, because it breaks the cyclic nature of the problem. In many use cases the slight delay may be a reasonable price to pay for being guaranteed not to run into "signal cycle panics".

@lilizoey
Copy link
Member Author

lilizoey commented Sep 2, 2023

But the manual maintenance of self.is_within_redraw feels a bit clumsy. This makes me wonder if it would be possible to offer a kind of try_bind_mut() which basically knows whether we are already bound or not? This would allow to write:

#[func]
fn on_scroll_changed(_self: Gd<Self>) {
    if let Some(_self) = _self.try_bind_mut() {
        // No other mut binding exists, so we know we have been triggered from "externally".
        // In this case we want to trigger a redraw.
        _self.base.queue_redraw()
    } else {
        // Apparently we cannot get a mut binding. This indicates that we have triggered ourselves!
        // And assuming this can only come from the drawing/layouting logic itself, we actually don't
        // have to do anything here in this particular example...
    }
}

The interesting question is what to do in the else branch in the general case. One possibility would be to use interior mutability to encode "what I would have liked to do, but couldn't". At the end of the other methods we could check whether such actions exist and execute them there. I'm not sure if it is possible/sensible to generalize this, but a crazy idea was to have a kind of DeferredQueue that uses interior mutability like this:

In this particular case, you could just do

#[func]
fn on_scroll_changed(_self: Gd<Self>) {
    _self.upcast().queue_redraw()
}

and in general i think the solution is gonna have to be that people need to be a bit more fine-grained with when they do bind_mut(). like do you really need to lock the entire struct for this? can you just lock the particular field you need instead? Like using shared references more with interior mutability. Or just upcasting the Gd if you only need to call something on the godot code.

@Bromeon
Copy link
Member

Bromeon commented Sep 2, 2023

upcast() is no longer needed; since #370 you can call Godot functions on the Gd<T> directly.

@mhoff12358
Copy link
Contributor

Working on #396, the constant across calling a method with either &self or this: Gd<Self> as the self-reference is that what's actually held under the hood is an InstanceStorage<T>.

So maybe we want the user to write something like:

#[godot_api]
impl SomethingVirtual for MyType {
  fn method(&self) {}
}

Then internally when registering method with godot, the implementation is actually something like fn method_impl(storage: &InstanceStorage<MyType>), and it'll be generated at compile time to either get a &MyType or a Gd<MyType>.

We're not doing any runtime branching, and you can still call method from rust with whichever argument you used.

The one issue is that method_impl can't actually take InstanceStorage<MyType> as its argument, it'd need to take some kind of generic InstanceStorage and downcast the pointers. Which should be possible, but I think would require writing InstanceStorage to be able to have that generic implementation.

@mhoff12358
Copy link
Contributor

Actually, hold on, InstanceStorage<T> doesn't have a Box<T> it has a RefCell<T>.
So you can't generically find the &T inside any given one. Unless we did some ugly unsafe code where the fixed size data was always stored first and the T was always stored at the end of the struct, but that'd be ugly.

The cleaner option would be to have a generic InstanceStorageTrait and get the &T to downcast through a virtual method, with method_impl taking a &dyn InstanceStorageTrait instead. That adds an extra virtual method call to all our trait implementation method calls, but it's already a virtual call so maybe it's worth it?

@Bromeon
Copy link
Member

Bromeon commented Sep 18, 2023

Working on #396, the constant across calling a method with either &self or this: Gd<Self> as the self-reference is that what's actually held under the hood is an InstanceStorage<T>.

Is the common denominator not simply Gd<Self>? You can always get a &self out of it using bind(), and &mut self using bind_mut().

I don't see why you need InstanceStorage. What problem do you want to solve?

@mhoff12358
Copy link
Contributor

I'm assuming InstanceStorage::get's Rc::try_borrow is cheaper than cloning a Gd, and that we don't want to copy a smart pointer whenever a virtual method is called if we don't have to.
If the cost of cloning the Gd is cheap enough to do for every virtual call, then yes it would be much better to use that instead of the InstanceStorage.

@Bromeon
Copy link
Member

Bromeon commented Sep 28, 2023

Fair point. But is dyn needed here? The proc-macro could probably just call a different method in InstanceStorage, depending on whether a Gd<T> or a &T/&mut T is needed.

@mhoff12358
Copy link
Contributor

mhoff12358 commented Oct 1, 2023

If I understand correctly, we currently have (with other methods left out):

pub trait NodeVirtual {
  fn on_notification(&mut self, what: NodeNotification) {
    // Some default implementation
  }
}

We can't do something like:

pub trait NodeVirtual {
  fn on_notification(storage: InstanceStorage<Node>, what: NodeNotification);
}

because in whatever code we generate in impl NodeVirtual for MyNodeSubclass there's no way to downcast InstanceStorage<Node> to InstanceStorage<MyNodeSubclass>. Or for that matter upcast InstanceStorage<MyNodeSubclass> to InstanceStorage<Node>.

That said, it sounds like it's efficient enough to do:

pub trait NodeVirtual {
  fn on_notification(node: Gd<Node>, what: NodeNotification) {
    // Some default implementation
  }
}

And then users either implement on_notification directly, or we use a proc_macro to turn

impl NodeVirtual for MyNodeSubclass {
  fn on_notification(&mut self, what: NodeNotification) {
    // User code
  }
}

into

impl NodeVirtual for MyNodeSubclass {
  fn on_notification(node: Gd<Node>, what: NodeNotification) {
    let this: Gd<MyNodeSubclass> = node.cast();
    this.bind_mut().on_notification_impl(what);
  }
}

impl MyNodeSubclass {
  fn on_notification_impl(&mut self, what: NodeNotification) {
    // User code
  }
}

@Bromeon Bromeon changed the title Cannot add_child in a &mut self function if Self overrides on_notification Implicit binds in virtual calls cause borrow errors Oct 25, 2023
@lilizoey
Copy link
Member Author

lilizoey commented Nov 16, 2023

I have an idea for how to mostly fix these issues.

Background

The basic issue is that we have an indirect call of a rust method through ffi, but this shouldn't theoretically be an issue as you are allowed to do this in rust.

For instance, let's say you have a signal foo, which when emitted calls bar. Then this code:

fn process(&mut self, delta: f64) {
    self.base.emit("foo")
}

fn bar(&mut self) {
  // code
}

Is the same as doing

fn process(&mut self, delta: f64) {
    self.bar()
}

fn bar(&mut self) {
  // code
}

which is fine, but the original code would fail.

The reason this is fine is because rust knows that calling bar() requires a mutable reference to self, i.e there must be no aliasing other references around.

So if we can somehow create an API that allows a user to call methods which may potentially call back into self (such as emit or add_child) only when we could get a &mut self reference. Then it would be safe to create a new &mut self reference when we're called from Godot later.

Basic Idea

The idea has two major components (all names used are up for bikeshedding):

  1. We add a new method that can get a mutable reference to a rust class, which i will call bind_self(), only when the object hasn't been bound by either bind() or bind_mut(), it also blocks both bind() and bind_mut(). 1
  2. We create a mechanism to detect when an aliasing reference may exist. And use bind_mut() if one may exist, and bind_self() when we know one cant.

Current Idea For Implementation

My current idea for implementing this is:

We add a counter to InstanceStorage that i'll call possibly_aliasing: usize2 which tracks the number of possibly aliasing references that exist. This counter is set to 0 by default.

possibly_aliasing will be incremented by 1 whenever we enter a method from Godot, or whenever bind() or bind_mut() are called on a Gd<T>. As we can then get arbitrary aliasing references.

possibly_aliasing is decremented by 1 whenever one of the bind-guards are dropped.

Additionally we create a drop-guard which Base can return. This drop-guard decrements possibly_aliasing by 1, and then increments it by 1 when it's dropped. This drop-guard requires a &mut self to be created, ensuring that we do not have any other aliasing references around when it's called.3 This drop-guard dereferences to the engine-type similar to a Gd<T>.

When a rust-method is called from Godot (that takes &mut self), we check possibly_aliasing. If it is 0, then we use bind_self(), otherwise we use bind_mut().

So as some code-examples we'd take this:

#[derive(GodotClass)]
#[class(base = Node, init)]
struct MyClass {
  #[base]
  base: Base<Node>
}

#[godot_api]
impl INode for MyClass {
  fn ready(&mut self) {
    // base() here returns the drop guard 
    self.base().call("some_func".into(), &[]);
  }
}

#[godot_api]
impl MyClass {
  #[func]
  fn some_func(&mut self) {
    // code
  }
}

And we generate this (very abbreviated) glue-code:

fn generated_ready(storage: &InstanceStorage<MyClass>) {
  if storage.get_possibly_aliased() == 0 {
    let mut instance = unsafe { storage.bind_self() };
    instance.ready()
  } else {
    let mut instance = storage.bind_mut();
    instance.ready()
  }
}

fn generated_some_func(storage: &InstanceStorage<MyClass>) {
  if storage.get_possibly_aliased() == 0 {
    let mut instance = unsafe { storage.bind_self() };
    instance.some_func()
  } else {
    let mut instance = storage.bind_mut();
    instance.some_func()
  }
}

One thing to consider is adding an immutable/mutable pair of these, as it'd be safe to call a method that takes a &self from one that takes a &mut self. However the other way is not safe. This could be done with a second counter that just tracks how many immutable self-binds we've taken. And if it's greater than 0 then a mutable self bind will fail, but this should also allow bind() to succeed.

old contents i wonder if we can add another kind of unsafe bind, like `bind_self`. which returns a `GdSelf` that derefs to `&mut self`. where the safety invariant is that it is only used to call a method on it.

it fails if there exists a bind or bind_mut (and also blocks both of those calls), but not if there already exists a bind_self. that way we can allow for reentrancy without borrow failures. and i dont think it'd be possible to get two aliasing mut pointers, since it cant exist alongside the other binds and i dont think you can ever observe two GdSelf at the same time.

this needs one modification, accessing base must require a &mut self reference. we can do this by changing how base is accessed to a method on self. so we'd have self.base().add_child() for instance. that way rust can understand that any usage of base could potentially require a mutable reference to self. this could be done by making dereferencing a Base an unsafe method and auto-generating the accessor method.

This should avoid the unsoundness with allowing reentrancy @Bromeon identified above.

This is additionally also semantically more correct imo, since a Base is a pointer to self, so it makes sense that calling a method on it should require an appropriate reference to self.

Footnotes

  1. I am unsure how user-facing bind_self() should be. We may choose to only implement it on InstanceStorage thus making it entirely internal and hidden from the user.
    bind_self() would likely need to be unsafe also as it can return multiple &mut T references to the same object.

  2. This will actually be a RefCell<usize> or AtomicUsize as relevant.

  3. We might be able to make this a safe function, if we check that the &mut self points to the same rust object that the Base refers to.

@lilizoey
Copy link
Member Author

My rough idea from above, a bit modified, has been implemented in #501. I will write a more detailed description in the PR of how it works before marking the PR as ready for review. I could also split some stuff out of that PR to make it more manageable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug c: engine Godot classes (nodes, resources, ...)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants