Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UB at GDNative termination (seen during integration tests) #894

Closed
B-head opened this issue May 18, 2022 · 12 comments
Closed

UB at GDNative termination (seen during integration tests) #894

B-head opened this issue May 18, 2022 · 12 comments
Assignees
Milestone

Comments

@B-head
Copy link
Contributor

B-head commented May 18, 2022

Issue Codes:
https://github.com/godot-rust/godot-rust/blob/25603b047364cde1ab1267cb41579a2734ab01d5/gdnative-core/src/object/instance.rs#L104-L209

Reference Information:
#891 (comment)

@B-head B-head added the bug label May 18, 2022
@Bromeon
Copy link
Member

Bromeon commented May 18, 2022

Thanks for the report, and for narrowing it down!
From your linked comment:

This [memory leak] seems to have occurred at least three months ago in CI. https://github.com/godot-rust/godot-rust/runs/5280455202?check_suite_focus=true#step:3:1000

And I could not reproduce the bug in my environment. Instead, the external terminal causes the segmentation fault. 😵

... Undefined behavior! 🎲 (I was going to open the issue after I narrowed down the code that caused UB)

Can you describe the exact circumstances that caused this?

  • Godot version
  • godot-rust Git revision
  • which test caused the segmentation fault (if possible)
  • operating system and version

It's a real pity we can't use miri. FFI is a ticking time bomb.

@B-head
Copy link
Contributor Author

B-head commented May 18, 2022

  • Godot version
    • Godot_v3.4.3-stable_win64
  • godot-rust Git revision
    • All revisions I have worked on so far.
  • which test caused the segmentation fault (if possible)
    • At least, test_rust_class_construction().
  • operating system and version
    • Windows 10 Pro - 19044.1706
  • terminal
    • Git Bash in Git for Windows.

I am currently investigating the bug.

@B-head
Copy link
Contributor Author

B-head commented May 18, 2022

Fixed memory leak of NativeScript object.
Edit: After updating the Godot version to 3.4.4, the segmentation fault no longer occurs.

I think the UB occurred before here.

Ah... there's this.

@B-head B-head changed the title Instance::maybe_emplace() causes undefined behavior. Instance::maybe_emplace() causes memory leak. May 19, 2022
@B-head B-head changed the title Instance::maybe_emplace() causes memory leak. Instance::maybe_emplace() causes undefined behavior and memory leak. May 19, 2022
@Bromeon
Copy link
Member

Bromeon commented May 19, 2022

It could be quite hard to consistently reproduce this... Instance::try_emplace() is used by Instance::emplace(), and I've been using this in many applications without issues. Of course that doesn't mean there's no UB, but it means that we will need to investigate more closely under which circumstances this is caused.

Ideally we find some minimal code (simple godot-rust entry point + one #[derive(NativeClass)]) that causes the issue, then we can test it very easily.

We can also use CI, I'll gladly help a dedicated Windows config in GitHub action to run/test such a minimal example.

Ah... there's this.

What do you mean? This is a highly unsafe pointer cast for sure. But is it the line that caused the segfault for you?

@B-head
Copy link
Contributor Author

B-head commented May 19, 2022

What do you mean? This is a highly unsafe pointer cast for sure. But is it the line that caused the segfault for you?

I don't know, but that is the only thing I have not investigated yet.

What I have done so far:

Edit: I tried my best but it didn't work.
B-head@5e59f8a

@Bromeon
Copy link
Member

Bromeon commented Jul 17, 2022

I think this might not be related to maybe_emplace() or anything close, but to the termination of the GDNative module by Godot. A quick look with the debugger showed a segmentation fault in NativeScriptLanguage::_unload_stuff() of Godot's nativescript.cpp. This might be necessary to look into for #910.

For context:

While it wasn't a big issue in Godot 3.4, it seems to show up again for the Godot 3.5 RCs on Windows.

If someone has the time to confirm/reproduce or even help debugging this, that would be greatly appreciated.

@Bromeon Bromeon changed the title Instance::maybe_emplace() causes undefined behavior and memory leak. UB at GDNative termination (seen during integration tests) Jul 17, 2022
@Waridley
Copy link
Contributor

Waridley commented Aug 7, 2022

You could assign this to me if you want since I'm already familiar with the issue in 3.x. I should be able to look at it this week

@Bromeon
Copy link
Member

Bromeon commented Aug 7, 2022

@Waridley Thank you very much, that's greatly appreciated! ❤️

@Waridley
Copy link
Contributor

Not sure if it's related, but I'm getting a consistent segfault here: https://github.com/godot-rust/godot-rust/blob/1b4754ca073e97242d3cf8d26afd865af889a344/test/src/lib.rs#L227

It's like the GodotString created from "foo" is dropped before actually calling get_meta(). Changing it to this pinpoints the problem:

{
    let foo = Reference::new().into_shared();
    let foo = unsafe { foo.assume_safe() };
    foo.set_meta("foo", "bar");

    instance_id = foo.get_instance_id();

    assert!(unsafe { Node::try_from_instance_id(instance_id).is_none() });

    let reconstructed = unsafe { Reference::from_instance_id(instance_id) };
    
    let key2 = GodotString::from("foo");
    let val = reconstructed.get_meta(key2);
    assert_eq!(
        "bar",
        String::from_variant(&val).unwrap()
    );
    let _SEGFAULT = reconstructed.get_meta("foo");
}

@Waridley
Copy link
Contributor

Maybe the compiler is optimizing drop(arg0) to before the FFI call?

#[doc(hidden)]
#[inline(never)]
pub(crate) unsafe fn icallptr_var_str(
    method_bind: *mut sys::godot_method_bind,
    obj_ptr: *mut sys::godot_object,
    arg0: GodotString,
) -> sys::godot_variant {
    let gd_api = get_api();
    let mut argument_buffer: [*const libc::c_void; 1usize] = [arg0.sys() as *const _ as *const _];
    let mut ret = sys::godot_variant::default();
    let ret_ptr = (&mut ret) as *mut _;
    (gd_api.godot_method_bind_ptrcall)(
        method_bind,
        obj_ptr,
        argument_buffer.as_mut_ptr() as *mut _,
        ret_ptr as *mut _,
    );
    drop(arg0);
    ret
}

@Waridley
Copy link
Contributor

Maybe the compiler is optimizing drop(arg0) to before the FFI call?

Confirmed, using ManuallyDrop fixes it

bors bot added a commit that referenced this issue Aug 13, 2022
924: use ManuallyDrop in ptrcalls to prevent drop reordering r=Bromeon a=Waridley

On rustc 1.65.0-nightly (29e4a9ee0 2022-08-10), drops of args in some ptrcalls are seemingly being reordered to before the FFI call. Using `ManuallyDrop` prevents this.

This could potentially be related to #894, but I'm not confident enough to say it will definitely fix it yet.

Co-authored-by: Waridley <[email protected]>
@Bromeon
Copy link
Member

Bromeon commented Aug 25, 2022

Going to optimistically close this.
Should UB reappear, we can reopen.

@Bromeon Bromeon closed this as completed Aug 25, 2022
@Bromeon Bromeon added this to the v0.10.1 milestone Aug 25, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants