Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Arc::drop has a (potentially) dangling shared ref #55005

Closed
Tracked by #10
RalfJung opened this issue Oct 12, 2018 · 65 comments
Closed
Tracked by #10

Arc::drop has a (potentially) dangling shared ref #55005

RalfJung opened this issue Oct 12, 2018 · 65 comments
Labels
C-bug Category: This is a bug. I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness P-medium Medium priority T-lang Relevant to the language team, which will review and decide on the PR/issue. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. WG-embedded Working group: Embedded systems

Comments

@RalfJung
Copy link
Member

RalfJung commented Oct 12, 2018

Discovered by @Amanieu on IRLO. Quoting their report:

Arc::drop contains this code:

if self.inner().strong.fetch_sub(1, Release) != 1 {
    return;
}

Once the current thread (Thread A) has decremented the reference count, Thread B could come in and free the ArcInner.

The problem becomes apparent when you look at the implementation of fetch_sub:

pub fn fetch_sub(&self, val: $int_type, order: Ordering) -> $int_type {
    unsafe { atomic_sub(self.v.get(), val, order) }
    // HERE
}

Note the point marked HERE: at this point we have released our claim to the Arc (as in, decremented the count), which means that Thread B might have freed the ArcInner. However the &self still points to the strong reference count in the ArcInner -- so &self dangles.

Other instances of this:

@RalfJung RalfJung added P-medium Medium priority I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness labels Oct 12, 2018
@RalfJung
Copy link
Member Author

RalfJung commented Oct 12, 2018

Potential fixes:

@RalfJung
Copy link
Member Author

RalfJung commented Dec 9, 2018

Turns out Clang has a similar issue, it uses dereferenceable for C++ references and that's not correct because the pointee might get deallocated during the course of the function. Hence another possible fix might be to use dereferenceable_on_entry instead of dereferenceable for &UnsafeCell<T> once that attribute exists.

@jClaireCodesStuff
Copy link

It's a little spooky to see a situation where a region of code could do something undefined without an unsafe block. However, as the 'Nomocon notes, this is to be expected when working around unsafe code. A soundly designed library will encapsulate this potential unsafety, but it's not possible or even desirable to protect closely neighboring safe code from the potential of UB.

Anyway, in this twilight zone dangling pointers are not prohibited. Dereferencing dangling pointers is. Exposing somebody else's unsuspecting code to a dangling pointer is also very forbidden. But it's fine for dangling pointers to exist as long as they are dead.

Remember: "dead" means "will not be accessed by anything below this point." Other languages don't even have a concept of reference lifetimes.

Once the current thread (Thread A) has decremented the reference count, Thread B could come in and free the ArcInner.

That can happen. But if it does, Thread A is guaranteed to fetch a reference count greater than 1, which then causes Arc::drop to return early. self dangles but it isn't dereferenced while dangling, so no problem.

It would be undefined behavior for any method to be called after drop. But we also know that this won't happen, because "drop is the last method call" is one of the invariants.

Honestly, I think you're a little too attached to the "Stacked Borrows" model - this is a situation where it doesn't work and rather than trying to understand why the model should be changed, you're arguing that the source code of stdlib should be declared unsound. If Rust continues in the tradition of LLVM, the uniqueness of &mut really means something like "I solemnly swear that the addresses derived from this reference will not be derived by any other means until after the last use of this reference."

@hanna-kruppe
Copy link
Contributor

The immediate problem is not about Rust, safe or unsafe, or any proposed formal semantics for it. rustc emits IR that has technically undefined behavior, because there is a pointer that is claimed to be dereferenceable for the duration of a function but points to memory that is (potentially) deallocated during the execution of the function. That this is not a Rust-specific problem can also be seen by the fact that Clang ('s C++ mode) is affected by it too, as linked earlier. This specific instance is not terribly likely to cause miscompilations, but others are, and in any case "UB that's unlikely to cause problems in practice" is not a good way to build a reliable language implementation.

Aside from that, there is the task of justifying the LLVM IR that rustc emits with Rust-level rules -- at minimum, in every case where emitted LLVM IR has UB, the input Rust code also needs to have UB to be able to claim rustc is correct. This issue points out a case where previous attempts at justifying the dereferenceable attribute that rustc emits are not sufficient. There are multiple possible remedies for that, none of which are about stacked borrows and most of which modify how rustc emits IR (emitting the attribute in fewer cases, or emitting a weaker attribute instead in some cases) to make the existing code well-defined without modification.

@vertexclique
Copy link
Member

Option one:

  • Provide a version of fetch_sub that takes a raw pointer, and use that.

for resolving this bug was tried in here: #58611

@nikomatsakis
Copy link
Contributor

nikomatsakis commented Oct 23, 2019

I'm nominating this issue -- and tagging it for @rust-lang/lang -- in response to some comments by @jamesmunns in today's Project Leadership Sync meeting. In particular, @jamesmunns pointed out that this issue around the deferenceable attribute has potentially large repercussions in the embedded space, and cited @japaric's RFC as well as a number of issues around the embedded space (see the minutes).

I've got to dig a bit deeper here but it seems like it'd be good to check-in on the current thinking here. I will try to catch up on the RFC and some of the proposed changes and see if I can post any notes.

Without having done so, I would say that, as far as I know, we still want to keep the dereferenceable attribute, so I expect some form of these changes will be required.

(I suppose omething like the [dereferencable on entry proposal(https://lists.llvm.org/pipermail/llvm-dev/2018-July/124555.html) might also be good, does anyone know if that's made progress? I'm going to assume not.)

@vertexclique
Copy link
Member

@nikomatsakis I have done some work around this. But actually that extended visible API surface by one method. It resolves the problem at a cost of that.

@64
Copy link

64 commented Oct 23, 2019

I could be wrong, but it seems like LLVM now supports nofree and nosync function attributes, which would solve the issue. https://lists.llvm.org/pipermail/llvm-dev/2018-July/124555.html

@hanna-kruppe
Copy link
Contributor

nofree and nosync exist now, but as far as I can tell the switch from

dereferenceable applies through the entire function

to

dereferenceable applies on function entry (use nofree and nosync information to recover analysis power)

has not yet happened.

This was referenced Oct 24, 2019
@RalfJung
Copy link
Member Author

Ah so the tentative plan in LLVM now is to weaken dereferencable semantics instead of adding another attribute?

That seems bad though. Quoting:

I'd like to propose adding a nofree function attribute to indicate that
a function does not, directly or indirectly, call a memory-deallocation
function (e.g., free, C++'s operator delete).

So, e.g., a function that takes a reference and also drops an unrelated Box could not express, with its attributes, that the reference will not get deallocated.

Maybe some from the Rust side should chime in an raise this as a concern?

@hanna-kruppe
Copy link
Contributor

I forgot about this before but there's a (not yet committed) patch introducing a new attribute for "dereferencable forever" at https://reviews.llvm.org/D61652

@RalfJung
Copy link
Member Author

RalfJung commented Nov 4, 2019

See #65796 for another instance: there's a call to AtomicBool::store there signalling another thread that it can remove this memory; that deallocation might happen while AtomicBool::store still runs with its &self argument.

@RalfJung
Copy link
Member Author

Unfortunately I have no idea what you mean by this rather vague proposal. Please describe your changes via concrete code / a diff that applies to the current code.

@SoniEx2
Copy link
Contributor

SoniEx2 commented Jun 24, 2022

Again, we don't know if it's possible to do it that way.

The basic idea is we'd allocate a struct on the stack, and place a pointer to it in the ArcInner. we'd then check the ArcInner to see if it was modified and attempt to clear it. if it was modified, we wait for a signal and... ah, we guess this would only shift the problem to AtomicPtr...

Hmm...

... yeah nvm, we guess the only real solution is to tell the abstract machine to just accept it as-is somehow.

@RalfJung
Copy link
Member Author

#98498 is another instance of this -- or rather, after doing the easiest fix, it becomes an interesting instance of this -- except that this time, not the entire memory &self points to is inside UnsafeCell, so the proposed solutions do not suffice to allow that code. We'd have to also move the main_thread field into an UnsafeCell.

@Coder-256
Copy link
Contributor

Coder-256 commented Jun 26, 2022

#98017 seems like a potentially large change to fix a relatively smaller issue. Would the following change not be sufficient?:

-        if self.inner().strong.fetch_sub(1, Release) != 1 {
-            return;
-        }
+        let strong_ptr = self.inner().strong.as_mut_ptr();
+        unsafe {
+            if core::intrinsics::atomic_xsub_rel(strong_ptr, 1) != 1 {
+                return;
+            }
+        };

In this case, the current thread would have no references to inner() by the time the strong count is decremented, and therefore there couldn't exist any dereferenceable references when the ArcInner is dropped, correct?

I know I have commented this before and I don't mean to distract from the conversation, just a bit confused why a change to rustc itself would be required here. I worry that limiting the use of dereferenceable could possibly prevent some optimizations, today or in the future, which may be undesirable if this issue can be solved without it.

@ghost
Copy link

ghost commented Jun 26, 2022

Given that approximately the same issue has come up in rayon, crossbeam, and at least three separate instances in std, it seems like this is a pattern that Rust experts expected to work. IMO that is a sign a language change like the one @RalfJung has proposed is necessary.

If we went that route of not allowing references in this pattern, I think we should consider expanding the API of atomics. Creating functions explicitly designed to be safe for this "may signal that self can be deallocated" pattern would greatly help guide people in the right direction.

@saethlin
Copy link
Member

saethlin commented Jun 26, 2022

I do not think that reasoning is very good. Numerous crates including rayon and the compiler have expected to be able to do out-of-bounds indexing with slice::get_unchecked.

We should not attempt to codify and bless existing practice, because that is largely based on whatever appears to work with the current implementation of the language, and may not lend itself to a coherent set of rules. It seems clear at this point that we cannot have a simple set of rules which permit both the applications and optimizations we would like Rust to have and also accept all existing code, or even all code written by "experts". There will be give on both the language and the use of the language.

I don't have a strong opinion on the right way to resolve this situation, but I will point out that losing dereferenceable from &UnsafeCell isn't particularly surprising; the type already exists to lift one form of pointer/reference optimization. But the fact that this change doesn't resolve #98498 makes me wonder how many codebases which contain the pattern that losing dereferenceable on &UnsafeCell is intended to permit will require soundness fixes anyway, because the root problem isn't that the rules aren't usable at all before or after this change, it's something else. Perhaps that authors aren't used to reasoning about dereferenceable, or Miri wasn't good enough at exposing this bug until recently, or that Miri isn't used/is hard to use on the kind of codebases that would run into this.

@RalfJung
Copy link
Member Author

RalfJung commented Jun 26, 2022

I don't have a strong opinion on the right way to resolve this situation, but I will point out that losing dereferenceable from &UnsafeCell isn't particularly surprising; the type already exists to lift one form of pointer/reference optimization. But the fact that this change doesn't resolve #98498 makes me wonder how many codebases which contain the pattern that losing dereferenceable on &UnsafeCell is intended to permit will require soundness fixes anyway,

My personal preference to resolve the situation is to also say that when a shared ref points to a !Freeze type, we treat the entire pointee as not-frozen (as opposed to attempting to precisely track which bytes are inside an UnsafeCell and which are not). Doing so makes the approach in #98504 sound, and greatly simplifies the specification and its implementation because now a given tag always has the same permission everywhere it is used. We lose some optimizations, but what we currently tell LLVM is still correct. So I think overall that is a nice corner of the design space. This is basically the discussion in rust-lang/unsafe-code-guidelines#236.

bors bot added a commit to crossbeam-rs/crossbeam that referenced this issue Jun 30, 2022
855: Miri: enable preemption again r=taiki-e a=RalfJung

Miri has a [work-around](rust-lang/miri#2248) for rust-lang/rust#55005 now.

Also, "check-number-validity" is on by default these days.  And I am not sure why "compare-exchange-weak-failure-rate=0.0" was set so let's see what happens when we remove it. :)

Co-authored-by: Ralf Jung <[email protected]>
bors bot added a commit to crossbeam-rs/crossbeam that referenced this issue Jul 1, 2022
855: Miri: enable preemption again r=taiki-e a=RalfJung

Miri has a [work-around](rust-lang/miri#2248) for rust-lang/rust#55005 now.

Also, "check-number-validity" is on by default these days.  And I am not sure why "compare-exchange-weak-failure-rate=0.0" was set so let's see what happens when we remove it. :)

858: Ignore clippy::let_unit_value lint r=taiki-e a=taiki-e



Co-authored-by: Ralf Jung <[email protected]>
Co-authored-by: Taiki Endo <[email protected]>
bors bot added a commit to crossbeam-rs/crossbeam that referenced this issue Jul 1, 2022
855: Miri: enable preemption again r=taiki-e a=RalfJung

Miri has a [work-around](rust-lang/miri#2248) for rust-lang/rust#55005 now.

Also, "check-number-validity" is on by default these days.  And I am not sure why "compare-exchange-weak-failure-rate=0.0" was set so let's see what happens when we remove it. :)

858: Ignore clippy::let_unit_value lint r=taiki-e a=taiki-e



Co-authored-by: Ralf Jung <[email protected]>
Co-authored-by: Taiki Endo <[email protected]>
@SoniEx2

This comment was marked as off-topic.

@RalfJung

This comment was marked as off-topic.

@SoniEx2

This comment was marked as off-topic.

@RalfJung

This comment was marked as off-topic.

@SoniEx2

This comment was marked as off-topic.

@RalfJung

This comment was marked as off-topic.

bors added a commit to rust-lang-ci/rust that referenced this issue Jul 22, 2022
do not mark interior mutable shared refs as dereferenceable

My proposed solution to rust-lang#55005.
@AngelicosPhosphoros
Copy link
Contributor

@RalfJung Can we solve this by changing Drop trait API?

Something like this:

trait Drop{
   fn drop(&mut self);
   
   fn drop_ptr(mut self: NonNull<Self>){
        unsafe{
            // Only case when direct call to `Drop::drop` is allowed.
            Drop::drop(self.as_mut());
        }
   }
}

All existing code would just work because it delegates to old methods that accept the reference.
For cases like discussed ITT, user can implement it like this:

impl Drop for MyTypeWithPtr{
   fn drop(&mut self){ unreachable!("We use only pointer based deallocation") }
   
   fn drop_ptr(mut self: NonNull<Self>){
        // Since we created references inside function, they are not dereferencable.
        // And we can check all our conditions using pointer based API
        // to avoid having living references to data freed in separate thread.
   }
}

This way we would solve issues caused by dropping code like this without giving up dereferencable on &UnsafeCell<X> arguments.

@RalfJung
Copy link
Member Author

It's not the reference in drop that caused the trouble. It's the &self reference in this AtomucIsize method:

pub fn fetch_sub(&self, val: $int_type, order: Ordering) -> $int_type {
    unsafe { atomic_sub(self.v.get(), val, order) }
    // HERE
}

Anyway, with #98017 having landed, I think we can finally close this issue. :) We barely managed to do it in less than 4 years. :D

cuviper added a commit to cuviper/rayon that referenced this issue Jan 19, 2023
`Latch::set` can invalidate its own `&self`, because it releases the
owning thread to continue execution, which may then invalidate the latch
by deallocation, reuse, etc. We've known about this problem when it
comes to accessing latch fields too late, but the possibly dangling
reference was still a problem, like rust-lang/rust#55005.

The result of that was rust-lang/rust#98017, omitting the LLVM attribute
`dereferenceable` on references to `!Freeze` types -- those containing
`UnsafeCell`. However, miri's Stacked Borrows implementation is finer-
grained than that, only relaxing for the cell itself in the `!Freeze`
type. For rayon, that solves the dangling reference in atomic calls, but
remains a problem for other fields of a `Latch`.

This easiest fix for rayon is to use a raw pointer instead of `&self`.
We still end up with some temporary references for stuff like atomics,
but those should be fine with the rules above.
cuviper added a commit to cuviper/rayon that referenced this issue Jan 20, 2023
`Latch::set` can invalidate its own `&self`, because it releases the
owning thread to continue execution, which may then invalidate the latch
by deallocation, reuse, etc. We've known about this problem when it
comes to accessing latch fields too late, but the possibly dangling
reference was still a problem, like rust-lang/rust#55005.

The result of that was rust-lang/rust#98017, omitting the LLVM attribute
`dereferenceable` on references to `!Freeze` types -- those containing
`UnsafeCell`. However, miri's Stacked Borrows implementation is finer-
grained than that, only relaxing for the cell itself in the `!Freeze`
type. For rayon, that solves the dangling reference in atomic calls, but
remains a problem for other fields of a `Latch`.

This easiest fix for rayon is to use a raw pointer instead of `&self`.
We still end up with some temporary references for stuff like atomics,
but those should be fine with the rules above.
bors bot added a commit to rayon-rs/rayon that referenced this issue Jan 21, 2023
1011: Use pointers instead of `&self` in `Latch::set` r=cuviper a=cuviper

`Latch::set` can invalidate its own `&self`, because it releases the
owning thread to continue execution, which may then invalidate the latch
by deallocation, reuse, etc. We've known about this problem when it
comes to accessing latch fields too late, but the possibly dangling
reference was still a problem, like rust-lang/rust#55005.

The result of that was rust-lang/rust#98017, omitting the LLVM attribute
`dereferenceable` on references to `!Freeze` types -- those containing
`UnsafeCell`. However, miri's Stacked Borrows implementation is finer-
grained than that, only relaxing for the cell itself in the `!Freeze`
type. For rayon, that solves the dangling reference in atomic calls, but
remains a problem for other fields of a `Latch`.

This easiest fix for rayon is to use a raw pointer instead of `&self`.
We still end up with some temporary references for stuff like atomics,
but those should be fine with the rules above.


Co-authored-by: Josh Stone <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Category: This is a bug. I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness P-medium Medium priority T-lang Relevant to the language team, which will review and decide on the PR/issue. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. WG-embedded Working group: Embedded systems
Projects
None yet
Development

No branches or pull requests