get_with deadlock #240
I've just spotted #59, which is strongly related.
Thank you for reporting the issue with the repro. If I understand correctly, we cannot work around both cases on the cache side.

As for the second case, maybe you can resolve the issue at the caller side. My current idea is:
I'm not convinced by your statement of:

I don't think there's anything stopping Requests B and C from polling A's future while it's still working. If it panics, then they should submit their own future to be used, but they could poll each other's futures, like `futures::future::Shared` does.
No, Requests B and C cannot poll it. Future A is already dropped here, so it cannot be polled.
Is there any reason that dropping Request A has to drop Future A, though?
Future A is owned by Request A. In Rust, when Request A is dropped, Future A is also dropped.
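To illustrate the point, here is a standalone sketch (not moka code, names are illustrative): in Rust, dropping a future cancels it, so whatever owns the future controls its lifetime. `tokio::time::timeout` owns the inner future and drops it when the deadline fires, the same way dropping Request A drops Future A; the `Canary` type only makes the cancellation observable.

```rust
use std::time::Duration;
use tokio::time::{sleep, timeout};

// Only here to make the cancellation visible.
struct Canary;
impl Drop for Canary {
    fn drop(&mut self) {
        println!("init future dropped before completion");
    }
}

#[tokio::main]
async fn main() {
    let slow_init = async {
        let _canary = Canary;
        sleep(Duration::from_secs(10)).await;
        "value"
    };

    // `timeout` owns `slow_init`; when the deadline fires, the timed-out future
    // is dropped, which drops (and thereby cancels) `slow_init`.
    let res = timeout(Duration::from_millis(10), slow_init).await;
    assert!(res.is_err()); // timed out; the canary message was printed
}
```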
When I have time (maybe after work today; it is 6 AM now in my time zone), I will create reproducing code for your original problem:

- Request A misses the cache => starts Future A to retrieve the value.
- Request B misses the cache => waits on the completion of Future A.
- Request A times out => Future A is dropped and Future B now starts getting awaited.
- Request C misses the cache => waits on the completion of Future B.
- Request B times out => Future B is dropped and Future C now starts getting awaited.
- ...

Then I will check if there is any way to work around the issue on the cache side.
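A minimal sketch of that scenario, assuming a tokio runtime and the moka `future::Cache` API of the time (where `get` is still synchronous); the key, value, and timings are illustrative. Every caller gives up before the value can be resolved, so the in-flight init future keeps getting dropped and the cache never fills:

```rust
use std::time::Duration;

use moka::future::Cache;
use tokio::time::{sleep, timeout};

#[tokio::main]
async fn main() {
    let cache: Cache<u32, String> = Cache::builder().build();

    // Requests arrive every 40 ms, each with a 50 ms deadline, but resolving the
    // value takes 200 ms. Whenever the current leader times out, its init future
    // is dropped and the next waiter starts over from scratch.
    for i in 0..5u32 {
        let cache = cache.clone();
        tokio::spawn(async move {
            let res = timeout(
                Duration::from_millis(50),
                cache.get_with(1, async {
                    sleep(Duration::from_millis(200)).await;
                    "expensive value".to_string()
                }),
            )
            .await;
            println!("request {i}: {res:?}");
        });
        sleep(Duration::from_millis(40)).await;
    }

    sleep(Duration::from_millis(300)).await;
    assert!(cache.get(&1).is_none()); // the value was never inserted
}
```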
Maybe we can avoid dropping Future A when Request A is dropped. I will try to move the ownership of Future A from Request A to an internal hash table (the waiter map).
But to do so, I will have to remove the workaround for #212, which is an optimization issue.
I've been thinking some more, and while I would like that as a solution, it does have downsides. Specifically, I believe it will be a breaking change because the future passed to `get_with` would have to be `'static`. It's possible the best solution is just documenting everything carefully.
I started some experiments here. It wraps the init future in a `Shared` future:

```rust
pub(crate) type SharedInit<V> =
    Shared<Pin<Box<dyn Future<Output = Result<V, ErrorObject>> + Send + 'static>>>;

type WaiterMap<K, V, S> = crate::cht::SegmentedHashMap<(Arc<K>, TypeId), SharedInit<V>, S>;
```
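For anyone unfamiliar with it, here is a standalone sketch (not moka code) of the property this approach relies on: `futures_util::future::Shared` hands out cloneable handles to one underlying future, and dropping one handle does not cancel the computation as long as another handle keeps polling it.

```rust
use futures::future::{poll_immediate, FutureExt};
use tokio::task::yield_now;

#[tokio::main]
async fn main() {
    // Wrap an init future so that it can be polled through any of its clones.
    let shared = async {
        yield_now().await;
        42u32
    }
    .boxed()
    .shared();

    // Handle `a` is polled once and then dropped before the future completes.
    let mut a = shared.clone();
    assert!(poll_immediate(&mut a).await.is_none());
    drop(a);

    // The remaining handle can still drive the same underlying future to completion.
    assert_eq!(shared.await, 42);
}
```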
I believe so. It has to be `'static`. Instead of making that breaking change, I added a new method:

```rust
pub async fn or_insert_with_shared(
    self,
    shared_init: impl Future<Output = V> + Send + 'static,
) -> Entry<K, V>;
```

and I confirmed the deadlocking code no longer causes the deadlock if modified to use it:

```rust
use futures::{future::poll_immediate, pin_mut};
use moka::future::Cache;
use tokio::task::yield_now;

#[tokio::main]
async fn main() {
    let cache = Cache::builder().build();

    let a = cache.entry(1).or_insert_with_shared(async {
        yield_now().await;
        "A"
    });
    pin_mut!(a);
    assert!(poll_immediate(&mut a).await.is_none());

    let b = cache.entry(1).or_insert_with_shared(async { "B" });
    println!("{}", b.await.into_value()); // Should print "A".
}
```

It is not quite ready. I have the following TODOs on an internal method:

```rust
// TODO:
//
// 1. Currently, all the owners of a shared future will call the `insert` closure
//    and return `Initialized`, which is wrong. Make only one of the owners keep
//    doing so, and the others return `ReadExisting`.
//    - To realize this, we will need to copy the `Shared` future code from
//      `futures_util` and modify it.
// 2. Decide what to do when all the owners of a shared future are dropped
//    without completing it.
//    - Do we want to remove the shared future from the waiter map?
//    - Or, keep it for a configurable timeout and remove it after that?
// 3. Decide what to do when a shared future panics.
//    - We will probably utilize the current `futures_util::future::Shared`
//      implementation, which causes all the owners of it to panic? (No retrying)
```
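Not something the thread settled on, but as a standalone illustration (not moka code) of the outcome TODO 1 asks for: every owner awaits the same shared future, yet exactly one of them reports `Initialized` while the rest report `ReadExisting`. Here a per-key `AtomicBool` picks the winner; the fix described above would instead modify a copy of `Shared` so that the owner whose poll actually completes the future runs the `insert` closure.

```rust
use std::sync::{
    atomic::{AtomicBool, Ordering},
    Arc,
};

use futures::future::FutureExt;
use tokio::task::yield_now;

#[derive(Debug, PartialEq)]
enum Outcome {
    Initialized,
    ReadExisting,
}

#[tokio::main]
async fn main() {
    // One shared init future; every owner awaits the same computation.
    let shared = async {
        yield_now().await;
        42u32
    }
    .boxed()
    .shared();

    // Per-key flag: the first owner to observe completion claims `Initialized`.
    let claimed = Arc::new(AtomicBool::new(false));

    let mut handles = Vec::new();
    for _ in 0..3 {
        let shared = shared.clone();
        let claimed = Arc::clone(&claimed);
        handles.push(tokio::spawn(async move {
            let value = shared.await;
            let outcome = if claimed.swap(true, Ordering::AcqRel) {
                Outcome::ReadExisting
            } else {
                Outcome::Initialized
            };
            (value, outcome)
        }));
    }

    let mut initialized = 0;
    for handle in handles {
        let (value, outcome) = handle.await.unwrap();
        assert_eq!(value, 42);
        if outcome == Outcome::Initialized {
            initialized += 1;
        }
    }
    assert_eq!(initialized, 1); // exactly one owner reports `Initialized`
}
```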
Woah, that's really cool. Some thoughts on the TODOs:
Hey @tatsuya6502, do you have any updates on this issue?
I think in the absence of a fix, it'd be great if this behaviour could be documented. The short-term solution is, as mentioned above, to spawn a new task which handles running the `get_with` call.
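For concreteness, a minimal sketch of that workaround, assuming a tokio runtime and the moka `future::Cache` API of the time (synchronous `get`); the key, value, and timings are illustrative. The `get_with` call runs inside a spawned task, and the caller applies its timeout to the `JoinHandle` instead, so a caller that gives up drops only the handle, not the in-flight `get_with` future.

```rust
use std::time::Duration;

use moka::future::Cache;
use tokio::time::{sleep, timeout};

#[tokio::main]
async fn main() {
    let cache: Cache<u32, String> = Cache::builder().build();

    // Run `get_with` in its own task so it is driven to completion even if the
    // original caller stops waiting.
    let handle = tokio::spawn({
        let cache = cache.clone();
        async move {
            cache
                .get_with(1, async {
                    sleep(Duration::from_millis(200)).await; // slow value resolution
                    "expensive value".to_string()
                })
                .await
        }
    });

    // The caller times out on the JoinHandle; dropping it does not cancel the task.
    match timeout(Duration::from_millis(50), handle).await {
        Ok(Ok(value)) => println!("got {value} in time"),
        Ok(Err(join_err)) => println!("init task failed: {join_err}"),
        Err(_) => println!("timed out, but the spawned task keeps filling the cache"),
    }

    // Later requests find the value once the background task finishes.
    sleep(Duration::from_millis(300)).await;
    assert_eq!(cache.get(&1).as_deref(), Some("expensive value"));
}
```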
The following code deadlocks. The idea is to simulate one task calling `get_with` but then not being polled for an arbitrarily long period of time (or even just forgotten); another task then comes along and appears to get blocked behind the first being polled to completion.

I was expecting the future returned by `get_with` to act more like `futures::future::Shared`, where polls on any of the shared futures can progress the inner future.

This is the most obvious bad behaviour of this function, but we hit this in the wild, as the futures that wait on our cache get calls have a shorter timeout than it takes to resolve the value in the event of a cache miss. That means the cache never fills, because you get behaviour like the following:
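A minimal sketch of the kind of code described above (one caller polls `get_with` once and then stalls; a second caller on the same key then hangs), assuming a tokio runtime and moka's `future::Cache`; the key and values are illustrative:

```rust
use futures::{future::poll_immediate, pin_mut};
use moka::future::Cache;
use tokio::task::yield_now;

#[tokio::main]
async fn main() {
    let cache: Cache<u32, &str> = Cache::builder().build();

    // Caller 1 starts `get_with`, is polled exactly once, and then stalls
    // (or is forgotten entirely) while its init future is still pending.
    let a = cache.get_with(1, async {
        yield_now().await;
        "A"
    });
    pin_mut!(a);
    assert!(poll_immediate(&mut a).await.is_none());

    // Caller 2 on the same key waits for caller 1's init future to finish,
    // but nothing is polling it any more, so this await never completes.
    let b = cache.get_with(1, async { "B" });
    println!("{}", b.await); // never printed: deadlock
}
```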