Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(lru): correctly handle future cancellation #3911

Merged
merged 6 commits into from
Jul 15, 2022
Merged

Conversation

skyzh
Copy link
Contributor

@skyzh skyzh commented Jul 15, 2022

Signed-off-by: Alex Chi [email protected]

I hereby agree to the terms of the Singularity Data, Inc. Contributor License Agreement.

What's changed and what's your intention?

As title. lookup_dedup will now spawn a task in the background, so that it won't deadlock.

Example: We have two futures running.

let future_1 = async move {
  lru.lookup("a", fetch_from_storage).await
}

let future_2 = async move {
  lru.lookup("a", fetch_from_storage).await
}

If we run them in the following way:

future_1.poll_next(); // so that we will cache miss, and LRU will call `fetch_from_storage` to get data.
future_2.poll_next(); // as request to "a" is in-flight, LRU will wait until future_1 completes the request.
drop(future_1);
loop {
  assert_eq!(future_2.poll_next(), Poll::pending); // future 2 will never return
}

Therefore, we spawn a tokio task in the background for every in-flight request, so that it won't be cancelled by users.

This PR also fixes some unit tests, so that they won't stuck in single-thread tokio runtime.

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • All checks passed in ./risedev check (or alias, ./risedev c)

Documentation

If your pull request contains user-facing changes, please specify the types of the changes, and create a release note. Otherwise, please feel free to remove this section.

Types of user-facing changes

Please keep the types that apply to your changes, and remove those that do not apply.

  • Installation and deployment
  • Connector (sources & sinks)
  • SQL commands, functions, and operators
  • RisingWave cluster configuration changes
  • Other (please specify in the release note below)

Release note

Please create a release note for your changes. In the release note, focus on the impact on users, and mention the environment or conditions where the impact may occur.

Refer to a related PR or issue link (optional)

close #3909
close #3907
close #2580

@skyzh
Copy link
Contributor Author

skyzh commented Jul 15, 2022

Need test before merging.

@github-actions github-actions bot added the type/fix Bug fix label Jul 15, 2022
@skyzh skyzh marked this pull request as ready for review July 15, 2022 06:34
@@ -766,25 +768,33 @@ impl<K: LruKey + Clone, T: LruValue> LruCache<K, T> {
) -> Result<Result<CachableEntry<K, T>, E>, RecvError>
where
F: FnOnce() -> VC,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems async closure is unnecessary? We only need a future here. cc @Little-Wallace

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's necessary. There's some variables that only need to be computed when cache miss. See other parts of the PR for more information.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

e.g. we need to call self.meta_path to compute the meta path ONLY when cache miss. But &self is not 'static, so we can only compute it in this FnOnce closure and outside the Future + 'static.

@skyzh
Copy link
Contributor Author

skyzh commented Jul 15, 2022

This would possibly fix TPC-H Q18.

Copy link
Contributor

@TennyZhuang TennyZhuang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work! Thanks!

@skyzh
Copy link
Contributor Author

skyzh commented Jul 15, 2022

ref #2580

@TennyZhuang
Copy link
Contributor

Can you add some UTs for cancellation safety?

Signed-off-by: Alex Chi <[email protected]>
@skyzh
Copy link
Contributor Author

skyzh commented Jul 15, 2022

Can you add some UTs for cancellation safety?

You may add later. My test in #3907 cannot work because now block cache requires tokio runtime.

@skyzh skyzh requested a review from wenym1 July 15, 2022 07:01
Signed-off-by: Alex Chi <[email protected]>
Copy link
Contributor

@wenym1 wenym1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

}
},
LookupResult::Miss => {
let this = self.clone();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we rewrite it as

let ret = tokio::spawn(fetch_value).await;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope. fetch_value itself is not 'static. The future it returns is 'static.

@codecov
Copy link

codecov bot commented Jul 15, 2022

Codecov Report

Merging #3911 (85ce7b6) into main (487ef80) will increase coverage by 0.01%.
The diff coverage is 94.46%.

@@            Coverage Diff             @@
##             main    #3911      +/-   ##
==========================================
+ Coverage   73.86%   73.88%   +0.01%     
==========================================
  Files         818      818              
  Lines      115615   115638      +23     
==========================================
+ Hits        85404    85441      +37     
+ Misses      30211    30197      -14     
Flag Coverage Δ
rust 73.88% <94.46%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
src/object_store/src/object/disk.rs 94.41% <84.61%> (+0.09%) ⬆️
...rc/storage/src/storage_failpoints/test_iterator.rs 94.80% <88.88%> (-0.62%) ⬇️
src/storage/src/hummock/sstable_store.rs 80.48% <91.42%> (-0.51%) ⬇️
src/storage/src/hummock/iterator/test_utils.rs 96.11% <94.44%> (-0.98%) ⬇️
src/storage/hummock_test/src/state_store_tests.rs 75.73% <95.62%> (-0.31%) ⬇️
src/common/src/cache.rs 97.98% <100.00%> (+0.01%) ⬆️
src/storage/src/hummock/block_cache.rs 73.26% <100.00%> (+1.10%) ⬆️
src/storage/src/hummock/iterator/forward_merge.rs 96.76% <100.00%> (+0.37%) ⬆️
...ge/src/hummock/sstable/forward_sstable_iterator.rs 95.83% <100.00%> (ø)
src/storage/src/storage_failpoints/test_sstable.rs 97.05% <100.00%> (-0.03%) ⬇️
... and 23 more

📣 Codecov can now indicate which changes are the most critical in Pull Requests. Learn more

@mergify mergify bot merged commit 0a40cb6 into main Jul 15, 2022
@mergify mergify bot deleted the skyzh/fix-lru-cancel branch July 15, 2022 07:49
nasnoisaac pushed a commit to nasnoisaac/risingwave that referenced this pull request Aug 9, 2022
* fix(lru): correctly handle future cancellation

Signed-off-by: Alex Chi <[email protected]>

* fix test blocking

Signed-off-by: Alex Chi <[email protected]>

* fix failpoints test

Signed-off-by: Alex Chi <[email protected]>

* fix clippy

Signed-off-by: Alex Chi <[email protected]>

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type/fix Bug fix
Projects
None yet
Development

Successfully merging this pull request may close these issues.

lru: future cancellation or not polled will lead to deadlock create mv stuck for unknown reason
5 participants