
Fix: Use lock properly in filter_order_by_nonces to do it concurrently #38

Merged
6 commits merged into flashbots:develop on Jul 15, 2024

Conversation


@ghost ghost commented Jul 7, 2024

📝 Summary

The current use of Mutex on nonce_cache in filter_order_by_nonces is incorrect: it makes the function run sequentially, because every task waits for the previous one to finish (both reads and writes) before doing its own work.
I changed it to use RwLock instead. Before the change, fetching a block with 2,400 orders took 18 minutes; now it takes 18 seconds with 400 concurrent requests (a sketch of the resulting access pattern follows below).

  • Also lint the file
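A minimal sketch of the access pattern this change aims for, with made-up names (Address, fetch_nonce_onchain, nonce_for) rather than the actual rbuilder code: the cache is locked only for the quick lookup and the quick insert, never across the network await, so misses for different addresses can be fetched concurrently.

use std::collections::HashMap;
use std::sync::RwLock;

// Placeholder types/functions; only the locking pattern matters here.
type Address = [u8; 20];

async fn fetch_nonce_onchain(_address: Address) -> u64 {
    // stand-in for a get_transaction_count(..) call pinned to the parent block
    0
}

async fn nonce_for(cache: &RwLock<HashMap<Address, u64>>, address: Address) -> u64 {
    // 1. Short read lock: many tasks can check the cache at the same time.
    if let Some(n) = cache.read().expect("nonce cache lock poisoned").get(&address) {
        return *n;
    }
    // 2. No lock is held here, so concurrent misses are fetched in parallel.
    let n = fetch_nonce_onchain(address).await;
    // 3. Short write lock: or_insert keeps a value another task may have
    //    stored in the meantime.
    *cache
        .write()
        .expect("nonce cache lock poisoned")
        .entry(address)
        .or_insert(n)
}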

💡 Motivation and Context


✅ I have completed the following steps:

  • [x] Run make lint
  • [x] Run make test
  • [ ] Added tests (if applicable)

@ghost ghost requested review from dvush and ZanCorDX as code owners July 7, 2024 17:26
@ghost ghost changed the title Fix: change from Mutex to RwLock to filter_order_by_nonces concurrently Fix: Use lock properly in filter_order_by_nonces to do it concurrently Jul 7, 2024

github-actions bot commented Jul 7, 2024

Benchmark results for 465eadb

Report: https://flashbots-rbuilder-ci-stats.s3.us-east-2.amazonaws.com/benchmark/465eadb-db634fc/report/index.html

Date (UTC): 2024-07-13T16:06:03+00:00
Commit: 465eadbd19a1cb2d62f8fe829e819e7db6e1a902
Base SHA: db634fcb90a6b2fc484e19f02a59a5b74ffeefa2

Significant changes

None

Member

@bertmiller bertmiller left a comment


Thanks for this PR! Really appreciate it and good catch :)

I think there is a potential race condition in the nonce cache update logic. Currently, the code checks if a value exists, fetches it if not, then writes it back. This could lead to overwriting more recent values in concurrent scenarios: imagine if there were two transactions from account X in different threads.

Here's a scenario that illustrates the problem:

  • Thread A checks the cache for address X, doesn't find it
  • Thread B checks the cache for address X, doesn't find it
  • Thread A fetches nonce N1 from the network for address X
  • Thread B fetches nonce N2 from the network for address X (N2 > N1 because it's more recent)
  • Thread B writes N2 to the cache
  • Thread A writes N1 to the cache, overwriting the more recent N2

What do you think about using a single write lock operation that checks and updates atomically? Something like this:

// Take the write lock once and hold it for the whole check-fetch-insert
// sequence, so no other task can interleave between the check and the insert.
// Note that the guard then stays held across the network await, so concurrent
// cache misses are serviced one at a time.
let mut nonce_cache = nonce_cache
    .write()
    .map_err(|e| eyre::eyre!("Failed to acquire write lock: {}", e))?;
if !nonce_cache.contains_key(&nonce.address) {
    let address = nonce.address;
    let onchain_nonce = self
        .eth_provider
        .get_transaction_count(address)
        .block_id(BlockId::Number(parent_block.into()))
        .await
        .wrap_err("Failed to fetch onchain tx count")?;
    nonce_cache.insert(nonce.address, onchain_nonce);
}
let res_onchain_nonce = *nonce_cache.get(&nonce.address).unwrap();

Review comment on crates/rbuilder/src/backtest/fetch/mod.rs (outdated, resolved)
@ghost
Author

ghost commented Jul 8, 2024

@bertmiller spot-on analysis that there is an overwrite of the nonce value for the same address X. However, the nonce being fetched is the nonce of an address at the parent block, so it should be the same value regardless of the order in which the onchain requests return.
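A toy illustration of that argument, not rbuilder code (nonce_at_block stands in for a get_transaction_count(address).block_id(parent_block) call): because every fetch queries state at the same fixed parent block, concurrent fetches for one address return equal values, so whichever cache write lands last stores the same number.

use std::collections::HashMap;

// The answer depends only on (address, block), not on when it is asked.
fn nonce_at_block(state_at_parent_block: &HashMap<&str, u64>, address: &str) -> u64 {
    *state_at_parent_block.get(address).unwrap_or(&0)
}

fn main() {
    let state_at_parent_block = HashMap::from([("0xabc", 7u64)]);
    let mut cache: HashMap<&str, u64> = HashMap::new();

    // "Thread A" and "Thread B" both miss the cache and fetch independently.
    let n1 = nonce_at_block(&state_at_parent_block, "0xabc");
    let n2 = nonce_at_block(&state_at_parent_block, "0xabc");
    assert_eq!(n1, n2); // same block, same address => same nonce

    // Whichever insert happens last, the cached value is identical;
    // the race only wastes one duplicate fetch, it cannot store a stale value.
    cache.insert("0xabc", n1);
    cache.insert("0xabc", n2);
    assert_eq!(cache["0xabc"], 7);
}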

@ZanCorDX
Contributor

Thanks!!!!!
I believe the change from Mutex to RwLock is unimportant, but the great catch is not keeping the lock while calling get_transaction_count.
Can you remove the is_none and the 2 unwraps in favor of an if let?

@ghost ghost force-pushed the fix_concurrent_filter_by_nonces branch from b5ff532 to b671c02 Compare July 11, 2024 17:08
@ghost ghost force-pushed the fix_concurrent_filter_by_nonces branch from b671c02 to 99fde4a Compare July 11, 2024 22:27
@ghost
Author

ghost commented Jul 12, 2024

Thanks!!!!! I believe the change from Mutex to RwLock is unimportant, but the great catch is not keeping the lock while calling get_transaction_count. Can you remove the is_none and the 2 unwraps in favor of an if let?

Thanks for your feedback. For the is_none() check, we don't intend to read the value, only to write it if it's missing; and for the 2 unwrap() calls, the value is guaranteed to be there, so I don't think we need an if let here.

@ZanCorDX
Contributor

Thanks!!!!! I believe the change from Mutex to RwLock is unimportant, but the great catch is not keeping the lock while calling get_transaction_count. Can you remove the is_none and the 2 unwraps in favor of an if let?

Thanks for your feedback. For the is_none() check, we don't intend to read the value, only to write it if it's missing; and for the 2 unwrap() calls, the value is guaranteed to be there, so I don't think we need an if let here.

It's just to make the code more robust: in most cases Rust lets us avoid unwrap(), which might fail if something changes in the future.
It would be like this:

let res_onchain_nonce = if let Some(res_onchain_nonce) = res_onchain_nonce {
    // Cache hit: reuse the previously looked-up value.
    res_onchain_nonce
} else {
    // Cache miss: fetch from the network without holding the lock.
    let address = nonce.address;
    let onchain_nonce = self
        .eth_provider
        .get_transaction_count(address)
        .block_id(BlockId::Number(parent_block.into()))
        .await
        .wrap_err("Failed to fetch onchain tx count")?;

    // Take the write lock only for the insert; or_insert keeps whatever a
    // concurrent task may already have stored (same parent block, same value).
    if let Ok(mut nonce_cache) = nonce_cache.write() {
        nonce_cache.entry(address).or_insert(onchain_nonce);
    }
    onchain_nonce
};
if res_onchain_nonce > nonce.nonce && !nonce.optional {

This way the code enforces correctness at compile time rather than at runtime, just by the shape of the logic.
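A standalone toy contrast of the two styles (not the PR's code): with is_none() plus unwrap(), correctness rests on two separate calls agreeing, which only the author's reasoning guarantees; with if let, the binding exists only in the branch where the value exists, so the compiler enforces it.

use std::collections::HashMap;

// Runtime-checked: the unwrap() is only safe because of the is_none() check
// a few lines above; nothing stops a future edit from breaking that pairing.
fn lookup_unwrap(cache: &HashMap<&str, u64>, key: &str) -> u64 {
    if cache.get(key).is_none() {
        return 0; // pretend we fetched and inserted here
    }
    *cache.get(key).unwrap()
}

// Compile-time-checked: the value is bound exactly when it is present.
fn lookup_if_let(cache: &HashMap<&str, u64>, key: &str) -> u64 {
    if let Some(v) = cache.get(key) {
        *v
    } else {
        0 // pretend we fetched and inserted here
    }
}

fn main() {
    let cache = HashMap::from([("0xabc", 7u64)]);
    assert_eq!(lookup_unwrap(&cache, "0xabc"), lookup_if_let(&cache, "0xabc"));
}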

@ghost
Author

ghost commented Jul 13, 2024

It makes sense.

@ghost ghost force-pushed the fix_concurrent_filter_by_nonces branch from e6f0c9b to 465eadb Compare July 13, 2024 16:05
@ZanCorDX ZanCorDX merged commit 824cdab into flashbots:develop Jul 15, 2024
2 checks passed
@ghost ghost deleted the fix_concurrent_filter_by_nonces branch July 15, 2024 16:11
MoeMahhouk pushed a commit that referenced this pull request Jul 29, 2024
Fix: Use lock properly in filter_order_by_nonces to do it concurrently (#38)
