eth/fetcher: fix blob transaction propagation #30125

Merged: 5 commits into ethereum:master from the order-announcements branch on Sep 6, 2024

Conversation

roberto-bayardo
Contributor

This PR fixes an issue with blob transaction propagation caused by the blob transaction txpool rejecting transactions with gapped nonces. The specific changes are:

  • fetch transactions from a peer in the order they were announced to minimize nonce-gaps (which cause blob txs to be rejected)

  • don't wait on fetching blob transactions after announcement is received, since they are not broadcast
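
As a rough illustration of the first change, here is a minimal sketch of one way to keep announcements in arrival order while still deduplicating them. The `orderedAnnounces` type and its methods are hypothetical and are not the fetcher's actual data structures; the only fact relied on is that Go map iteration order is unspecified, so a map alone cannot preserve announcement order.

```go
package main

import "fmt"

// orderedAnnounces is a hypothetical container that remembers the order in
// which a peer announced transaction hashes. A plain Go map would not do,
// because map iteration order is unspecified.
type orderedAnnounces struct {
	seen  map[string]struct{} // dedup set
	order []string            // hashes in announcement order
}

func newOrderedAnnounces() *orderedAnnounces {
	return &orderedAnnounces{seen: make(map[string]struct{})}
}

// add records a hash unless it is already pending.
func (o *orderedAnnounces) add(hash string) {
	if _, ok := o.seen[hash]; ok {
		return
	}
	o.seen[hash] = struct{}{}
	o.order = append(o.order, hash)
}

// next returns up to n hashes to request, oldest announcement first.
func (o *orderedAnnounces) next(n int) []string {
	if n > len(o.order) {
		n = len(o.order)
	}
	batch := o.order[:n]
	o.order = o.order[n:]
	for _, h := range batch {
		delete(o.seen, h)
	}
	return batch
}

func main() {
	anns := newOrderedAnnounces()
	// The strings stand in for tx hashes from one sender with rising nonces.
	for _, h := range []string{"nonce-0", "nonce-1", "nonce-1", "nonce-2"} {
		anns.add(h)
	}
	fmt.Println(anns.next(2)) // [nonce-0 nonce-1]
	fmt.Println(anns.next(2)) // [nonce-2]
}
```

Requesting hashes in this order means a sender's lower-nonce transactions tend to arrive before its higher-nonce ones, which is what a pool that rejects gapped nonces needs.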

Testing:

  • unit tests updated to reflect that fetch order should always match tx announcement order
  • unit test added to confirm blob transactions are scheduled immediately for fetching
  • running the PR on an eth mainnet full node without incident so far

// blob transactions are never broadcast, so to force them
// to be fetched immediately we pretend they arrived
// earlier.
f.waittime[hash] = f.clock.Now() - mclock.AbsTime(txArriveTimeout)
Member

Manipulating the time feels wrong. If we want blob txs to skip the waitlist altogether, we should just do that instead of tricking the mechanism into shuffling them out faster than other txs.

Contributor Author

That was what I implemented in the earlier PR, but @karalabe suggested doing it like this (if I understood him correctly): #30118 (comment)
LMK if there's a strong preference either way.

Contributor Author

@roberto-bayardo roberto-bayardo Jul 22, 2024

@lightclient @karalabe added one more commit which replaces time-travel with explicit skipping of the waitlist if you want to compare: bbdb624

(Note that the test case changed because it resulted in a different peer iteration order, not because the behavior is significantly different -- the only real change to the test case is that there is no need to do a trivial wait for the blob tx to go to the fetching state)

@karalabe karalabe self-assigned this Jul 16, 2024
@roberto-bayardo roberto-bayardo force-pushed the order-announcements branch 2 times, most recently from 88cef9b to 3ff3634 Compare July 22, 2024 21:03
@roberto-bayardo
Contributor Author

Pinging to keep this alive... anything else I can provide? I'm still running this PR on my home node, so far so good!

@roberto-bayardo roberto-bayardo force-pushed the order-announcements branch 2 times, most recently from efcd01c to b956778 Compare July 30, 2024 03:17
@@ -849,7 +849,16 @@ func (s *Suite) TestBlobViolations(t *utesting.T) {
if code, _, err := conn.Read(); err != nil {
t.Fatalf("expected disconnect on blob violation, got err: %v", err)
} else if code != discMsg {
t.Fatalf("expected disconnect on blob violation, got msg code: %d", code)
if code == 24 {
Member

Suggested change
- if code == 24 {
+ if code == protoOffset(ethProto)+eth.NewPooledTransactionHashesMsg {

a bit better imo than using raw code
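
For context on where the raw 24 comes from, here is a small sketch of the arithmetic. It assumes the devp2p base protocol reserves the first 16 message codes and that `eth` is the first capability after it, so its offset is 16; `NewPooledTransactionHashes` has ID 0x08 within the eth protocol. The constant and function names below are illustrative and are not the test suite's helpers.

```go
package main

import "fmt"

const (
	// Codes 0x00-0x0f are reserved for the devp2p base protocol.
	baseProtocolLength = 16
	// Message ID of NewPooledTransactionHashes within the eth protocol.
	newPooledTransactionHashesMsg = 0x08
)

// wireCode converts a capability-relative message ID into the code seen on
// the connection, given the offset at which the capability starts.
func wireCode(offset, msg uint64) uint64 {
	return offset + msg
}

func main() {
	// Assumes eth is the first capability, so its offset is 16.
	fmt.Println(wireCode(baseProtocolLength, newPooledTransactionHashesMsg)) // 24
}
```

Spelling the code as offset plus message ID keeps the check correct if the offset ever changes, and tells the reader which message is meant.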

Contributor Author

nice, done

if sizes[i] == 0 {
// invalid size parameter, return error
return fmt.Errorf("announcement from tx %x had an invalid 0 size metadata", hash)
}
Member

I don't think we should special-case the size check here. In the same way that 0 is bad, so are 1 and 2 and probably a few more. We also don't really expect to be sent 0 unless it's a faulty remote client (i.e. a malicious one can just send 1). Also, 0 will fail post-validation after we retrieve the hash, just as, say, 1 would. It's cleaner just to not care about it at this point.

Contributor Author

removed.

@karalabe
Member

karalabe commented Aug 6, 2024

We're not 100% sure it's the best API change, but also we're not entirely sure we have a strongly better alternative, so I'll try out a couple variations and will possibly end up pushing a commit on top.

@roberto-bayardo roberto-bayardo force-pushed the order-announcements branch 2 times, most recently from 97a0d28 to 25591c2 Compare August 12, 2024 22:35
@rjl493456442
Member

Hi! I just made a polish commit locally rjl493456442#13

We will discuss the modification internally today and will push the changes to your branch if it's accepted!

roberto-bayardo and others added 4 commits August 15, 2024 22:18
…minimize nonce-gaps (which cause blob txs to be rejected)

- don't wait on fetching blob transactions after announcement is received, since they are not broadcast
Contributor

@holiman holiman left a comment

Looks good to me (eyes only, have not tested)

@fjl fjl removed the status:triage label Aug 20, 2024
@fjl fjl assigned fjl and unassigned karalabe Aug 20, 2024
@fjl fjl added this to the 1.14.9 milestone Aug 20, 2024
@roberto-bayardo
Contributor Author

Any updates on this change? We're still seeing this issue bite us, though luckily only on Sepolia. We've had to switch the Base Sepolia testnet entirely back to calldata because we can't get blobs to keep up with the chain.

@holiman
Contributor

holiman commented Aug 30, 2024

We're planning on getting this merged next week; that's what the merge-at-meeting tag is about.

Member

@karalabe karalabe left a comment

LGTM

@karalabe karalabe merged commit 88c8459 into ethereum:master Sep 6, 2024
2 of 3 checks passed
@roberto-bayardo roberto-bayardo deleted the order-announcements branch September 6, 2024 14:36
7 participants