Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigate ethers-rs middleware for "stuck" txs #2959

Closed
Tracked by #3321
tkporter opened this issue Nov 22, 2023 · 4 comments · Fixed by #3852
Closed
Tracked by #3321

Investigate ethers-rs middleware for "stuck" txs #2959

tkporter opened this issue Nov 22, 2023 · 4 comments · Fixed by #3852

Comments

@tkporter
Copy link
Collaborator

tkporter commented Nov 22, 2023

Context

The relayer submits a transaction with a gas price that is no longer high enough to execute, but instead of the relayer submitting on certain networks like polygon the tx is stuck in the mempool. To "unstuck" the relayer needs to submit a higher bid with x% more gas, but our relayer currently lacks that logic.

https://docs.rs/ethers/latest/ethers/middleware/struct.GasEscalatorMiddleware.html is probably what we want

Some things to note:

  • See https://cloudlogging.app.goo.gl/jXpgMgTHsWZpUk1s9 for examples of this. Interestingly, this only seems to happen with polygon txs and still occurs pretty frequently.
  • This was originally written with Polygon in mind that we fixed in Use Polygon gas oracle #2965. Addressing Relayer tx issues  #2924 is now the main focus of this. Seemingly the Polygon fix wasn't totally sufficient
  • Be careful of our usage around fallback provider here (e.g. if the gas escalation only works at the "inner provider" level and not at a higher level, this may not actually be useful)
  • Be careful that we construct new providers pretty frequently, e.g. when constructing metadata. I believe we use the same provider for all tx submission, but maybe we don't. I believe we'd need to use the same long-living provider if we wanted to use the gas escalator middleware
@yorhodes
Copy link
Member

@yorhodes yorhodes moved this to Backlog in Hyperlane Tasks Nov 22, 2023
@tkporter
Copy link
Collaborator Author

nice, we'll have to make sure the footgun / issues aren't a problem for us & also that this covers the case of immediately getting back a "transaction underpriced" error and not just if the tx doesn't get included in a block in time

tkporter added a commit that referenced this issue Nov 23, 2023
### Description

After investigating #2959, I found the following

this is a known problem with Polygon, explanation here
ethers-io/ethers.js#2828 (comment)

Fun fact Asa looked into this once
#771

Here's a discussion it in Foundry, which I found hoping that ethers-rs
folks had ran into this before
foundry-rs/foundry#1703

Foundry fixed this by using the Polygon gas station oracle which seems
to be recommended path
https://github.com/foundry-rs/foundry/pull/3368/files#diff-c89a4bbf7a90da118dcf00c5fe70eba78f8e5d95662bb5f039a353113e95042bR205

There's actually a polygon ethers middleware for this
https://docs.rs/ethers/latest/ethers/middleware/gas_oracle/polygon/struct.Polygon.html

So I (originally) borrowed this code from Foundry
https://github.com/foundry-rs/foundry/blob/master/crates/common/src/provider.rs#L254-L290

Changed to use Middlewares

This also means we can remove our existing janky Polygon logic

### Drive-by changes

<!--
Are there any minor or drive-by changes also included?
-->

### Related issues

<!--
- Fixes #[issue number here]
-->

### Backward compatibility

<!--
Are these changes backward compatible? Are there any infrastructure
implications, e.g. changes that would prohibit deploying older commits
using this infra tooling?

Yes/No
-->

### Testing

<!--
What kind of testing have these changes undergone?

None/Manual/Unit Tests
-->
nambrot pushed a commit that referenced this issue Nov 24, 2023
After investigating #2959, I found the following

this is a known problem with Polygon, explanation here
ethers-io/ethers.js#2828 (comment)

Fun fact Asa looked into this once
#771

Here's a discussion it in Foundry, which I found hoping that ethers-rs
folks had ran into this before
foundry-rs/foundry#1703

Foundry fixed this by using the Polygon gas station oracle which seems
to be recommended path
https://github.com/foundry-rs/foundry/pull/3368/files#diff-c89a4bbf7a90da118dcf00c5fe70eba78f8e5d95662bb5f039a353113e95042bR205

There's actually a polygon ethers middleware for this
https://docs.rs/ethers/latest/ethers/middleware/gas_oracle/polygon/struct.Polygon.html

So I (originally) borrowed this code from Foundry
https://github.com/foundry-rs/foundry/blob/master/crates/common/src/provider.rs#L254-L290

Changed to use Middlewares

This also means we can remove our existing janky Polygon logic

<!--
Are there any minor or drive-by changes also included?
-->

<!--
- Fixes #[issue number here]
-->

<!--
Are these changes backward compatible? Are there any infrastructure
implications, e.g. changes that would prohibit deploying older commits
using this infra tooling?

Yes/No
-->

<!--
What kind of testing have these changes undergone?

None/Manual/Unit Tests
-->
nambrot pushed a commit that referenced this issue Dec 1, 2023
After investigating #2959, I found the following

this is a known problem with Polygon, explanation here
ethers-io/ethers.js#2828 (comment)

Fun fact Asa looked into this once
#771

Here's a discussion it in Foundry, which I found hoping that ethers-rs
folks had ran into this before
foundry-rs/foundry#1703

Foundry fixed this by using the Polygon gas station oracle which seems
to be recommended path
https://github.com/foundry-rs/foundry/pull/3368/files#diff-c89a4bbf7a90da118dcf00c5fe70eba78f8e5d95662bb5f039a353113e95042bR205

There's actually a polygon ethers middleware for this
https://docs.rs/ethers/latest/ethers/middleware/gas_oracle/polygon/struct.Polygon.html

So I (originally) borrowed this code from Foundry
https://github.com/foundry-rs/foundry/blob/master/crates/common/src/provider.rs#L254-L290

Changed to use Middlewares

This also means we can remove our existing janky Polygon logic

<!--
Are there any minor or drive-by changes also included?
-->

<!--
- Fixes #[issue number here]
-->

<!--
Are these changes backward compatible? Are there any infrastructure
implications, e.g. changes that would prohibit deploying older commits
using this infra tooling?

Yes/No
-->

<!--
What kind of testing have these changes undergone?

None/Manual/Unit Tests
-->
@nambrot nambrot moved this from Backlog to Next Sprint in Hyperlane Tasks Dec 18, 2023
@avious00 avious00 moved this from Next Sprint to Backlog in Hyperlane Tasks Feb 1, 2024
@tkporter tkporter moved this from Backlog to Next Sprint in Hyperlane Tasks Mar 13, 2024
@avious00 avious00 moved this from Next Sprint to Sprint in Hyperlane Tasks Mar 25, 2024
@daniel-savu daniel-savu moved this from Sprint to Next Sprint in Hyperlane Tasks Apr 19, 2024
@avious00 avious00 moved this from Next Sprint to Sprint in Hyperlane Tasks Apr 24, 2024
@daniel-savu
Copy link
Contributor

daniel-savu commented May 27, 2024

Be careful of our usage around fallback provider here (e.g. if the gas escalation only works at the "inner provider" level and not at a higher level, this may not actually be useful)

Since our fallback provider logic is a client that middleware wraps, the gas escalator can actually only be used for the entire fallback rather than inner ones. See here how the build step that wraps with providers happens after the fallback client is built - and same for the retrying provider, in the match arm below. We're not susceptible to the "memory leak" bug as long as we have one RPC still live - because as soon as one of them comes back online, resubmits start happening.

Be careful that we construct new providers pretty frequently, e.g. when constructing metadata. I believe we use the same provider for all tx submission, but maybe we don't. I believe we'd need to use the same long-living provider if we wanted to use the gas escalator middleware

Had a look, indeed we only transact via long-lived instances (mailbox and validator_announce) - these are always wrapped in signing providers, and we can also add the gas escalator middleware where this happens.

the case of immediately getting back a "transaction underpriced" error and not just if the tx doesn't get included in a block in time

If we get back an error, it just gets propagated. I believe this is less of an issue since the message will be retried if submission fails, but we could include this specific error message in the retrying provider logic here.

@tkporter
Copy link
Collaborator Author

We're not susceptible to the "memory leak" bug as long as we have one RPC still live - because as soon as one of them comes back online, resubmits start happening.

I think it's still possible to get an error from a provider even if they're technically "live", (e.g. if we're low on funds or the gas price is too low or something and can't successfully submit a transaction) and the task will error here https://github.com/hyperlane-xyz/ethers-rs/blob/hyperlane/ethers-middleware/src/gas_escalator/mod.rs#L223

But in any case I'm comfortable treating this as an unlikely edge case that we're aware of and we can revisit if we have issues

github-merge-queue bot pushed a commit that referenced this issue Jul 3, 2024
### Description

Wraps evm providers in a gas escalator middleware as part of the
`build_with_signer` step, so signed txs can have their gas price bumped
if stuck in the mempool.

Gas price is increased every 60s by 12.5%, and is capped at the
`max_fee_per_gas` set in tx overrides.

### Drive-by changes

- changes to our fork of `ethers`:
- removes the `M: Clone` bound, since it's not actually required and
conflicts with the middleware we use
- increases the level of the `escalated` log in the middleware to
`debug` (so we can see it working in prod without enabling `trace`)

### Related issues

- Fixes #2959

### Backward compatibility

Yes

### Testing

e2e

---------

Co-authored-by: Trevor Porter <[email protected]>
@github-project-automation github-project-automation bot moved this from In Review to Done in Hyperlane Tasks Jul 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

5 participants