
Geth not broadcasting transfer #22308

Open
bogdangainusa opened this issue Feb 11, 2021 · 59 comments

@bogdangainusa

System information

Geth version: 1.9.23
OS & Version: Debian 9

Expected behaviour

The transfer is sent out and processed by the network.

Actual behaviour

The transfer stays in the pending state, and even after a geth restart it is not processed.

Steps to reproduce the behaviour

Send an ordinary transfer; the issue occurs at random.

It seems this issue is happening again. I have reported it twice before, and other users have reported it as well. Last time this happened, a geth restart fixed the issue and the transfer was processed. Now, even after a geth restart, the transfer is still in the pending state and is not broadcast to the network.

One of the previous issues:
#21385 (comment)

Info from geth:

```
> eth.blockNumber
11834309
> eth.syncing
false
> eth.pendingTransactions
[{
    blockHash: null,
    blockNumber: null,
    from: "0x6256f266eb6484734121bbab099c664f81ee5a0c",
    gas: 21000,
    gasPrice: 121000000000,
    hash: "0x4b60ee38c374c753f3e0d682b9f97d169fe4276d95cdd178a495e9dc3d8fbbcf",
    input: "0x",
    nonce: 0,
    r: "0x2feb71625a98ef2389f4371bb693e366396d62b7ab2b1b63ef7d38eb24e0a5e9",
    s: "0x1442623eabea179ec6149808386197637391ff729575958504466210d3b00959",
    to: "0x9f2e63c42623fcf34f854c78335a668c327dac48",
    transactionIndex: null,
    v: "0x26",
    value: 821171540000000000
}]
```
@bogdangainusa
Author

bogdangainusa commented Feb 12, 2021

Side note: geth stopped syncing about 8 hours after the restart, with nothing in the logs about it.
After another geth restart, some of the transactions were broadcast to the network.
Server info:
4 cores / 8 GB of RAM
Amazon EC2 c5.xlarge instance
Debian 9

@holiman
Contributor

holiman commented Feb 12, 2021

The tx was mined here: https://etherscan.io/tx/0x4b60ee38c374c753f3e0d682b9f97d169fe4276d95cdd178a495e9dc3d8fbbcf , on Feb-11-2021 01:16:31 PM +UTC. Your issue was filed Feb-11-2021 10:59 AM GMT+1.

As far as I can tell, your issue was filed at 09:59 UTC and the tx was mined at 13:16, at 121 gwei. According to etherscan, https://etherscan.io/chart/gasprice , the average gas price on Feb 11 was 175 gwei. So it's not clear to me that geth really malfunctioned here. The tx got mined eventually..?

@bogdangainusa
Author

bogdangainusa commented Feb 15, 2021

@holiman You're wrong; this is not the issue I reported.

Geth returned a txid, but the transfer was stuck inside geth in the pending state for 3 hours.
After the first geth restart the transfer was still pending; we waited an hour and nothing changed, even though geth was syncing without issues.
After another geth restart the transfer was suddenly accepted and no longer pending.

I have reported the same issue before and someone replied that it was fixed (the same issue). We are running multiple servers with geth; the issue happens at random, and only a geth restart gets these pending transfers broadcast to the network.
(screenshot)

Please note that this issue is present on many of our systems. We are simply sending transfers out using eth_sendTransaction; some of them are sent out, but others remain in the PENDING state until geth is restarted, after which they are no longer in the pending list.

Now we are facing the same issue again: 3 transfers in the pending state while geth is fully synced.

```
> eth.syncing
false
> eth.blockNumber
11860183
> eth.pendingTransactions
[{
    blockHash: null,
    blockNumber: null,
    from: "0x9f2e63c42623fcf34f854c78335a668c327dac48",
    gas: 21000,
    gasPrice: 140000000000,
    hash: "0xc641ec3e02d2b094e0e8b6f82897fce2953625ea88e769011d89c6a1fb80658a",
    input: "0x",
    nonce: 27,
    r: "0xcc52ed99ce1d3434ea90419926dba2da516ed6692c9a84843497eb289cb8b459",
    s: "0x1ebfbadd1383c246ab16215d52a3cefc4d678bfeb9940bba590968bf1a98222a",
    to: "0xd819b145004b4e88c2f9f814246800f65c4b1e9b",
    transactionIndex: null,
    v: "0x25",
    value: 113387710000000000
}, {
    blockHash: null,
    blockNumber: null,
    from: "0x9f2e63c42623fcf34f854c78335a668c327dac48",
    gas: 21000,
    gasPrice: 120000000000,
    hash: "0x6c84bbb044aa2aea668c841f3274f1b08ca6da4da699eefeec95467d6f4bef92",
    input: "0x",
    nonce: 28,
    r: "0xec0fa88cde01b05db0027091e1008814fe9e1b50a1511833baaa14d800d69378",
    s: "0x7f98553c04ddd58631eb0852e0dbee779b2b7aa0b97c1ec1bbf42d72565eeae4",
    to: "0x4f9637a095cb48ecfdea7d5abece9bf51bbc752b",
    transactionIndex: null,
    v: "0x26",
    value: 320793360000000000
}, {
    blockHash: null,
    blockNumber: null,
    from: "0x9f2e63c42623fcf34f854c78335a668c327dac48",
    gas: 33545,
    gasPrice: 133000000000,
    hash: "0x375a4fae37ad86fb1f3dbd9b1f886acc0b1655ceea0537940f696496735cb75b",
    input: "0x",
    nonce: 29,
    r: "0xaf54cff52cd8b0020d9858ee1c47d33ca919ffe1fae6bd655230d9b7b0b03f55",
    s: "0x342cb0eb881d8111b2dfb032d29a24239176e2ffdb1eeda1ea650aa67924a883",
    to: "0xd76becb1ae8588963515c52e8941f1b002471715",
    transactionIndex: null,
    v: "0x25",
    value: 7633337180000000000
}, {
    blockHash: null,
    blockNumber: null,
    from: "0x9f2e63c42623fcf34f854c78335a668c327dac48",
    gas: 21000,
    gasPrice: 155000000000,
    hash: "0xf3d1a44b11ac2c85e406214ac7f3cca6ff23d62967403816e9282fcee92fbff6",
    input: "0x",
    nonce: 30,
    r: "0x3ace40d0cf3103711f896f6d1618693848ae39a599d20cd6aff6ec3ce85d3590",
    s: "0x6f9cdc5c7a966aa27c0ae7eb4047e02aac8f9baad16a45bfaae11368c243ffec",
    to: "0x87a38ff8789409c5efd5c4686d0baea5661b38bd",
    transactionIndex: null,
    v: "0x25",
    value: 100000000000000000
}]
```

@holiman
Contributor

holiman commented Feb 15, 2021

While it won't solve the root issue, a temporary hack you can use to work around these scenarios, apart from restarting geth, is to use another node to broadcast the transactions. eth.getRawTransaction(<hash>) gives you a hex-encoded string, which you can send to e.g. https://etherscan.io/pushTx or https://app.mycrypto.com/broadcast-transaction
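
To make that concrete, here is a minimal sketch using the geth JavaScript console on two nodes. The hash is one of the pending hashes pasted above, and the raw string has to be copied manually from one console to the other:

```js
// On the stuck node's console (geth attach): grab the signed, RLP-encoded transaction.
// The hash is a placeholder; use any entry from eth.pendingTransactions.
var raw = eth.getRawTransaction("0xc641ec3e02d2b094e0e8b6f82897fce2953625ea88e769011d89c6a1fb80658a")

// On a different, healthy node's console: re-inject the exact same signed transaction.
// The signature and nonce are unchanged, so this cannot double-spend; at worst it is a duplicate broadcast.
eth.sendRawTransaction(raw)
```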

@bogdangainusa
Author

Thank you for the hint, but I would rather avoid doing this. We're running multiple servers and it would take a lot of time to check each of them and rebroadcast the transactions manually.

I would rather wait for this geth issue to be solved properly.

@karalabe
Member

So, I'm kind of agreeing that Martin is correct here.

You sent a transaction at 121 gwei; the network's average price that day was 175 gwei. How do you expect to be included in a block quickly if you underprice relative to the network?

@bogdangainusa
Author

@karalabe Sorry, but I don't think you get the full picture of the issue here.
When we send a transfer, we take the gwei price from the https://etherscan.io/gastracker average value, so the issue is not related to the gas price at all.

Even if the issue were the gas price, THE TRANSFER SHOULD BE VISIBLE ON A BLOCK EXPLORER when we look up the txid from the pending transfers. But when we check it, etherscan returns an invalid tx hash. When a transfer is sent with a low gas price and geth processes it successfully, it shows up on the block explorer in the pending state.

The issue here is: the transfer is sent with a correct amount of gwei and all the other necessary details, but geth does not send it out. Blocks keep updating, but eth.pendingTransactions shows plenty of transfers in the pending state, and when we look up any of these txids in an explorer the transfer is not there. IF WE RESTART GETH, GETH BROADCASTS THE TRANSFERS AND THEY ARE PROCESSED! It is not a gas-price issue; it is entirely geth's fault, and the transfers are only broadcast after a geth restart. I reported this issue twice about a year ago and was told it was fixed (other users reported the same issue too), but in fact it was never fixed.

The issue appears at random after some time, and inside geth we have no information about the transfer other than a txid that is not valid on the network. Right now we have many transfers in the pending state, and it looks like even after a geth restart they are still not processed and no information about them is available from the explorer.

```
> eth.pendingTransactions
[{ blockHash: null, blockNumber: null, from: "0x9f2e63c42623fcf34f854c78335a668c327dac48", gas: 21000, gasPrice: 140000000000, hash: "0xc641ec3e02d2b094e0e8b6f82897fce2953625ea88e769011d89c6a1fb80658a", input: "0x", nonce: 27, r: "0xcc52ed99ce1d3434ea90419926dba2da516ed6692c9a84843497eb289cb8b459", s: "0x1ebfbadd1383c246ab16215d52a3cefc4d678bfeb9940bba590968bf1a98222a", to: "0xd819b145004b4e88c2f9f814246800f65c4b1e9b", transactionIndex: null, v: "0x25", value: 113387710000000000 },
 { blockHash: null, blockNumber: null, from: "0x9f2e63c42623fcf34f854c78335a668c327dac48", gas: 21000, gasPrice: 120000000000, hash: "0x6c84bbb044aa2aea668c841f3274f1b08ca6da4da699eefeec95467d6f4bef92", input: "0x", nonce: 28, r: "0xec0fa88cde01b05db0027091e1008814fe9e1b50a1511833baaa14d800d69378", s: "0x7f98553c04ddd58631eb0852e0dbee779b2b7aa0b97c1ec1bbf42d72565eeae4", to: "0x4f9637a095cb48ecfdea7d5abece9bf51bbc752b", transactionIndex: null, v: "0x26", value: 320793360000000000 },
 { blockHash: null, blockNumber: null, from: "0x9f2e63c42623fcf34f854c78335a668c327dac48", gas: 33545, gasPrice: 133000000000, hash: "0x375a4fae37ad86fb1f3dbd9b1f886acc0b1655ceea0537940f696496735cb75b", input: "0x", nonce: 29, r: "0xaf54cff52cd8b0020d9858ee1c47d33ca919ffe1fae6bd655230d9b7b0b03f55", s: "0x342cb0eb881d8111b2dfb032d29a24239176e2ffdb1eeda1ea650aa67924a883", to: "0xd76becb1ae8588963515c52e8941f1b002471715", transactionIndex: null, v: "0x25", value: 7633337180000000000 },
 { blockHash: null, blockNumber: null, from: "0x9f2e63c42623fcf34f854c78335a668c327dac48", gas: 21000, gasPrice: 155000000000, hash: "0xf3d1a44b11ac2c85e406214ac7f3cca6ff23d62967403816e9282fcee92fbff6", input: "0x", nonce: 30, r: "0x3ace40d0cf3103711f896f6d1618693848ae39a599d20cd6aff6ec3ce85d3590", s: "0x6f9cdc5c7a966aa27c0ae7eb4047e02aac8f9baad16a45bfaae11368c243ffec", to: "0x87a38ff8789409c5efd5c4686d0baea5661b38bd", transactionIndex: null, v: "0x25", value: 100000000000000000 }]
```

@matricore

I agree with @Gbogdann93. It's not related to the gwei price. Most of my txs are stuck pending in the node and never show up on etherscan. The stuck tx's gas price was 150 gwei and the average was 120 gwei at the time of creation.
May I ask what the triggering scenario is for re-broadcasting these transactions? Am I able to trigger them manually?

@bogdangainusa
Author

This is entirely a geth issue which has been open for almost a year but never properly investigated, as there are multiple topics open about the same problem. I hope this time someone will try to look into it and fix it.

@BigMurry

BigMurry commented Feb 19, 2021

I can confirm this issue. Some of the transactions just get stuck in geth with pending status, but the rest of the world never receives them at all; there are no pending txs on etherscan. The gas price is correct and the nonce is correct. It seems that geth just does not broadcast these transactions at all. It can take about 1.5 hours for geth to re-broadcast them to the network, after which they get confirmed.
By the way, this happens very randomly; in our case it may happen once every 3-5 days.

@karalabe
Member

What would help us is to know your sending patterns: how many transactions you push into the network when this happens, how fast you push them in, and what the txpool content is at that time (txpool.inspect).
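
For anyone wanting to capture that snapshot, a minimal sketch from the geth console at the moment transactions appear stuck (txpool.status and txpool.content are the companion views to txpool.inspect):

```js
// In the geth console (geth attach), while the transactions are stuck:
txpool.status    // counts of pending (executable) and queued (nonce-gapped) transactions
txpool.inspect   // one-line summary of every pending/queued tx, grouped by sender account
txpool.content   // the full transaction objects, if the summaries are not enough
```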

There are a lot of moving variables, and we're mostly confident Geth handles things correctly in general, but once you push transactions over the limits of your local and remote peers' capacities, things might get unreliable.

I'm 100% certain Geth publishes every transaction it gets. The question is why those transactions get dropped by the remote nodes you send them to. Some limit is being hit; the question is which one.

@karalabe
Member

Certain entities are also playing pool battles, trying to kick each other out of the pool to get a better placing for exchange transactions. Such adversarial moves can also influence other transactions, though it's hard to say without seeing some traffic and txpool content when the issue happens.

@bogdangainusa
Author

@karalabe It seems we keep coming back to the claim that this is not geth's fault, when in fact it totally is. In the past year I have provided info and geth logs, but nothing was checked at all, and some threads were closed even though the resolution was "fixed in version 1.9.16".

(screenshot)

We have encountered the same issue on multiple servers. We run geth on at least 15 separate servers for our wallets, and the issue appears both on servers where we send 2-3 transfers per day and on servers where we send more than 50 transfers per day. It is totally random, and only after some geth restarts do the transfers become visible on the network.

I think I have already provided all the necessary information in at least 2 topics, but even though other users reported the same problem, it was never checked properly.

At this moment we have many customers complaining about transfers with invalid txids that are not available on the network, and we can't do anything at all. Do you have any temporary solution for getting these transfers onto the network? About 2 years ago, when we faced the same issue and restarted geth, the pending transfers were executed twice and we lost a large amount because of that. Please let us know what we can do to have transfers properly sent out from geth. We have also prepared other servers with geth fully synced, but I am not sure whether we can move the keystores and migrate to a new one, and what would happen to the pending transfers from the old geth; we don't want transfers executed twice.

@holiman
Contributor

holiman commented Feb 19, 2021

This can be further investigated, but it's not trivial.

If you are running multiple servers, can you: designate one node as a, have that node always connected to a second node b, run b with high-verbosity logging, and send transactions via RPC on a. Next time this happens, check in b's logs whether it ever saw the transaction.

If b never saw the transaction: yes, geth failed to broadcast.
If it did see the transaction: you should be able to tell why it was dropped by that peer (and most likely by the rest of the network).
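
A minimal sketch of wiring node b up from its console, assuming node a's enode URL is known (the URL below is a placeholder) and that TRACE-level logging is acceptable on b:

```js
// On monitoring node "b", in the geth console (geth attach):
debug.verbosity(5)                                           // raise the log level to TRACE
admin.addPeer("enode://<node-a-pubkey>@<node-a-host>:30303") // placeholder: pin a connection to node a
admin.peers.map(function (p) { return p.enode })             // confirm node a shows up in the peer list
```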

However, there are a lot of transactions coming in and getting dropped every second, so you might need to run a custom version of geth on b, e.g. with a modification that logs to a file whenever a transaction is dropped (hash + reason).

If you would be willing to set this up, I can provide the custom version.

@fjl
Contributor

fjl commented Feb 19, 2021

It seems we keep coming back to the claim that this is not geth's fault, when in fact it totally is.

@Gbogdann93 what we're trying to say is: this might totally be geth's fault, but we cannot debug it without further information. I understand you are frustrated by this experience, since you have already provided logs in the past.

@bogdangainusa
Author

@fjl Please let me know what is needed from my side again. I will not debug geth on a live environment where we have at least 20 ETH in the pending state. I am just a developer like you; I don't have access to all the servers, and we will not run tests on environments holding 200+ ETH.

We have already tried to replicate the issue and see the same output on a test geth, but it cannot be replicated at all (same server, specs, geth version and so on).

@karalabe
Member

@Gbogdann93 It's completely understandable that you don't want to debug/test on live environments, regardless of how much money is at stake. Obviously don't test in prod.

Our suggestion was a bit different. We are mostly convinced that it's not your local node failing to send out transactions; rather, the nodes around you (the ones you're connected to) are rejecting them after they receive them.

As such, you don't really need to touch your production nodes at all. What we need is a completely independent new node that is simply connected to your production nodes and doesn't do anything special, just behaves like any other live node (with high-verbosity logging). Martin suggested you set it up because you might feel better if no details about your infra are shared with us.

We can also set up a node ourselves with all the logging we'd need; the only thing we need is a connection to your own nodes so we can see what's happening on the wire. We can whitelist your nodes to allow them to always connect to us if you give us enode URLs, though we'd probably also need you to add our node as a connection to ensure there's a link when things go wonky. Then, whenever you tell us a tx was dropped, we could dig up the events from our own logs to see what happened.

This bug is obviously a rare thing that I still say depends on network conditions. You won't be able to reproduce it easily, because there's no singular cause and you won't be able to reproduce the network conditions. The only option we see is to somehow monitor the network traffic and analyze the events after the fact. So Option A: you set up an extra node within your infra, keep it connected to your servers and send us the logs; Option B: we set up our monitoring nodes and run them as best we can, but we still need to know there's a connection to your systems when weird things happen.
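
For reference, the enode URLs being asked about can be read straight off the console, and the monitoring node can be pinned as a trusted peer once its URL is shared; the URL below is a placeholder:

```js
// On each production node:
admin.nodeInfo.enode                                                  // this node's own enode://... URL, to share for whitelisting
admin.addTrustedPeer("enode://<monitoring-node-pubkey>@<host>:30303") // placeholder: always accept the monitoring node
admin.addPeer("enode://<monitoring-node-pubkey>@<host>:30303")        // and actively dial it as well
```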

@bogdangainusa
Author

@karalabe We would like to go with option B.
Can you please let us know exactly what is needed from our side?

@wspi

wspi commented Feb 22, 2021

@karalabe we have been facing the same issue for a while now; every few days we have pending txs on geth, sometimes dozens. At the moment we are rebroadcasting them through other nodes we have, or via etherscan.

If it helps troubleshooting, we could also connect our node with yours. It would also be a nice feature if the node that rejects a tx let the sending node know that it was rejected and why.

@tnodesgithub

I hit the same issue on my node: sendRawTransaction randomly leaves txs stuck locally, not broadcast to the network.

```
INFO [02-23|03:37:13.849] Loaded local transaction journal transactions=99 dropped=0
INFO [02-23|03:37:13.849] Regenerated local transaction journal transactions=99 accounts=72
WARN [02-23|03:37:13.850] Switch sync mode from fast sync to full sync
INFO [02-23|03:37:13.852] Starting peer-to-peer node instance=Geth/v1.9.25-stable-e7872729/linux-amd64/go1.15.6
```

The only thing I can do is restart geth again and again; it does not rebroadcast all the pending txs in one restart, so you need to keep restarting.

The stuck txs always happen when the average gas price swings wildly.

@richiela

This has been a problem for years with geth in large production networks. I can share the hack we have in place that keeps it happy most of the time; while it may be "spammy" or a "waste of resources", it keeps our stuff running pretty smoothly (except during a backlog, which I'll get to later):

a) Set up 3-4 geth servers that are just plain nodes, in multiple places if possible.
b) Set up a script that, for each of your servers, walks eth.pendingTransactions, calls getRawTransaction(x), and sendRawTransaction(x) to all the servers from step a (see the sketch below).
c) For bonus points, have it push to etherscan's API to broadcast as well.

This stops the random "a tx didn't go out and blocked my entire pipe" issue. It sucks, but it almost always guarantees the transactions are seen. We no longer have stuck transactions with this setup.
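
A sketch of step (b) as a small Node.js script over plain JSON-RPC. The endpoint URLs are placeholders, Node 18+ is assumed for the global fetch, and it relies on geth exposing the eth_pendingTransactions and eth_getRawTransactionByHash RPC methods (the same calls that back eth.pendingTransactions and eth.getRawTransaction in the console):

```js
// rebroadcast.js -- sketch only; endpoint URLs are placeholders.
const SOURCES = ["http://wallet-node-1:8545"];                            // nodes where txs get stuck
const TARGETS = ["http://relay-node-1:8545", "http://relay-node-2:8545"]; // the plain nodes from step (a)

async function rpc(url, method, params) {
  const res = await fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ jsonrpc: "2.0", id: 1, method, params }),
  });
  return (await res.json()).result;
}

async function main() {
  for (const src of SOURCES) {
    const pending = (await rpc(src, "eth_pendingTransactions", [])) || [];
    for (const tx of pending) {
      // The signed RLP payload; re-sending it elsewhere is safe, the signature/nonce stay the same.
      const raw = await rpc(src, "eth_getRawTransactionByHash", [tx.hash]);
      if (!raw) continue;
      for (const dst of TARGETS) {
        await rpc(dst, "eth_sendRawTransaction", [raw]); // "already known" responses are fine to ignore
      }
    }
  }
}

main().catch(console.error);
```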

Where this fails is during large gas spikes, when you have txs stuck for multiple hours because of gas fees. When that happens, all the txs you have out there are dropped, and all the endpoints you've sent them to already know about them and have dropped them as well. The only way to fix that is basically to find a new endpoint to rebroadcast the tx to.

As a service operator that relies on geth, this entire thing is extremely frustrating.

@karalabe
Member

Just an FYI: https://gist.github.com/karalabe/821a1cd0270984a4198e904d34623b6c

@bogdangainusa
Author

@karalabe Do we have an ETA for this issue? Will it be fixed in the near future?
We set up new geth servers and the issue persists; even after a geth restart we have pendingTransactions which are not broadcast.

@karalabe
Member

karalabe commented Mar 1, 2021

You'll need to give me a couple days, we're finalizing the 1.10 release now and I want to get that out of the way before diving into a new thing.

@bogdangainusa
Author

@karalabe Please kindly let us know when we can expect an update on this. Thank you.

@hamidbsd

I just saw this issue on our node and, based on the suggestions above, I restarted Geth and it got fixed.
One thing I noticed during the restart, however, was a few red error messages about not being able to connect to nodes Geth had been connected to before, something about them not being in sync (sorry, I was in a rush so couldn't capture the messages). So I think the problem might be that when Geth has been running for quite some time, it doesn't validate the health of the other nodes it is connected to and doesn't drop the bad/unsynced ones properly, so when it broadcasts a transaction it doesn't get relayed properly.

@andrej-hash

@BigMurry thanks a lot for your suggestion to increase the maxPeers value; it has been working just fine for the last 2 weeks. I will look forward to your advice on the new geth upgrade and whether any pending transactions show up.
Thanks again!

@bogerv

bogerv commented Apr 23, 2021

Are there any updates?
At the moment we must restart the geth node every day whenever there is a withdrawal transaction.
Sometimes the estimated gas fee we obtain is much lower than the going network price, resulting in many transactions that need to be resent.

@pmprete

pmprete commented Apr 26, 2021

Same issue here with version 1.10.1. The node randomly stops broadcasting pending transactions, and the only way to fix it is to restart it.

@kladkogex

kladkogex commented Apr 29, 2021

It happened today on our network.

@bogdangainusa
Author

bogdangainusa commented Apr 29, 2021

@kladkogex Get used to it. We have few servers where this error has not appeared.

What is really interesting is that we have been reporting this issue for almost 2 years, but it seems nobody cares, even though it is very critical and affects many users' transfers.

The only solution is to move to another client, since we get no response at all in a place where we are supposed to help each other.

@kladkogex

kladkogex commented Apr 29, 2021

Hey @karalabe

Let's make a bet. I do not know much about Go and have never programmed in this language; I have no clue what is inside geth. In addition, I leave for vacation tomorrow, so you have a 5-day lead time.

If I fix this bug faster than you, you pay me $1000. If you fix it faster than me, I pay you $1000.

Deal? This bug should have been fixed a long time ago.

@JustinDrake @vbuterin

@holiman
Contributor

holiman commented Apr 29, 2021

This ticket is titled "Geth not broadcasting transfer". There was another ticket about not re-broadcasting transactions. Those are two different things.

Geth broadcasts all transactions. There are no known issues with broadcasting transactions. If geth indeed fails to do so, then it should be possible to figure out a repro testcase that triggers the faulty behaviour. However, I think all these reports are really just an artefact of the next issue:

Geth does not always rebroadcast transactions. There's currently no fix for that, at least not one that wouldn't cause a lot of network spam.

If I fix this bug faster than you,

Which bug is that? Number one or number two?

@Penait1

Penait1 commented May 4, 2021

We have the same issue, especially since we upgraded to version 1.10.1. On the Rinkeby network the txs sometimes don't propagate. When I run eth.pendingTransactions there aren't any txs listed, but they are not mined either. When I inspect the txpool I see my transactions with a valid gas price (1 gwei on Rinkeby) and correct nonces.

The strange thing: transactions from other addresses sent to the same node work fine. When we experience the issue, only the transactions from that one particular address won't propagate; we send txs from other addresses to our node and those continue to work normally.

More often than not it happens after we've sent a few txs in a short time span (multiple in less than 1 minute), but yesterday it went wrong even under a milder load. We changed from a self-hosted node to AWS Managed Blockchain, which uses Geth, but it didn't make a difference; still the same issue.

The issue resolves itself after a restart of our application, or randomly after some time.

@bogdangainusa
Author

@holiman We are facing this issue again.
In geth's eth.pendingTransactions we have transfers which are ALREADY CONFIRMED ON THE NETWORK, but geth says they are pending.

We restarted geth, but the transfers are "still pending" even though they already have 100 confirmations.

I think this will never be fixed, am I right?

@holiman
Contributor

holiman commented May 20, 2021

Original issue:

Transfer is in pending transaction state and even after geth restart transfer is not processed.

Now you say:

Again we face this issue.
Inside eth.pendingTransactions of geth we have transfers which are ALREADY CONFIRMED INTO NETWORK but geth says it's pending.

How is that the same issue? That sounds like a different issue to me -- it was obviously sent to the network, how could it otherwise be confirmed? Please file a new ticket.

@bogdangainusa
Author

bogdangainusa commented May 20, 2021

@holiman I can open 100 tickets, but in the end it is just a waste of time, as we can all see.

There are 2 issues which have been reported by me and other users for almost 2 years, and nobody cares:

  1. Transfers stay in the pending state inside geth for a long time; sometimes a geth restart helps broadcast them, sometimes not (we must restart geth multiple times to get them broadcast to the network).
  2. Transfers are displayed inside geth as pending even though they are already confirmed with more than 100 confirmation blocks. We tried restarting geth, but the transfers are still pending inside it (note that geth is syncing and the block height is up to date).

Don't get me wrong, we're all tired of these issues coming up all the time, and it's not just me reporting this; plenty of other users are complaining about this bug, and in the end no action is taken on your side. I can start again from scratch, open a ticket and post all the required information, logs and everything else, but as I see it, that is just a waste of my time.

@karalabe
Member

@Gbogdann93 I'm sorry, we're a handful of people trying to maintain the client. People are pushing for new transaction models (1559), people are pushing for the merge. We don't have the capacity to handle everything. We are aware of the issues; in our opinion the solution is a full pool rewrite with a completely different data model. That will take a long time.

If you have the capacity to debug the pool, great, we'd gladly fix any explicit errors.

@AmitBRD
Contributor

AmitBRD commented May 21, 2021

@holiman I can open 100 tickets, but in the end it is just a waste of time, as we can all see.

There are 2 issues which have been reported by me and other users for almost 2 years, and nobody cares:

  1. Transfers stay in the pending state inside geth for a long time; sometimes a geth restart helps broadcast them, sometimes not (we must restart geth multiple times to get them broadcast to the network).
  2. Transfers are displayed inside geth as pending even though they are already confirmed with more than 100 confirmation blocks. We tried restarting geth, but the transfers are still pending inside it (note that geth is syncing and the block height is up to date).

Don't get me wrong, we're all tired of these issues coming up all the time, and it's not just me reporting this; plenty of other users are complaining about this bug, and in the end no action is taken on your side. I can start again from scratch, open a ticket and post all the required information, logs and everything else, but as I see it, that is just a waste of my time.

Can we avoid the hostility and negativity towards the maintainer team? We are all capable of contributing to the project, so let's be civil and appreciate any help we can provide each other.

@yyj-zz

yyj-zz commented Jun 11, 2021

We are experiencing the same issue, and I mentioned it here.

@bogdangainusa
Author

Same issue again with the latest geth version, v1.10.6.
We are already tired of this issue...

@chreechris

Same issue here. I've been following this conversation thread for some time in the hope that someone, somewhere finds something to alleviate this problem.

I am running a geth server in Docker on a high-powered AWS instance, with the data volume on a dedicated EBS volume using gp2. We are investigating whether disk congestion might correlate with this issue; we have seen it occur when we tend to see higher IO waits on our data disk.

We have only just noticed this coincidence, so it may turn out to be nothing, but given this problem happens every few days for us, we should be able to test this theory soon. We are now running enhanced metrics and stats collection to see if there is alignment between the two. We should know more within a week or two.

I'll keep you posted.

@chreechris

Update: we believe we have got to the bottom of this issue, although we are still tracking it for confidence.
It is not related to disk congestion; however, the spikes did highlight other activity which, when observed through the Prometheus metrics, showed some interesting things.

We had previously upped our GlobalSlots to 250k but not touched the other parameters. Having eyed the metrics closer:

  1. We were using the default setting for GlobalQueue (1024); this was maxing out at 1024, indicating no further queue slots were available. We increased it to 50k and saw a healthy queue grow and shrink but never reach the max.
  2. We were using the default setting for AccountSlots (16); for some of our accounts we were sending multiple transactions that exceeded 16. We increased this to 64.

Without these settings increased, when we had a flurry of transactions to send, the earlier transactions were published fine but the node held back the later ones until it was restarted.

With these two settings adjusted (see the flags sketch below), we have seen consistent behaviour in transactions being sent to the blockchain. This may of course be purely coincidental given the behaviour and volatility we see on Ethereum, but after two weeks without a stuck transaction we have already hailed it as a minor miracle.
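
For anyone wanting to try the same configuration, these knobs correspond to geth's txpool command-line flags; the flag names are from geth --help, and the values below simply mirror what this comment reports using, not a general recommendation:

```
# Defaults mentioned above: GlobalQueue=1024, AccountSlots=16.
geth \
  --txpool.globalslots=250000 \
  --txpool.globalqueue=50000 \
  --txpool.accountslots=64
```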

@karalabe It would be great if you could confirm the above; would it be possible to investigate these conditions and what happens to locally submitted transactions when these two parameters are exceeded? For the second parameter you would need to batch up a series of transactions from the same account and submit them at once. FYI, it would be great to add some additional logging around these parameters so we know when they have been exceeded.

@Penait1

Penait1 commented Oct 27, 2021

@chreechris Are you still running without issue with those parameters?

@leontastic

I've been seeing this issue regularly for months. There seems to be no way to gracefully resolve it other than killing geth and praying that the little gremlins inside persisted all the transactions to disk and will rebroadcast them on restart.
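
For what it's worth, the persist-to-disk part does exist for locally submitted transactions: they are written to an on-disk journal and replayed at startup, which is the "Loaded local transaction journal" line in the log snippet earlier in this thread. A sketch of the related flags (names per geth --help):

```
# --txpool.journal    journal file for locally submitted transactions (default "transactions.rlp")
# --txpool.rejournal  how often the journal is regenerated (default 1h)
# --txpool.nolocals   disables the special treatment (and journalling) of local transactions
geth --txpool.journal transactions.rlp --txpool.rejournal 1h
```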

@leontastic

What might help is if we had some separate logs for Broadcasted transaction to peer in addition to the Submitted transaction hash logs.

@jbriaux

jbriaux commented Nov 9, 2021

I suffer from the same issue from time to time. As @chreechris mentioned, it only happens when the blockchain is under heavy load. Another workaround is to cancel the first stuck tx (overwrite its nonce), after which all the other txs in the queue get broadcast; see the sketch below. I will test the globalqueue and accountslots settings as well and report back in a few weeks.
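
A minimal sketch of that cancel from the geth console. The address and nonce are placeholders (taken from the style of the pastes above), and the replacement needs a gas price at least ~10% above the stuck transaction's, which is geth's default price-bump threshold, or it will be rejected as "replacement transaction underpriced":

```js
// In the geth console on the node holding the stuck tx (the account must be unlocked):
eth.sendTransaction({
  from:     "0x9f2e63c42623fcf34f854c78335a668c327dac48", // sender of the stuck tx (example address from this thread)
  to:       "0x9f2e63c42623fcf34f854c78335a668c327dac48", // send to self: a zero-value "cancel" transfer
  value:    0,
  nonce:    27,                                           // nonce of the FIRST stuck transaction
  gas:      21000,
  gasPrice: web3.toWei(160, "gwei")                       // at least ~10% above the stuck tx's 140 gwei
})
```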

@leontastic

This is happening on an almost daily basis for me now. Geth's logs don't show any errors or problems broadcasting the transaction; the node just silently hoards the transactions and won't re-broadcast them until I restart it and submit another transaction.

Is this a problem with the peers my node is broadcasting to, or is there something that prevents geth from broadcasting to its peers correctly? If geth cannot get a transaction confirmed within a reasonable number of blocks, shouldn't it try to rebroadcast it repeatedly until I issue a new transaction to drop and replace it with a higher fee?

@jbriaux

jbriaux commented Dec 17, 2021

There is a fix for this on BSC. Is it possible to port it to go-ethereum? bnb-chain/bsc#570

@leontastic

Shouldn't geth rebroadcast the transactions to newly connected peers too? I'm finding geth doesn't even retry transactions broadcast to peers that get dropped afterwards.

It seems like the broadcasting logic assumes that all the peers at the moment of broadcast are good peers, which is probably a faulty assumption; some of these peers are bad.

@pschork

pschork commented Dec 29, 2021

There should be a better way to force a rebroadcast of the pending transaction queue other than restarting geth.

@BlockChainCaffe

Hi there.
I'm having exactly the same problem @bogdangainusa is reporting:
geth runs, the chain is updated, and the transactions are signed with the right nonce and fee.
The problem is that the txs are not published to the network.
I need to restart geth, or worse, resync it and re-issue the tx.
