-
Notifications
You must be signed in to change notification settings - Fork 857
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[web3.js] Infinite recursion caused by _updateSubscriptions #1106
[web3.js] Infinite recursion caused by _updateSubscriptions #1106
Comments
Thanks for this report! Other than spamming the error console, does this interfere with the operation of your app? |
Of course. It caused infinite recursion and made CPU usage soar to 100%, so that my app cannot run as normal. @steveluscher |
I was also seeing this issue on v1.47.3 and reverted to v1.43.5 and it went away. |
Totally, I get it, @Disperito. Unbounded recursion is super bad news, no matter which way you slice it. What I was trying to get at is that there are two kinds of infinite recursion. One leads to the JavaScript thread literally locking, and freezing the browser:
The other results in code running as often as it can, and high CPU usage, but doesn't lock the thread; other code can continue to run in the period between yields.
In any case, we need a general solution for this. |
Can I ask you a question? Can you count the number of subscriptions that you're making to the RPC in total? I've seen situations where RPCs limit the number of open subscriptions you may hold. When you exceed a certain number, they murder your connection abruptly. See solana-labs/solana#25322 (comment). |
Sorry I have no definite answer. But I have reviewed the logs near when this error occurs, I think there were only less than 10 subscriptions at the same time, even less. I don't think the large number of subscriptions is the root cause of the bug. It is true that a large number of subscriptions can lead to poor connnection with RPCs, but the point is: your library should recognize this situation and do special handling, such as throwing an exception, or actively unsubscribing |
Still getting this error with @solana/[email protected] @steveluscher. Spammed logs until server was killed.
|
@Jac0xb If you have a reliable repro of this, can you debug it a bit to see if this is the root cause? solana-labs/solana#25578 |
Not sure the repro steps but happens often with sendConfirmation and possibly log lived connection objects. Haven't seen it before. Happened in Node.js. Looks like there is an onSignature call in there |
To anyone who's still experiencing this problem, can you try to upgrade to an experimental release, by running Let me know if that fixes anything. |
@Jac0xb, @Disperito, @gallynaut do you have time to give this another try, if you still have a repro? See the message above. |
I am seeing this on latest 1.73.2. Anything that I can do to help resolve this? It's random and happens frequently after some transactions are made
Update: solana-web3.js/packages/library-legacy/src/connection.ts Lines 5990 to 5992 in 67f3bf8
There's definitely something strange going on here, because upon debugging uncaught exceptions, Looks like this:
from the PR description might still be happening. |
I have left some additional comments on the PR solana-labs/solana#29195 to further continue discussion. I hope with my unfortunate stable reproduction we can get this fixed. |
Update: this might be false positive, but after reading this issue again, one thing from OP caught my immediate attention. We both execute a series of transactions, one after another. In my case, I have an array and for each item in an array, I execute 3 transactions. I will eventually refactor to a single one (and batch those calls), but I think that could be highly related. I added a setTimeout of 15 seconds between each call (just to stay on the safe side) and my script executed successfully. Are we dealing with a race condition here? Update: unfortunately, it errored again, but I have a feeling the error happens less frequently. |
Update: I have managed to find a fix and posted in the PR solana-labs/solana#29195 (comment). Hope we can land this quick! |
I'm having the same issue. Application is getting rekt by subscriptions. |
@dfl-zeke, do you have a repro that you can share? I'm happy to check out a repository and reproduce the problem over here. |
I don't have it, but I can suggest how you can reproduce it. Subscribe to 5K random accounts via Websocket and you will get this error surely. I was testing how much subscriptions I could handle but was unable to test, as sometimes it starts throwing errors from 2K subs, sometimes from 14K so it's kind of random. I've even added sleeps between chunk subs and it did not help at all, same error. If you try to subscribe all of them at the same time, it will start throwing this error immediately. I even tried to have multiple socket connections, but still got this error. Also, while I'm here and you might see this, I would suggest to add retry limits on websocket (because now it always retry to reconnect in case of fail) and ping turn on/off, it would made this package much more comfortable to work with. |
I see! Can you drum me up a quick repro of exactly that and post it on GitHub, @dfl-zeke? Also, let me know exactly which RPC URL you were able to reproduce the problem with because, as noted here, different RPCs have different subscription limits, and take different actions when you overflow that limit. I want to avoid a situation where I try to stand up a repro, can't trigger the bug, and have to come back to you for more information1. Footnotes
|
Will try to create one. I'm using rpcpool endpoint, so I can't share it with u publicly at least. Also, if I get limited, I should get 429 (which I got sometimes to be honest). I'll share quicknode one with free tier subscription. (I have same problem with them too). |
Here you go: https://github.com/dfl-zeke/solana-web3js-subs . If you have any questions, ping me. |
How long do you have to leave this running before you see the problem, @dfl-zeke? I'm at index 8300 and haven't seen it yet. |
Nevermind. There it is. |
Here's what I discovered about your repro: First, you're subscribing to the same RPC node 100 times, and fanning out subscriptions to each one of those connections. RPCs really hate when you do this, and will start to limit you and/or drop your connection (with a Now, let's talk about the actual problem that results when the RPC does unceremoniously drop you. This code is correctly thowing an exception that results when a message heads for a socket that's actually in a bad (non solana-web3.js/packages/library-legacy/src/rpc-websocket.ts Lines 49 to 60 in 33429ca
And this code is catching it in the subscriptions state machine: solana-web3.js/packages/library-legacy/src/connection.ts Lines 6001 to 6018 in 33429ca
Now, when the socket goes down like this, two things happen: its
Some possible solutions:
I have a few half-baked implementations of this on the go. Give me some time to find the best one. |
I'm pretty sure that elpheria/rpc-websockets#138 fixes the source of corruption of the subscriptions state machine here. |
@steveluscher I understand, that 20K subscription is not a normal thing to have, but I get it on 500 subscriptions sometimes and it's annoying. If I have too many subs, then okay, I can understand that host can limit me and give me error on the call, but do not crash the program. |
Yeah, I doubt it has anything to do with the number of subscriptions. It probably has everything to do with which box the load balancer shunts you to, and how many open subscriptions it currently has. |
To me, it happens pretty much all the time I try to wait for a transaction results. I think it has to do with React Native and the way it handles web sockets. I still didn't have time to provide a reproduction because for me, the workaround I pasted here temporarily fixes the problem (but introduces other side effects, so not really production ready lol). I will try to provide a reproduction soon, but maybe that helps too. I see you also posted another PR, so I will wait till its shipped to double check. Meanwhile, @steveluscher, how do you feel about solana-labs/solana#29195 in general? If that didn't fix the original error, do you still consider it a valid improvement or you'd say it's something that could be reverted? I left some comments here: solana-labs/solana#29195 (comment) |
🎉 This issue has been resolved in version 1.73.3 🎉 The release is available on: Your semantic-release bot 📦🚀 |
Try |
@dfl-zeke, I upgraded your repro repo to 1.73.3 and can no longer kill it. |
Because there has been no activity on this issue for 7 days since it was closed, it has been automatically locked. Please open a new issue if it requires a follow up. |
Problem
I'm using the version 1.44.1 of @solana/web3.js, to develop an application with high-frequency calling.
And the connection between network on my machine and Solana' Network seems to be not so stable: it's often to throw error about transaction confirmation timeout
In this situation, it suddenly to fastly and infinitely throw errors like follow:
Here is the key log:
As you can see, the connection is frantically trying to build up subscriptions on TWO transaction signatures.
Then I found out, the problem was caused in function _updateSubscriptions() in connection.ts:
Proposed Solution
Maybe you team should fix the logic of error handling. Here's my humble opinion: if there are some subsciptions failing for too many times(whatever the reason is), they should be directly removed from subscriptions list.
The text was updated successfully, but these errors were encountered: