RPC write txs stopped working while read txs work #29691

Closed
codemonkey6969 opened this issue Jan 13, 2023 · 3 comments
Labels: community, stale

Comments

@codemonkey6969
Contributor

Randomly, all write txs on our RPC node stopped working, while reads kept working perfectly. This has now happened a second time, so it seems like a bug. I have logs for both occurrences.

Log first time: https://snapshots.nodemonkey.io/snapshots/validator_write.log
Log second time: https://snapshots.nodemonkey.io/snapshots/validator_write2.log

@codemonkey6969 added the community label Jan 13, 2023
@codemonkey6969
Contributor Author

Today it stopped working for about 30 minutes, then the node started sending write txs again. Here are the logs, in the order it happened: https://snapshots.nodemonkey.io/snapshots/validator_writeissuebefore.log and then https://snapshots.nodemonkey.io/snapshots/validator_writeissue.log

@steviez
Contributor

steviez commented Jan 25, 2023

A few notes:

  • For the distinction of reads vs. writes, the specific RPC calls would help qualify the workload your node(s) experienced
  • A rough time for when the issue occurred is helpful, i.e. an absolute date/time in UTC
  • Related to the above, some of the logs you uploaded were pretty big (I think I saw one at ~45 GB) ... this is a lot for someone to sift through if they're not exactly sure what/when they're looking for.
    • I didn't scan your full logs so maybe you already did this, but trimming the logs down to the time range around the incident makes things a little less cumbersome
  • Version / host_id (if submitting metrics) would be useful as well
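
As a rough illustration of the log-trimming suggestion, here is a minimal Python sketch that streams a large log and keeps only the lines inside a UTC time window. It assumes each log line starts with a bracketed RFC 3339 timestamp such as `[2023-01-13T06:13:06.433891925Z INFO ...]`; that prefix format, and the names `parse_ts` and `trim_log`, are assumptions made for this example and not something confirmed in this thread.

```python
import re
from datetime import datetime, timezone
from typing import Iterable, Iterator, Optional

# ASSUMPTION: lines begin like "[2023-01-13T06:13:06.433891925Z INFO  module] msg".
# Adjust the pattern if the real log prefix differs.
TS_RE = re.compile(r"^\[(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.\d+)?Z")


def parse_ts(line: str) -> Optional[datetime]:
    """Return the line's leading UTC timestamp (whole seconds), or None."""
    m = TS_RE.match(line)
    if m is None:
        return None
    # Sub-second digits are ignored; second resolution is enough for trimming.
    ts = datetime.strptime(m.group(1), "%Y-%m-%dT%H:%M:%S")
    return ts.replace(tzinfo=timezone.utc)


def trim_log(lines: Iterable[str], start: datetime, end: datetime) -> Iterator[str]:
    """Yield lines whose timestamp falls within [start, end].

    Lines without a timestamp (e.g. wrapped continuation lines) are kept
    or dropped together with the most recent timestamped line.
    """
    keep = False
    for line in lines:
        ts = parse_ts(line)
        if ts is not None:
            keep = start <= ts <= end
        if keep:
            yield line
```

Because `trim_log` takes any iterable of lines, an open file object can be passed in directly and the result written out with `writelines`, so even a multi-GB log is processed one line at a time instead of being loaded into memory.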

I recently saw another issue from someone having RPC problems (#29902). They have already done some of the narrowing down, so if that seems like the same issue, I'm inclined to move the conversation there to avoid split and/or duplicate conversations

@codemonkey6969
Contributor Author

Hey @steviez, that issue is different. We have also experienced that one, but luckily haven't had any customers trigger it since. This should probably stay open as a separate issue because when this occurs, read txs work properly. When the other issue occurs, nothing works and no RPC calls are responsive. I will also try to get you exact times for these logs to help narrow it down. Thanks!

@github-actions bot added the stale label Jan 26, 2024
@github-actions bot closed this as not planned Feb 2, 2024