Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

comms vat killed by unexpected recv seqNum #3021

Open
warner opened this issue May 4, 2021 · 4 comments
Open

comms vat killed by unexpected recv seqNum #3021

warner opened this issue May 4, 2021 · 4 comments
Assignees
Labels
bug Something isn't working SwingSet package: SwingSet

Comments

@warner
Copy link
Member

warner commented May 4, 2021

Describe the bug

While testing, I saw the comms vat get terminated when it received an unexpected seqnum. The slog trace was:

{"time":1620070469.4858224,"type":"deliver","crankNum":65210,"vatID":"v12","deliveryNum":14612,"kd":["message","ko464",{"method":"receive","args":{"body":"[\"8158:6811:deliver:ro+73:makeIn
vitation:rp-8197;[]\"]","slots":[]},"result":"kp24984"}],"vd":["message","o+30",{"method":"receive","args":{"body":"[\"8158:6811:deliver:ro+73:makeInvitation:rp-8197;[]\"]","slots":[]},"re
sult":"p-8218"}]}
{"time":1620070469.4860346,"type":"syscall","crankNum":65210,"vatID":"v12","deliveryNum":14613,"syscallNum":0,"ksc":["vatstoreGet","v12","r.o+30"],"vsc":["vatstoreGet","r.o+30"]}
{"time":1620070469.486178,"type":"syscall-result","crankNum":65210,"vatID":"v12","deliveryNum":14613,"syscallNum":0,"ksr":["ok","r1"],"vsr":["ok","r1"]}
{"time":1620070469.4863627,"type":"clist","crankNum":65210,"mode":"drop","vatID":"v12","kobj":"kp24984","vobj":"p-8218"}
{"time":1620070469.486449,"type":"syscall","crankNum":65210,"vatID":"v12","deliveryNum":14613,"syscallNum":1,"ksc":["resolve","v12",[["kp24984",false,{"body":"{\"@qclass\":\"undefined\"}",
"slots":[]}]]],"vsc":["resolve",[["p-8218",false,{"body":"{\"@qclass\":\"undefined\"}","slots":[]}]]]}
{"time":1620070469.487023,"type":"syscall-result","crankNum":65210,"vatID":"v12","deliveryNum":14613,"syscallNum":1,"ksr":["ok",null],"vsr":["ok",null]}
{"time":1620070469.4871848,"type":"syscall","crankNum":65210,"vatID":"v12","deliveryNum":14613,"syscallNum":2,"ksc":["vatstoreGet","v12","r1.initialized"],"vsc":["vatstoreGet","r1.initiali
zed"]}
{"time":1620070469.4873176,"type":"syscall-result","crankNum":65210,"vatID":"v12","deliveryNum":14613,"syscallNum":2,"ksr":["ok","true"],"vsr":["ok","true"]}
{"time":1620070469.4877985,"type":"syscall","crankNum":65210,"vatID":"v12","deliveryNum":14613,"syscallNum":3,"ksc":["vatstoreGet","v12","r1.recvSeq"],"vsc":["vatstoreGet","r1.recvSeq"]}
{"time":1620070469.488014,"type":"syscall-result","crankNum":65210,"vatID":"v12","deliveryNum":14613,"syscallNum":3,"ksr":["ok","8156"],"vsr":["ok","8156"]}
{"time":1620070469.4881413,"type":"syscall","crankNum":65210,"vatID":"v12","deliveryNum":14613,"syscallNum":4,"ksc":["vatstoreSet","v12","r1.recvSeq","8157"],"vsc":["vatstoreSet","r1.recvS
eq","8157"]}
{"time":1620070469.4882417,"type":"syscall-result","crankNum":65210,"vatID":"v12","deliveryNum":14613,"syscallNum":4,"ksr":["ok",null],"vsr":["ok",null]}
{"time":1620070469.489727,"type":"deliver-result","crankNum":65210,"vatID":"v12","deliveryNum":14612,"dr":["error","unexpected recv seqNum (a string)",null]}
{"time":1620070470.2255008,"type":"terminate","vatID":"v12","shouldReject":true,"info":{"body":"{\"@qclass\":\"error\",\"name\":\"Error\",\"message\":\"unexpected recv seqNum (a string)\"}","slots":[]}}

I'm pretty sure it's happening here:

const recvSeqNum = remote.advanceReceivedSeqNum();
assert(
seqNum === '' || seqNum === `${recvSeqNum}`,
X`unexpected recv seqNum ${seqNum}`,
);

I'm not exactly sure why this caused the comms vat to terminate, but I think it's because we decided that any unhandled error from dispatch qualifies as an internal error and kills the vat.

The tasks are:

  • Make sure externally-provokable unexpected situations (protocol errors) don't kill the whole comms vat. At this stage it's probably appropriate to ignore the message, but it might also be reasonable to mark that one remote as dead.
    • I think this happened when my ag-solo client got confused and re-sent the same message, possibly with a higher sequence number. I don't yet know whether this was invalid behavior on part of the client: I need to look at the earlier slog entries to see what message it sent before. vattp is supposed to filter out duplicates, so maybe that broke down somehow
  • If the comms vat does suffer a fatal error (like a broken invariant), we should probably halt the kernel instead of killing off the comms vat. This might suggest we need a way to mark some vats (maybe just all static vats) as precious, as distinct from dynamic vats whose termination can be survived.
@warner warner added bug Something isn't working SwingSet package: SwingSet labels May 4, 2021
@warner warner added this to the Testnet: Staking Dynamics milestone May 4, 2021
@warner
Copy link
Member Author

warner commented May 4, 2021

We put a bandage on this in #3024, but we need to make a thorough check of all the assertions and make sure they kill only the appropriate level.

@dckc
Copy link
Member

dckc commented May 4, 2021

Is there more to do on this for today's release? If not, I'd like to postpone it to a later milestone.

@warner
Copy link
Member Author

warner commented May 5, 2021

I don't think there's more to do on this for the Staking Dynamics milestone. We'll just have to see if the #3024 bandage makes a difference.

@Tartuffo Tartuffo added the MN-2 label Jan 20, 2022
@warner warner added MN-1 and removed MN-2 labels Jan 25, 2022
@Tartuffo
Copy link
Contributor

We may have fixed the initial error, but what should we do if it DOES happen again?

@Tartuffo Tartuffo removed the MN-1 label Feb 7, 2022
@Tartuffo Tartuffo added this to the Mainnet 1 milestone Mar 23, 2022
@Tartuffo Tartuffo modified the milestones: Mainnet 1, RUN Protocol RC0 Apr 5, 2022
@Tartuffo Tartuffo removed this from the Mainnet 1 milestone May 2, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working SwingSet package: SwingSet
Projects
None yet
Development

No branches or pull requests

5 participants