Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

πŸ› Bug Report β€” workerd sometimes doesn't send Debugger.scriptParsed events #1201

Closed
mrbbot opened this issue Sep 19, 2023 · 11 comments
Assignees

Comments

@mrbbot
Copy link
Contributor

mrbbot commented Sep 19, 2023

Hey! πŸ‘‹ When enabling the inspector with the hello world ESM sample, workerd can start up in an invalid state where it never responds to Debugger.enable commands with Debugger.scriptParsed events.

Using the following script and [email protected] or [email protected]...

// client.mjs
import { WebSocket } from "ws";

const ws = new WebSocket("ws://127.0.0.1:9229/main");

ws.addEventListener("message", (event) => {
   console.log(JSON.parse(event.data));
});
ws.addEventListener("open", (event) => {
    ws.send(JSON.stringify({ id: 0, method: "Debugger.enable", params: {} }));
    setTimeout(() => {
        ws.close();
    }, 3000);
});
$ npx [email protected] serve samples/helloworld_esm/config.capnp --verbose --inspector-addr=127.0.0.1:9229
workerd/server/server.c++:2622: info: Inspector is listening
workerd/server/server.c++:1099: info: Inspector client attaching [main]
workerd/server/server.c++:1109: info: Inspector client detached [(οΏ½οΏ½]
$ npm install ws && node client.mjs
// Expect `Debugger.scriptParsed` events here
{
  id: 0,
  result: { debuggerId: '-7930107370981718434.8415374541215293299' }
}

You may need to re-run npx workerd@... serve to get it into an invalid state. I haven't been able to reproduce this with [email protected]. Also note the garbage in the inspector detach logs [(οΏ½οΏ½].

I haven't been able to narrow this down to a particular change yet as my local workerd builds are currently broken because of an automatic Xcode update. I'm also going OOO for the next 7 days.

@mrbbot mrbbot changed the title πŸ› Bug Report β€” workerd sometimes doesn't send with Debugger.scriptParsed events πŸ› Bug Report β€” workerd sometimes doesn't with Debugger.scriptParsed events Sep 19, 2023
@mrbbot mrbbot changed the title πŸ› Bug Report β€” workerd sometimes doesn't with Debugger.scriptParsed events πŸ› Bug Report β€” workerd sometimes doesn't send Debugger.scriptParsed events Sep 19, 2023
@ohodson ohodson self-assigned this Sep 19, 2023
@ohodson
Copy link
Contributor

ohodson commented Sep 20, 2023

I can repro this with-or-without the change in the initial execution context id that workerd added to v8.

The repro is fairly easy: launch workerd with the helloworld sample (the default debug target in VSCode for workerd with inspector enabled). Simply connecting and disconnecting inspector clients from Chrome shows roughly 1 in 3 not getting the Runtime.executionContextCreated and Debugger.scriptParsed messages.

I also came across a Smi initialization exception with the initial execution context id randomization. This is an easy to fix we should make, but not relevant to the problem at hand.

@ohodson
Copy link
Contributor

ohodson commented Sep 20, 2023

After adding instrumentation, it looks like the embedded V8 is creating the messages, they are getting queued and sent onwards by the inspector I/O thread. The messages are sent from workerd to the client via WebSocket::send() and WebSocket::send() is being called for each message.

workerd/io/worker.c++:2349: info: Queuing message; message = {"method":"Debugger.scriptParsed","params":{"scriptId":"3","url":"main","startLine":0,"startColumn":0,"endLine":11,"endColumn":0,"executionContextId":2,"hash":"8fedb878609fff797f5ff6786bccee122c6cf20486f1360e18e03a62d79b4fda","isLiveEdit":false,"sourceMapURL":"","hasSourceURL":false,"isModule":false,"length":330,"scriptLanguage":"JavaScript","embedderName":"main"}}
workerd/io/worker.c++:2349: info: Queuing message; message = {"id":2,"result":{"debuggerId":"6072217653684361910.727000825402291285"}}
workerd/io/worker.c++:2349: info: Queuing message; message = {"id":3,"result":{}}
workerd/io/worker.c++:2349: info: Queuing message; message = {"id":4,"result":{}}
workerd/io/worker.c++:2349: info: Queuing message; message = {"id":5,"method":"Profiler.enable","params":{}}
workerd/io/worker.c++:2349: info: Queuing message; message = {"id":6,"result":{}}
workerd/io/worker.c++:2349: info: Queuing message; message = {"id":7,"result":{}}
workerd/io/worker.c++:2449: info: sendToWebSocket; message = {"method":"Runtime.executionContextCreated","params":{"context":{"id":2,"origin":"","name":"Worker","uniqueId":"8842189848001747465.1261715464375696085"}}}
workerd/io/worker.c++:2449: info: sendToWebSocket; message = {"id":1,"result":{}}
workerd/io/worker.c++:2449: info: sendToWebSocket; message = {"method":"Debugger.scriptParsed","params":{"scriptId":"3","url":"main","startLine":0,"startColumn":0,"endLine":11,"endColumn":0,"executionContextId":2,"hash":"8fedb878609fff797f5ff6786bccee122c6cf20486f1360e18e03a62d79b4fda","isLiveEdit":false,"sourceMapURL":"","hasSourceURL":false,"isModule":false,"length":330,"scriptLanguage":"JavaScript","embedderName":"main"}}
workerd/io/worker.c++:2449: info: sendToWebSocket; message = {"id":2,"result":{"debuggerId":"6072217653684361910.727000825402291285"}}
workerd/io/worker.c++:2449: info: sendToWebSocket; message = {"id":3,"result":{}}
workerd/io/worker.c++:2449: info: sendToWebSocket; message = {"id":4,"result":{}}
workerd/io/worker.c++:2449: info: sendToWebSocket; message = {"id":5,"method":"Profiler.enable","params":{}}
workerd/io/worker.c++:2449: info: sendToWebSocket; message = {"id":6,"result":{}}
workerd/io/worker.c++:2449: info: sendToWebSocket; message = {"id":7,"result":{}}
workerd/io/worker.c++:2349: info: Queuing message; message = {"id":8,"result":{"scriptSource":"// Copyright (c) 2017-2023 Cloudflare, Inc.\n// Licensed under the Apache 2.0 license found in the LICENSE file or at:\n//     https://opensource.org/licenses/Apache-2.0\n\naddEventListener('fetch', event => {\n  event.respondWith(handle(event.request));\n});\n\nasync function handle(request) {\n  return new Response(\"Hello World\\n\");\n}\n"}}
workerd/io/worker.c++:2349: info: Queuing message; message = {"id":9,"result":{"breakpointId":"1:9:2:main","locations":[{"scriptId":"3","lineNumber":9,"columnNumber":2}]}}
workerd/io/worker.c++:2449: info: sendToWebSocket; message = {"id":8,"result":{"scriptSource":"// Copyright (c) 2017-2023 Cloudflare, Inc.\n// Licensed under the Apache 2.0 license found in the LICENSE file or at:\n//     https://opensource.org/licenses/Apache-2.0\n\naddEventListener('fetch', event => {\n  event.respondWith(handle(event.request));\n});\n\nasync function handle(request) {\n  return new Response(\"Hello World\\n\");\n}\n"}}
workerd/io/worker.c++:2449: info: sendToWebSocket; message = {"id":9,"result":{"breakpointId":"1:9:2:main","locations":[{"scriptId":"3","lineNumber":9,"columnNumber":2}]}}
workerd/io/worker.c++:2349: info: Queuing message; message = {"id":10,"result":{"locations":[{"scriptId":"3","lineNumber":9,"columnNumber":2},{"scriptId":"3","lineNumber":9,"columnNumber":9,"type":"call"}]}}
workerd/io/worker.c++:2449: info: sendToWebSocket; message = {"id":10,"result":{"locations":[{"scriptId":"3","lineNumber":9,"columnNumber":2},{"scriptId":"3","lineNumber":9,"columnNumber":9,"type":"call"}]}}
workerd/io/worker.c++:2349: info: Queuing message; message = {"id":11,"result":{"locations":[{"scriptId":"3","lineNumber":9,"columnNumber":2},{"scriptId":"3","lineNumber":9,"columnNumber":9,"type":"call"}]}}
workerd/io/worker.c++:2449: info: sendToWebSocket; message = {"id":11,"result":{"locations":[{"scriptId":"3","lineNumber":9,"columnNumber":2},{"scriptId":"3","lineNumber":9,"columnNumber":9,"type":"call"}]}}

Running wireshark and filtering on websocket traffic, it looks like workerd is sending the same initial messages and getting responses back for them, but devtools is not showing the early messages in the protocol debugger. TBC.

ohodson added a commit that referenced this issue Sep 21, 2023
Avoid a Smi range exception when picking a random initial
executionContextId.

Not a fix for #1201, but discovered looking into the problem.

Bug: πŸ› Bug Report β€” `workerd` sometimes doesn't send `Debugger.scriptParsed` events #1201
ohodson added a commit that referenced this issue Sep 21, 2023
Avoid a Smi range exception when picking a random initial
executionContextId.

Not a fix for #1201, but discovered looking into the problem.

Bug: πŸ› Bug Report β€” `workerd` sometimes doesn't send `Debugger.scriptParsed` events #1201
@ohodson
Copy link
Contributor

ohodson commented Sep 21, 2023

Bisecting, looking for an LKG:

Commit (most recent first) Description Repro?
3f8ecc8 V8 update to 11.7 Yes, plus Smi exception (macOs)
772bf8d randomize id change No, using with @RamIdeas script (linux)
5166892 Commit before 772bf8d No, using with @RamIdeas script (linux)
6b19807 Update V8 to 11.6* No, using with @RamIdeas script (linux)

The script seems to be too fast, just as a sanity check tried 772bf8d with the script and with Chrome's devtools. Issue appears to be reproing with cdt, but not the script. From WireShark experiments yesterday, it looks like the CDP messages are being sent, but the first few are not always picked up by the protocol inspector.

* fails to build macOS, possibly due to xcode build tools update. Switched to linux i9-13900.

@ohodson
Copy link
Contributor

ohodson commented Sep 21, 2023

Take 2, just linux and Chrome

Commit (most recent first) Description Repro?
6b19807 Update V8 to 11.6 Wireshark shows expected outbound messages and responses, no CDP in message in Protocol inspector in local Chrome :-/

ohodson added a commit that referenced this issue Sep 21, 2023
Avoid a Smi range exception when picking a random initial
executionContextId.

Not a fix for #1201, but discovered looking into the problem.

Bug: πŸ› Bug Report β€” `workerd` sometimes doesn't send `Debugger.scriptParsed` events #1201
ohodson added a commit that referenced this issue Sep 21, 2023
This change adds a 300ms pause before workerd starts sending out
messages when a CDP client attaches. Some clients seem to lose track
of the first few messages if they are sent right away.

Bug: #1201
Test: manual
@ohodson
Copy link
Contributor

ohodson commented Sep 21, 2023

Adding a pause before sending messages appears to enable Chrome to render the CDP messages it receives in the protocol view window. tcpdump shows those same messages arriving and being responded to if there is no delay, but the UX does not render it.

There might a better fix available, TBD.

ohodson added a commit that referenced this issue Sep 21, 2023
This change adds a 300ms pause before workerd starts sending out
messages when a CDP client attaches. Some clients seem to lose track
of the first few messages if they are sent right away.

Bug: #1201
Test: manual
@ohodson
Copy link
Contributor

ohodson commented Sep 21, 2023

Overall, it'd be good to have another fix here, but this issue seems to be pressing this week.

On both the local M1 MBP and a x64 Linux box, 250ms seems to eliminate any issues when testing workerd/samples/helloworld/config.capnp against Chrome's protocol inspector (and what's observed in the developer UX). 300ms is in the PR as it adds some margin for safety. 50ms, 100ms, 150ms, 200ms all exhibited some degree of flakiness on both test boxes.

The initial messages from the client are Runtime.enable and Debugger.enable. Possibly the outbound messages, like Debugger.scriptParsed are being sent before the responses to the initial incoming messages and this is the actual problem. TBD.

ohodson added a commit that referenced this issue Sep 21, 2023
This change adds a 300ms pause before workerd starts sending out
messages when a CDP client attaches. Some clients seem to lose track
of the first few messages if they are sent right away.

Bug: #1201
Test: manual
@ohodson
Copy link
Contributor

ohodson commented Sep 22, 2023

Looking at the CDP, it looks like the initial messages are generated appropriately.

  1. Client sends Runtime.enable, this enables the reporting of execution context creation by Runtime.executionContextCreated events.
  2. Client sends Debugger.enable and this generates Debugger.scriptParsed events. The wording for this in the description of Debugger.scriptParsed: ... This event is also fired for all known and uncollected scripts upon enabling debugger.

In wrangler-devtools, It looks like scriptParsed events are ignored if the debugger is not enabled (at least in DebuggerModel.ts). Does the debugger get enabled when the request is sent from devtools or when the response is received?

I'm not seeing any similar gating for executionContextCreated events in RuntimeModel.ts (and not sure I'm looking in the right place).

@ohodson
Copy link
Contributor

ohodson commented Sep 22, 2023

Reset on this issue.

With an opt workerd build at tag v1.20230918.0, the problem as described in the initial report here repros on my MBP.

Repro steps:

  • workerd launched from vscode in workerd with inspector enabled (opt) with the default sample ${workspaceFolder}/samples/helloworld/config.capnp
  • node client.mjs

Observations:

  • Zero successes from invoking the client script.
  • Adding a KJ_LOG on the receive path and the issue goes away.
  • No repro with a dbg build.

Follow-on:

  • The issue didn't repro at ToT with-or-without yesterday's change (ad540fb nor 07c50e3).
  • Going back to tag v1.20230918.0, rebuilding opt workerd, and no repro.
  • Try bazel clean --expunge, rebuild opt, no repro.

Interesting.

@ohodson
Copy link
Contributor

ohodson commented Sep 22, 2023

πŸ’‘Forcing the smi issue fixed in #1217 with an opt build and this issue appears to repro 100% at v1.20230918.0.

  • There is no output from workerd.
  • The executionContextId is bumped by 1 each time a client connects, but if the initial choice was bad, no client is likely to get a response.
  • The smi issue shows up in a debug only check (V8_Dcheck).
  • The issue is only manifesting based on the initial RNG so is flakey.
  • It's so far unclear where execution is going when the issue is repro'ed, even with a dbg build, there is no further log output, seemingly no exceptions.

@ohodson ohodson closed this as completed Sep 22, 2023
@RamIdeas
Copy link
Contributor

@ohodson am I inferring correctly that:

[1] this bug was caused by the RNG of executionContextId (but we're not just not clear why), and
[2] the workaround to wait 300ms before sending CDP messages is sufficient for now so users won't be affected by the bug

I'm sure what I've inferred is probably wrong but I'm just seeking clarity on whether we're happy to release workerd in it's current state or what the latest build we're comfortable releasing is

@ohodson
Copy link
Contributor

ohodson commented Sep 22, 2023

Just to follow up:

  1. the RNG bug for executionContextId happened because the 32-bit integer field that holds the last executionContextId (and the initial RNG value) gets converted into a small integer in V8 before being put into an inspector protocol message. A SMI is not large enough to span the full 32-bit range (varies by CPU architecture, IIRC, but is typically 30-31 bits). This showed up as a problem when testing with a dbg build since there is a debug check that the value being used to initialize a SMI fits inside the available range. The failure is non-deterministic and we don't have a test that would catch it. It is fixed in Fix initial executionContextId rangeΒ #1217. I didn't realize until this morning that this was the problem (it has zero visibility on a release build aside from missing CDP messages).

  2. the 300ms appears to be necessary for chrome devtools to render the initial CDP messages between devtools and workerd/v8. It appears the messages don't get rendered if they occur before the creation of the devtools window. We can potentially revert this, but being able to see those messages also saves time when looking into issues. The alternative is having to pull out tcpdump and/or wireshark to see the messages on the wire, or building workerd with custom logging of inspector messages. If the delay is too annoying for wrangler, let's just revert.

ohodson added a commit that referenced this issue Sep 22, 2023
This reverts commit bdb7eb7.

This turned out not to be a factor in the investigation into
experience in devtools.

Bug: #1201
ohodson added a commit that referenced this issue Sep 22, 2023
This reverts commit bdb7eb7.

This turned out not to be a factor in the investigation into
the missing messages affecting devtools (#1201).

Bug: #1201
ohodson added a commit that referenced this issue Sep 22, 2023
This reverts commit bdb7eb7.

This turned out not to be a factor in the investigation into
the missing messages affecting devtools (#1201).

Bug: #1201
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants