Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BroadcastChannel postMessage requires synchronous possibly-cross-process access #5327

Open
gterzian opened this issue Mar 3, 2020 · 6 comments

Comments

@gterzian
Copy link
Member

gterzian commented Mar 3, 2020

Branched off of #3691 and in the context of #5305 @domenic @bzbarsky @annevk

I think broadcast channel postMessage is facing some of the same "cross-process" issues, and this one actually seems "easier" than the window equivalent, since it doesn't involve potential navigation(although this is perhaps not true, given the involved origin check?).

In the light of my suggestions in the other issue to "use a parallel queue" to solve this, I will try to do it for BroadcastChannel here.

Please bear with me as this will probably be a very loose initial definition...


First a few definitions:

  • A "global" is a global object
  • Each global has a "is_managing_broadcast_channels" flag, initially false, and an optional "router-id"
  • Each global also has a "broadcast_channels" map of channel name -> << BroadcastChannel object references >>(a queue, see the ordering requirements of Step 9 of postMessage).
  • A "UA broadcast router" is a parallel-queue, this one is unique per UA.
  • The "UA broadcast router" owns a map of origin -> (ref to"Per-global broadcast router", << channel names>>).
  • A "Per-global broadcast router" is also a parallel-queue, each global whose "is_managing_broadcast_channels" is true will have one.
  • Each "Per-global broadcast router" has three fields: a "router-id", a reference to a global, and a reference to the "relevant event-loop" for that global.

When the BroadcastChannel() constructor is invoked, after having created the channel, but before returning it, add the following steps:

Let "global" be the settings object of the new channel.

If global's "is_managing_broadcast_channels" flag is false, run the following "start managing broadcast channels" steps:

  1. Let "new broadcast router" be a new "Per-global broadcast router" whose "router-id" is a unique identifier, whose "global" is global, and whose "event-loop" is the relevant event-loop of global.
  2. Let "origin" be the current origin for global.
  3. Set the "is_managing_broadcast_channels" flag to true.
  4. Set the globals "router-id" to to be "router-id".
  5. Enqueue the following steps to the unique "UA broadcast router" of the UA:
      1. Let "new router" be a Tuple of (ref to "new broadcast router", a new Set containing "channel-name")
      1. Add "new_router" to the map of routers found at "origin".

Now that global is setup to manage channels, run the following steps:

  1. Let "queue" be the result of looking up "channel-name" in the global's map of channels, or creating a new Queue if empty.
  2. Enqueue a ref to the broadcast-channel to "queue".
  3. Return broadcast-channel.

(end of BroadcastChannel() constructor definition)

To run the "the local broadcast steps" means:
- Let "local destinations" be a list of channels that are found in the Set found at "channel-name" in the global's map of channels

  • Run the current Steps 7 to 10 for "local destinations".

The postMessage(message) method, when invoked on a BroadcastChannel object, must run the following steps:

(Run step 1 to 6)

  1. "run the local broadcast steps"
  2. Let "router-id" be the "router-id" of the global, let "channel-name" be the name of "this" channel, let "origin" be the current origin of the global. Enqueue the following steps to the unique "UA broadcast router" of the UA:
    • Let "destination-routers" be list of tuples of ("Per-global broadcast router", Set of channel names) found in the map at the key for "origin".
    • Remove all routers from the list if their set of names doesn't contain "channel-name".
    • For each router in "destination-routers", append the following steps to the router:
      • Queue a task using the DOM manipulation task-source on the "event-loop", to run the following steps in the context of "global":
        • If the "global" of the router is a WorkerGlobalScope object whose closing flag is true return.
        • Run "the local broadcast steps".

The "clean-up the Per-global broadcast router" steps are:
1. Set the global's "is_managing_broadcast_channels" to false.
2. Discard the Per-global broadcast router, as well as any tasks it would have enqueued.
3. Enqueue the following steps to the unique "UA broadcast router" of the UA:
- Remove the Per-global broadcast router from the map found at origin.

Everytime a BroadcastChannel object is GC'ed, remove the channel from the map of the global. If the map is empty, run the "clean-up the Per-global broadcast router" steps.


Looking forward to your thoughts.

@gterzian
Copy link
Member Author

gterzian commented Mar 3, 2020

And by the way I do think this somewhat fits in with what UAs are doing in practice:

For example in the constructor, they all seem to connect the channel to some background component(which can be modelled as a parallel-queue):

FF: https://searchfox.org/mozilla-central/source/dom/broadcastchannel/BroadcastChannel.cpp#325

Chromium: https://github.com/chromium/chromium/blob/ccd149af47315e4c6f2fc45d55be1b271f39062c/third_party/blink/renderer/modules/broadcastchannel/broadcast_channel.cc#L165

You can also see pretty clear how in clearly this infra is setup when the first channel is created in a given context, see https://github.com/chromium/chromium/blob/ccd149af47315e4c6f2fc45d55be1b271f39062c/third_party/blink/renderer/modules/broadcastchannel/broadcast_channel.cc#L27

postMessage also goes through that component:

FF: https://searchfox.org/mozilla-central/source/dom/broadcastchannel/BroadcastChannel.cpp#372

Chromium: https://github.com/chromium/chromium/blob/ccd149af47315e4c6f2fc45d55be1b271f39062c/third_party/blink/renderer/modules/broadcastchannel/broadcast_channel.cc#L87

It's a bit harder to see exactly how message sent are then received, but it seems there is "something in the renderer"(which would match the "Per global router" I tried to spec) and "something in the browser"(which would match the "Per UA router") that are connected and enable messages to be routed around.

When a message is actually received by that "something in the renderer", a task is indeed queued, for example by calling BroadcastChannel::OnMessage in Chromium.

A pretty good higher-level overview for Chromium can be found at https://github.com/chromium/chromium/blob/ea7392901ed885ff4ed64db6dcda7ee23eb62ff5/third_party/blink/public/mojom/broadcastchannel/broadcast_channel.mojom#L14

@annevk
Copy link
Member

annevk commented Mar 3, 2020

I think you're correct that something in that direction would be a more accurate description, but it's also worth considering what we gain from describing it that way as it will make things harder to follow in a way.

@gterzian
Copy link
Member Author

gterzian commented Mar 3, 2020

Yes, it's defenitely complicated, yet it might be worth the cost, for two reasons:

  1. It fits the overall philosophy of the spec of being precise enough not to require UAs to reverse-engineer each other.
  2. If you add things like imperatively allocating event-loops, agent-clusters, and window(proxies), you can define all the cross-agent-cluster type of communications and their effects on event-loops, precisely(like I tried to do by adding steps to the constructor of broadcast channel).
  3. The spec is already very precise when it comes to task running on a given event-loop, and the interaction of parallel steps started from that same event-loop, but not involving other event-loop(rather a component running fetch or something similar). The problem is coordinating multiple event-loops, running potentially in multiple agent-cluster(where each event-loop represents some sort of "in-parallel" steps to the other event-loop). This problem could be addressed by more extensive use of this parallel-queue concept like sketched above.

@gterzian
Copy link
Member Author

gterzian commented Mar 3, 2020

Also this might not actually be worth it for BroadcastChannel, the much more simple change of #5305 is probably more worth it, so this is also meant as a kind of proof of concept with regards to specifying some of the IPC stuff more precisely, as it's "easier" to do this for this channel, vs cross window messaging for example(which would require imperatively allocating a yet-to-be-defined proxy to a cross-origin windowproxy and so on).

@annevk
Copy link
Member

annevk commented Mar 3, 2020

Well, reverse engineering might still be needed here and there, especially if implementations want similar performance profiles. See also the note at the end of https://infra.spec.whatwg.org/#algorithms. It's important for specifications to spell out a model that does not lead to observable differences when implemented, but observable differences due to performance or inherent races are deemed acceptable, if that makes sense.

As we make things more concrete around event loops I do agree we want some language to deal with what we're doing here. Either by saying we're doing something unusual or spelling it out through parallel queues seem reasonable, but once we're closer with the other things the answer might be more obvious.

annevk pushed a commit that referenced this issue Mar 4, 2020
This way nothing happens if something got closed during delivery. (Implementations already did this.)

Tests: web-platform-tests/wpt#21895.

Fixes #1371. Possible follow-up: #5327.
@gterzian
Copy link
Member Author

gterzian commented Apr 11, 2020

I've been looking a bit into the Service Worker spec, and I think what it does with the "job queue" concept and the various algorithms is quite similar to what I've tried to express here(I wasn't aware of these existing concepts before).

For example Register is not an operation running on an event-loop, and the data it manipulates, like the "scope-to-registration-map" do not belong to an realm.

And concepts like reject-job-promise are specified as performing an operation on an event-loop, by way of queuing a task on the event-loop of the job's client.

So I think the SW spec offers a much clearer example of how to potentially solve various issues, like this or #3691 (that still leaves the question as to whether it's actually worth it in terms of complexity for the issues in this spec).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

2 participants