Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve new message-rpc API #11159

Closed
tortmayr opened this issue May 16, 2022 · 7 comments · Fixed by #11447
Closed

Improve new message-rpc API #11159

tortmayr opened this issue May 16, 2022 · 7 comments · Fixed by #11447
Labels
messaging issues related to messaging

Comments

@tortmayr
Copy link
Contributor

tortmayr commented May 16, 2022

With #11011 a new message rpc protocl will be integrated into Theia's core messaging API.
During the review a couple of issues and potential improvments where identified.

  1. Channels are directly receiving and sending binary buffers. This rather low-level API is not convenient for adopters to interact with.
    Ideally we can make the binary encoding/decoding an implementation detail of the channel implementation and the Channel itself only sends/receives messages. It has to be investigated weather this is feasible especially with the Plugin RPC API in mind. A solution is required that still allows to directly forwards received messages in the hosted-plugin-process and plugin-host process without having to decode the entire messages. (See also Integrate new message-rpc prototype into core messaging API (extensions) #11011 (comment))
  2. Currently a custom binary message codec is in place. Maybe we can offload the encoding/decoding to a specialized library (e.g. msgpack. It has to be evaluated how performant these third party libraries are compared to our custom solution. (see also Integrate new message-rpc prototype into core messaging API (extensions) #11011 (comment))

We (EclipseSource) plan to contribute this on behalf of STMicroelectronics

@vince-fugnitto vince-fugnitto added the messaging issues related to messaging label May 16, 2022
tortmayr added a commit to eclipsesource/theia that referenced this issue Jun 22, 2022
- Refactor Channel API to enable sending of any Javscript object over channels instead of having to receive a writeBuffer and write encoding the object into a binary format. This means we essentially move the binary encoding a level 
- Combine old  message encoder/decoders into a common BinaryMessageCodec interface.
- Adapt dependent components and services to new API 

Part of eclipse-theia#11159

Contributed on behalf of STMicroelectronics
tortmayr added a commit to eclipsesource/theia that referenced this issue Jun 22, 2022
- Refactor Channel API to enable sending of any Javscript object over channels instead of having to receive a writeBuffer and write encoding the object into a binary format. This means we essentially move the binary encoding a level 
- Combine old  message encoder/decoders into a common BinaryMessageCodec interface.
- Adapt dependent components and services to new API 

Part of eclipse-theia#11159

Contributed on behalf of STMicroelectronics
tortmayr added a commit to eclipsesource/theia that referenced this issue Jun 22, 2022
- Refactor Channel API to enable sending of any Javscript object over channels instead of having to receive a writeBuffer and write encoding the object into a binary format. This means we essentially move the binary encoding a level 
- Combine old  message encoder/decoders into a common BinaryMessageCodec interface.
- Adapt dependent components and services to new API 

Part of eclipse-theia#11159

Contributed on behalf of STMicroelectronics
@tortmayr
Copy link
Contributor Author

tortmayr commented Jun 23, 2022

Point 1 (object-based-channel) is addressed with this PR: #11328

@tortmayr
Copy link
Contributor Author

tortmayr commented Jun 23, 2022

For Point 2 I also did a little evaluation/benchmarking of different msgpack implementations:

Introduction

Currently we are using a custom binary codec for encoding RPC messages that are sent between the Theia frontend and backend processes.Compared to plain JSON-RPC this codec offers efficient encoding of binary data (buffers). However, it turns out the performance for large (buffer-free) objects is considerably worse with the new message codec.
Msgpack is a standard protocol for encoding objects into a binary format and should be more performant than our custom codec. In addition, building ontop of a well established standard probably makes more sense than trying to re-implement the wheel.
In order to switch to a msgpack library the following requirements need to be fulfilled:

  • The msgpack implementation needs be quicker (or at least en par with) than the current custom binary codec
  • The msgpack implementation needs to support an extension mechanism that allows us to register custom value encoders/decoders
  • The project should be active and we need to trust the maintainer
  • Compatible OS License

Nice to have features:

  • Support for encoding circular structures.
  • Support for streams

Potential Candidates ( as listed on the offical Msgpack website:

Evaluation:

@msgpack/msgpack ✔️

Weekly NPM Downloads: 52992
Github Stars: 962
License: ISC
Contributors: 15 (one main contributor)

More or less the offical mspack implementation for JS.
Full support for custom extensions.
Does not support circular structures
Has streaming support
Active (recent releases and commits)
Active community

msgpack-lite ❌

Weekly NPM Downloads: 583
Github stars: 926
License: MIT
Contributors: 6 (one main comitter)

Seems to be no longer activley maintained. Last release was in 2017.
Doesn't provide any additional features that would justify using this over the offical msgpack implementation (@msgpack/msgpack)

msgpackr ✔️

Weekly NPM Downloads: 547747
Github stars: 199
License: MIT
Contributors: 14 (one main comitter)

Claims to be faster than the official msgpack js implementation.
Full support for custom extensions.
The main advantage over offical msgpack implementation is the support for circular structures.
(Enabling this feature adds 15-20% performance overhead for encoding)
Has streaming support
Active (recent releases and commits)
Active community

@ygoe/msgpack.js ❌

Github stars: 119
Weekly NPM Downloads: 583
License: MIT
Contributors: 3 (one main comitter)

Small project that only seems to implement the base msgpack specifiation.
No support for other extension types

what-the-pack ❌

Deprecated and no longer active.
Conclusion: Does not fit our requirements

Conclusion

If we switch to msgpack we should either include @msgpack/msgpack or msgpackr

Benchmarks

Codecs:

  • BinaryMessageCodec (currently integrated into Theia master)
  • MsgPack
  • MsgPackR
  • MsgPackRCircular

BaselineCodec: BinaryMessageCodec

TestData: 4

BenchmarkRuns: 5

Testobject

  1. SimpleRPC (a simple RPCMessage object representing a log request)
  2. StringArray10000 (flat array with 1000 string entries)
  3. BinaryBuffer8MB (A 8MB Uint8Array)
  4. DeployedPluginsArgs (Large plain JS object (8MB stringified))
  5. CircularParentChild (An object with a parent-child circular reference)

Benchmark for SimpleRPC

MessageCodec Encode Success Encoding Time (ET) ET-Baseline Encoded Size (ES) ES-Baseline Decode Success Decode Time (DT) DT-Baseline Match (Input-Decoded)
BinaryMessageCodec true 0.09 ms 0.00 ms (Factor: 1.00) 82 Bytes 0 Bytes (Factors: 1.00) true 0.14 ms 0.00 ms (Factor: 1.00) false
MsgPack true 0.07 ms -0.02 ms (Factor: 0.78) 47 Bytes -35 Bytes (Factors: 0.57) true 0.09 ms -0.05 ms (Factor: 0.64) true
MsgPackR true 0.04 ms -0.05 ms (Factor: 0.44) 53 Bytes -29 Bytes (Factors: 0.65) true 0.04 ms -0.10 ms (Factor: 0.29) true
MsgPackRCircular true 0.05 ms -0.04 ms (Factor: 0.56) 53 Bytes -29 Bytes (Factors: 0.65) true 0.04 ms -0.10 ms (Factor: 0.29) true

Benchmark for StringArray10000

MessageCodec Encode Success Encoding Time (ET) ET-Baseline Encoded Size (ES) ES-Baseline Decode Success Decode Time (DT) DT-Baseline Match (Input-Decoded)
BinaryMessageCodec true 5.91 ms 0.00 ms (Factor: 1.00) 88893 Bytes 0 Bytes (Factors: 1.00) true 16.21 ms 0.00 ms (Factor: 1.00) true
MsgPack true 0.67 ms -5.24 ms (Factor: 0.11) 48893 Bytes -40000 Bytes (Factors: 0.55) true 1.08 ms -15.13 ms (Factor: 0.07) true
MsgPackR true 0.56 ms -5.35 ms (Factor: 0.09) 48893 Bytes -40000 Bytes (Factors: 0.55) true 0.94 ms -15.27 ms (Factor: 0.06) true
MsgPackRCircular true 0.68 ms -5.23 ms (Factor: 0.12) 48893 Bytes -40000 Bytes (Factors: 0.55) true 0.48 ms -15.73 ms (Factor: 0.03) true

Benchmark for BinaryBuffer8MB

MessageCodec Encode Success Encoding Time (ET) ET-Baseline Encoded Size (ES) ES-Baseline Decode Success Decode Time (DT) DT-Baseline Match (Input-Decoded)
BinaryMessageCodec true 6.33 ms 0.00 ms (Factor: 1.00) 7755890 Bytes 0 Bytes (Factors: 1.00) true 3.93 ms 0.00 ms (Factor: 1.00) false
MsgPack true 2.14 ms -4.19 ms (Factor: 0.34) 7755890 Bytes 0 Bytes (Factors: 1.00) true 0.03 ms -3.90 ms (Factor: 0.01) false
MsgPackR true 2.96 ms -3.37 ms (Factor: 0.47) 7755890 Bytes 0 Bytes (Factors: 1.00) true 0.01 ms -3.92 ms (Factor: 0.00) true
MsgPackRCircular true 4.57 ms -1.76 ms (Factor: 0.72) 7755890 Bytes 0 Bytes (Factors: 1.00) true 0.02 ms -3.91 ms (Factor: 0.01) true

Benchmark for DeployedPluginsArgs

MessageCodec Encode Success Encoding Time (ET) ET-Baseline Encoded Size (ES) ES-Baseline Decode Success Decode Time (DT) DT-Baseline Match (Input-Decoded)
BinaryMessageCodec true 72.00 ms 0.00 ms (Factor: 1.00) 4159142 Bytes 0 Bytes (Factors: 1.00) true 350.89 ms 0.00 ms (Factor: 1.00) false
MsgPack true 25.84 ms -46.16 ms (Factor: 0.36) 3615603 Bytes -543539 Bytes (Factors: 0.87) true 51.10 ms -299.79 ms (Factor: 0.15) true
MsgPackR true 18.52 ms -53.48 ms (Factor: 0.26) 3043098 Bytes -1116044 Bytes (Factors: 0.73) true 20.30 ms -330.59 ms (Factor: 0.06) true
MsgPackRCircular true 26.91 ms -45.09 ms (Factor: 0.37) 3010310 Bytes -1148832 Bytes (Factors: 0.72) true 16.08 ms -334.81 ms (Factor: 0.05) true

Benchmark for CircularParentChild

MessageCodec Encode Success Encoding Time (ET) ET-Baseline Encoded Size (ES) ES-Baseline Decode Success Decode Time (DT) DT-Baseline Match (Input-Decoded)
BinaryMessageCodec false 0.07 ms 0.00 ms (Factor: 1.00) -1 Bytes - false -1.00 ms - false
MsgPack false 0.17 ms 0.10 ms (Factor: 2.43) -1 Bytes - false -1.00 ms - false
MsgPackR false 1.15 ms 1.08 ms (Factor: 16.43) -1 Bytes - false -1.00 ms - false
MsgPackRCircular true 0.03 ms -0.04 ms (Factor: 0.43) 37 Bytes - true 0.02 ms - true

Conclusion

Both msgpack libraries are significantly faster than our custom protocol. Even for small objects or buffers the encoding/decoding duration is at least halved. This becomes especially prominent when looking at the encoding/decoding times for the DeployedPluginArgs object. This performance improvement alone would be reason enough to switch to msgpack. Both libraries can be easily extended with custom codecs and also support streamed object encoding which means we could also get rid of our custom BinaryMessagePipe. If we want to be able to encode circular structures we would have to go with msgpackr. If this is not an required both libraries should be more or less feature equivalent.

(@tsmaeder @paul-marechal FYI)

@paul-marechal
Copy link
Member

paul-marechal commented Jun 23, 2022

I believe the safe route would be to use @msgpack/msgpack as it looks like the main implementation.

If we feel more frisky, then we could give msgpackr a try as it seems to be the fastest in most cases from your benchmarks.

I also don't think it's worth the hassle of supporting circular references... I'd rather we create clean types to send over the wire, the idea being that as good as the serializer is we still need to make sure the data we pass through it makes sense: objects containing only the necessary fields, no circular references.

@paul-marechal
Copy link
Member

paul-marechal commented Jun 23, 2022

Looking at msgpackr it looks like the author also made a JS implementation for the CBOR format. It muddies the water a bit more, but I feel like it's worth mentioning. From the author's own benchmarks cbor-x is on par with msgpackr.

From our usage point of view we only care about converting our JSON messages into buffers for fast and efficient transport, unless someone has a different opinion I don't think the actual protocol matters for us. We won't extend the protocol to support things outside of the current JSON support.

With that being said, I would be in favor of replacing the Theia custom encoders with msgpackr as it looks like a healthy project and outperforms the alternative implementations. Surprisingly to me, looking at npm downloads msgpackr is more popular than @msgpack/msgpack: 500k vs 50k respectively.

@paul-marechal
Copy link
Member

paul-marechal commented Jun 23, 2022

It has to be investigated weather this is feasible especially with the Plugin RPC API in mind. A solution is required that still allows to directly forwards received messages in the hosted-plugin-process and plugin-host process without having to decode the entire messages.

I have two ideas:

  1. Nested buffers.

    e.g.

    // note: encode() returns a buffer
    
    const envelope = encode({
        routingId,
        message: encode(rpcMessage)
    });

    In our case, buffers are opaque messages that shouldn't be automatically decoded. We need them for compact/efficient data transfer, but also to be in control of when to decode. One example is socket.io: It implements a fast way to transfer buffers compared to other data types. By comparison, if we send an object through Socket.io, it will first JSON.stringify it and then JSON.parse it upon reception. By serializing ourselves we can avoid this cost and even benefit from a transport-specific optimization.

    So the encoding library we would use would need be able to encode buffers as part of messages. This way we can decode the top-layer "envelope", and the nested buffer should be left untouched. Depending on the kind of routing done, we then can forward the nested buffer alone, or just resend the whole message where it needs to be routed.

    This solution relies on the efficiency of decoders to hopefully skip the nested buffer section as fast as possible to minimize the cost of having to decode the "envelope" to access the routing information.

  2. Prefix each message with a fixed-size header for routing.

    This one expects nothing from how the encoders/decoders work. It feels the "dirtiest" to me somehow, but I also really like the simplicity of it: For each message we always expect to read sizeof(routingHeader) followed by the encoded message. It makes it trivial to separate the routing data from the message buffer.

    This solution is in fact an ad-hoc binary protocol implementation/specification, encapsulating subsequent binary data (our msgpack encoded rpc messages). Care should be taken to properly document this, as simple as it would be implemented, to make sure maintenance is as low-cost as it can be.

It would be interesting to see if (1) is viable using msgpackr and compare with (2). I would expect msgpackr to handle this efficiently, but even if (2) turns out to be faster, if the difference is negligible then I'd prefer going with (1).

@tsmaeder
Copy link
Contributor

tsmaeder commented Jun 24, 2022

One of the reasons I introduced read/writeBuffers in my original approach was to prevent unnecessary copying of stuff in scenarios like routing and multiplexing. If I have an API like this:

var t: MyObject= channel.decodeObject();

it forces us to make a copy to access the read data at all. So forwarding a message becomes:

t= incoming.decodeObject();
outgoing.encodeObject(t);

So we copy the object twice: first from the incoming byte array to an object, then from the object to the outgoing by array. The same applies to multiplexing channels. We really should not be engaging any encoding/decoding when we're just forwarding/routing byte buffers.
The second idea was to allow for "streaming" writes to channels: the WriteBuffer. For example, to implement a message router, we can just do read/write the routing info positionally:

writeBuffer.writeString("routingInfo");
writeBuffer.writeBytes(messageBuffer.readBytes());
writeBuffer.commit();

So to me, it seems like sending bytes over the wire is a different thing/API from remotely sending objects.
We probably want some read/writeObject abstraction on a higher level, but that encoding/decoding should only be happening at the two endpoints of an rpc connection, not anywhere in between (routing, multiplexing). It seems like an alluring general abstraction, but I don't think it really covers the lower level problems optimally.

@tsmaeder
Copy link
Contributor

It has to be investigated weather this is feasible especially with the Plugin RPC API in mind. A solution is required that still allows to directly forwards received messages in the hosted-plugin-process and plugin-host process without having to decode the entire messages.

As @paul-marechal suggested, just encoding the message into a byte array and wrapping that in a routing object should not incur much encoding/decoding overhead. We will copy the whole message more times than necessary, but we're already doing that.

In the long run, we could get away from routing over the HostedPluginServer back end service and open a Channel between the front end and each plugin host and just route on the Channel level, thus not incurring that penalty. These "plugin host channels" would of course be multiplexed on the main channel to the back end and routed to the plugin hosts in the back end.

tortmayr added a commit to eclipsesource/theia that referenced this issue Jul 20, 2022
- Refactor RpcMessageEncoder/RpcMessageDecoder`
   - Extract generic  interfaces
   - Provide default implementation based on msgpackR
   - Rename existing "old" implementation into `RecursiveMessageEncoder/Decoder
   - Update message encoder test cases

Fixes eclipse-theia#11159

Contributed on behalf of STMicroelectronics
tortmayr added a commit to eclipsesource/theia that referenced this issue Jul 20, 2022
- Refactor RpcMessageEncoder/RpcMessageDecoder`
   - Extract generic  interfaces
   - Provide default implementation based on msgpackR
   - Rename existing "old" implementation into `RecursiveMessageEncoder/Decoder
   - Update message encoder test cases
- Introduce reusable `AbstractChannel` implementation to reduce boilerplate code for channel setup

Fixes eclipse-theia#11159

Contributed on behalf of STMicroelectronics
tortmayr added a commit to eclipsesource/theia that referenced this issue Jul 20, 2022
- Refactor RpcMessageEncoder/RpcMessageDecoder`
   - Extract generic  interfaces
   - Provide default implementation based on msgpackR
   - Rename existing "old" implementation into `RecursiveMessageEncoder/Decoder
   - Update message encoder test cases
- Introduce reusable `AbstractChannel` implementation to reduce boilerplate code for channel setup

Fixes eclipse-theia#11159

Contributed on behalf of STMicroelectronics
tortmayr added a commit to eclipsesource/theia that referenced this issue Jul 20, 2022
- Refactor RpcMessageEncoder/RpcMessageDecoder`
   - Extract generic  interfaces
   - Provide default implementation based on msgpackR
   - Rename existing "old" implementation into `RecursiveMessageEncoder/Decoder
   - Update message encoder test cases
- Introduce reusable `AbstractChannel` implementation to reduce boilerplate code for channel setup

Fixes eclipse-theia#11159

Contributed on behalf of STMicroelectronics
tortmayr added a commit to eclipsesource/theia that referenced this issue Jul 20, 2022
- Refactor RpcMessageEncoder/RpcMessageDecoder`
   - Extract generic  interfaces
   - Provide default implementation based on msgpackR
   - Rename existing "old" implementation into `RecursiveMessageEncoder/Decoder
   - Update message encoder test cases
- Introduce reusable `AbstractChannel` implementation to reduce boilerplate code for channel setup

Fixes eclipse-theia#11159

Contributed on behalf of STMicroelectronics
tortmayr added a commit to eclipsesource/theia that referenced this issue Jul 21, 2022
- Refactor RpcMessageEncoder/RpcMessageDecoder`
   - Extract generic  interfaces
   - Provide default implementation based on msgpackR
   - Rename existing "old" implementation into `RecursiveMessageEncoder/Decoder
   - Update message encoder test cases
- Introduce reusable `AbstractChannel` implementation to reduce boilerplate code for channel setup

Fixes eclipse-theia#11159

Contributed on behalf of STMicroelectronics
tortmayr added a commit to eclipsesource/theia that referenced this issue Aug 2, 2022
- Refactor RpcMessageEncoder/RpcMessageDecoder`
   - Extract generic  interfaces
   - Provide default implementation based on msgpackR
   - Update message encoder test cases
- Introduce reusable `AbstractChannel` implementation to reduce boilerplate code for channel setup

Fixes eclipse-theia#11159

Contributed on behalf of STMicroelectronics
tortmayr added a commit to eclipsesource/theia that referenced this issue Aug 2, 2022
- Refactor RpcMessageEncoder/RpcMessageDecoder`
   - Extract generic  interfaces
   - Provide default implementation based on msgpackR
   - Update message encoder test cases
- Introduce reusable `AbstractChannel` implementation to reduce boilerplate code for channel setup

Fixes eclipse-theia#11159

Contributed on behalf of STMicroelectronics
tortmayr added a commit to eclipsesource/theia that referenced this issue Aug 2, 2022
- Refactor RpcMessageEncoder/RpcMessageDecoder`
   - Extract generic  interfaces
   - Provide default implementation based on msgpackR
   - Update message encoder test cases
- Introduce reusable `AbstractChannel` implementation to reduce boilerplate code for channel setup

Fixes eclipse-theia#11159

Contributed on behalf of STMicroelectronics
tortmayr added a commit to eclipsesource/theia that referenced this issue Aug 5, 2022
- Refactor RpcMessageEncoder/RpcMessageDecoder`
   - Extract generic  interfaces
   - Provide default implementation based on msgpackR
   - Update message encoder test cases
- Introduce reusable `AbstractChannel` implementation to reduce boilerplate code for channel setup

Fixes eclipse-theia#11159

Contributed on behalf of STMicroelectronics
lucas-koehler pushed a commit to eclipsesource/theia that referenced this issue Aug 16, 2022
- Refactor RpcMessageEncoder/RpcMessageDecoder`
   - Extract generic  interfaces
   - Provide default implementation based on msgpackR
   - Update message encoder test cases
- Introduce reusable `AbstractChannel` implementation to reduce boilerplate code for channel setup

Fixes eclipse-theia#11159

Contributed on behalf of STMicroelectronics
JonasHelming pushed a commit that referenced this issue Aug 16, 2022
- Refactor RpcMessageEncoder/RpcMessageDecoder`
   - Extract generic  interfaces
   - Provide default implementation based on msgpackR
   - Update message encoder test cases
- Introduce reusable `AbstractChannel` implementation to reduce boilerplate code for channel setup

Fixes #11159

Contributed on behalf of STMicroelectronics
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
messaging issues related to messaging
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants