
Making C++ streams API public #921

Closed
indutny opened this issue Feb 22, 2015 · 19 comments

Labels
c++ Issues and PRs that require attention from people who are familiar with C++. discuss Issues opened for discussions and feedbacks.

Comments

@indutny
Member

indutny commented Feb 22, 2015

As of b968623, we finally have something that might be useful for external bindings. I'd like to eventually make this API public.

What are your thoughts on this? What should be improved/changed in order to make it more useful?

cc @iojs/collaborators @iojs/community-members @iojs/streams @iojs/tc

@indutny indutny added c++ Issues and PRs that require attention from people who are familiar with C++. future labels Feb 22, 2015
@mscdex
Contributor

mscdex commented Feb 22, 2015

Does this allow addon authors to create streams3 streams from C++? If so, what does the js-land part look like for hooking up to the generic C++ stream binding?

@indutny
Member Author

indutny commented Feb 22, 2015

@mscdex not yet, but that's a good suggestion. The current API is about efficiently wrapping the underlying socket to implement some protocol: tls, (maybe) http2, spdy, mysql...

@piscisaureus
Contributor

Conceptually +1 to having a public C++ streams base class.
However, I am not in favor of making the alloc/read/afterWrite events part of the public API.

@indutny
Member Author

indutny commented Feb 25, 2015

@piscisaureus why not? alloc/read is the whole point of this thing, because it allows efficient reading straight into the prepared memory. In fact, this is the reason why TLSSocket is faster than the legacy tls in v0.10 Node.js. AfterWrite could be shaved off, I guess.
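
For context, the alloc/read shape discussed here mirrors libuv's read API, which StreamBase wraps underneath: the alloc callback hands libuv a buffer up front, so incoming bytes land directly in memory prepared by the reader, with no intermediate copy. A minimal sketch in plain libuv (not the StreamBase interface itself):

```cpp
#include <cstdlib>
#include <uv.h>

static void on_alloc(uv_handle_t* handle, size_t suggested_size,
                     uv_buf_t* buf) {
  // Prepare the memory the socket will read into.
  buf->base = static_cast<char*>(malloc(suggested_size));
  buf->len = suggested_size;
}

static void on_read(uv_stream_t* stream, ssize_t nread, const uv_buf_t* buf) {
  if (nread > 0) {
    // The bytes are already in buf->base; process them in place.
  }
  // On error or EOF (nread < 0), a real implementation would also call
  // uv_read_stop(); omitted here.
  free(buf->base);
}

// Usage: uv_read_start(reinterpret_cast<uv_stream_t*>(&tcp), on_alloc, on_read);
```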

@indutny
Member Author

indutny commented Feb 25, 2015

@piscisaureus here is a first pass on making it suitable as a user API: #957. I'm not yet sure how to kill AfterWrite.

@brendanashworth brendanashworth added the discuss Issues opened for discussions and feedbacks. label Jun 24, 2015
@sgerace

sgerace commented Sep 29, 2015

@indutny We've been attempting to optimize a very performance-critical portion of our code base related to streams (specifically, stream transformations) and the idea of moving our implementation to C++ has been tossed around. Do you think your latest additions puts the code close to the point where a C++ streams3 implementation is possible? If you could provide a little guidance on what might be left to be done on the Node side, I'm sure we can spare some time to help out move the ball forward.

@indutny
Member Author

indutny commented Sep 29, 2015

@sgerace this StreamBase C++ API saves lots of time that is usually spent on allocating intermediate data and copying values around. This proved to be very useful for TLS and HTTP, but I can't say for sure that every use case will benefit from it.

Right now there are several blockers to exposing it:

  1. Usage of ReqWrap. This is an internal class that is responsible for wrapping libuv request handles and invoking asynchronous callbacks on them. It would need to become an abstract interface before the API can go public (a rough sketch of one possible shape follows this list).
  2. Heavy usage of the internal Environment class. StreamBase should not use it in any header files; however, it could still use it in stream_base.cc, since that file is compiled with core.
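
Purely for illustration, here is one hypothetical shape an abstracted request wrap could take, with the libuv type hidden behind a virtual completion hook. None of these names (WriteRequest, OnComplete) come from the actual codebase:

```cpp
#include <uv.h>

// Hypothetical abstract stand-in for ReqWrap<uv_write_t>: the libuv request
// stays an implementation detail, and subclasses only see the completion hook.
class WriteRequest {
 public:
  virtual ~WriteRequest() {}

  // Invoked once libuv finishes the underlying uv_write_t.
  virtual void OnComplete(int status) = 0;

  uv_write_t* req() { return &req_; }

 protected:
  WriteRequest() { req_.data = this; }

 private:
  uv_write_t req_;
};

// A libuv write callback can then recover the wrapper and dispatch:
static void AfterWrite(uv_write_t* req, int status) {
  static_cast<WriteRequest*>(req->data)->OnComplete(status);
}
```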

Another important thing is user feedback. I have used this API in two places in core so far, and I'm pretty happy with how it works. But perhaps your use case requires more flexibility from it. Let me know if you have any ideas for improvements!

Thanks!

@sgerace

sgerace commented Sep 29, 2015

@indutny Perhaps if I provide a bit more information about our exact use case, you can guide us down the correct path. Basically, we are trying to validate (both against the JSON spec and our own schema definitions), hash (in a consistent way), and gzip very large JSON objects in a streaming manner.

This whole thing actually came about because our initial implementation simply read the data, called JSON.parse(), then used that to produce a stable string (i.e., sorted object properties, etc.). Unfortunately, some of the JSON objects that we are working with are close to 1GB in size, so we were at times hitting the memory limits of Node.js and our server was crashing. We reimplemented everything as a Transform stream (so basically we had a pipeline similar to input.pipe(validator_and_hasher).pipe(gzip).pipe(out), where validator_and_hasher is our transform stream). That solved our memory issues, but now we are running into performance issues because we have to go character-by-character to validate and hash the stream of JSON in a single pass.

This is very CPU-intensive at the moment and takes far longer than we'd like (processing a ~600MB JSON file can take 3-4 minutes). My gut feeling is that it is a combination of the allocation and the access/manipulation of buffers on the JavaScript side that is preventing us from achieving the desired performance, but I'm not 100% certain.

@indutny
Member Author

indutny commented Sep 29, 2015

@sgerace do these JSON objects come over a TCP socket?

@sgerace

sgerace commented Sep 29, 2015

@indutny Yeah, the input streams come from both HTTP requests and direct TCP sockets, and the output is ultimately piped to an external (permanent) data store (such as a hard disk or S3).

@indutny
Member Author

indutny commented Sep 29, 2015

@sgerace this is a very interesting use case. I think it could probably be solved by using two StreamBase instances (one for receiving data, and one for sending data to another socket) and connecting them in C++.

The idea of StreamBase is to eliminate the networking <-> JS code interaction and do the processing in C++. However, please be aware that the networking API there is very low-level, so implementing on top of it may be more complicated.

@sgerace

sgerace commented Sep 29, 2015

@indutny So, let me see if I understand what you are saying: basically, we would create an add-on function (in C++) that accepts, as two of its arguments, two StreamBase instances (which would obviously be provided as streams from JS), and then somehow read the data from the first stream, process it, and write the result to the second stream, which would then find its way back to JS-land. Am I understanding that correctly?

@indutny
Member Author

indutny commented Sep 29, 2015

@sgerace not exactly:

  1. You'll need to provide your own C++ classes that inherit from StreamBase
  2. Wrap the recv and send sockets with instances of these classes
  3. Handle the incoming data in the C++ recv instance
  4. Process it
  5. Pass it to the C++ send instance
  6. StreamBase will send it straight to the socket

This way, JS may be skipped altogether.
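
To make the shape of steps 1-6 concrete, here is a rough sketch. StreamBase was internal at this point, so the class and method names below (ProtocolStream, OnData, SendProcessed) are illustrative stand-ins, not the real interface:

```cpp
#include <cstddef>
#include <vector>

// Illustrative stand-in for a StreamBase-derived wrapper around one socket.
class ProtocolStream /* : public node::StreamBase */ {
 public:
  void set_peer(ProtocolStream* peer) { peer_ = peer; }

  // Step 3: incoming bytes land here, straight from the socket's read buffer.
  void OnData(const char* data, size_t len) {
    // Step 4: process in C++ (validate, hash, transform, ...).
    std::vector<char> out = Process(data, len);
    // Step 5: hand the result to the send-side instance.
    if (peer_ != nullptr) peer_->SendProcessed(out.data(), out.size());
  }

  // Step 6: a real implementation would queue a write on the wrapped socket
  // (e.g. via uv_write); omitted here.
  void SendProcessed(const char* data, size_t len) {
    (void) data;
    (void) len;
  }

 private:
  // Placeholder pass-through; the interesting work would happen here.
  std::vector<char> Process(const char* data, size_t len) {
    return std::vector<char>(data, data + len);
  }

  ProtocolStream* peer_ = nullptr;
};
```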


If you want to leave the sending of data to JS, you can use just a single StreamBase instance to process incoming data and emit the processed data as if it had been received on that socket.

@sgerace

sgerace commented Sep 29, 2015

@indutny Okay, so I'm starting to get a better picture of this, but now I've got a few more questions.

  1. Before we get any further, is what you are describing even possible with the current implementation of StreamBase, or would the blockers you pointed out earlier need to be addressed first? (As an aside, I was just playing around with including stream_base.h in my add-on and it doesn't seem to be happy; I'm guessing that's because it isn't part of the public interface yet?)
  2. How does this compare to something more like the way (I think) zlib is implemented, where the streams are handled at the JS level and the data is then written and processed (asynchronously?) as packets on the C++ side? Do you think there could be significant improvements over something like that?

@indutny
Member Author

indutny commented Sep 29, 2015

  1. It is an internal thing until the blockers are resolved.
  2. It skips allocating a Buffer and handing it to JS, both of which take some time. The amount of improvement depends on the use case, but I would say most likely yes; I'm just not certain how large the improvement will be. Just to remind you one more time: the price for this is significantly increased code complexity.

@sgerace

sgerace commented Oct 6, 2015

@indutny Thanks for the summary. After looking over everything (and spending a few more days playing around with the code), we've decided to go with a more traditional zlib-style implementation for now. To your second point, we just don't know what real-world performance gains we would obtain at the cost of implementing a significantly more complex system. We can always move the logic into a C++ stream implementation if performance is still an issue once the interface is ironed out. In the meantime, however, I would certainly be interested in offering feedback on any proposed C++ stream API.
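
For reference, the zlib-style shape mentioned above (JS owns the Transform stream; C++ processes each chunk handed to it) can be sketched roughly as follows, here assuming the Nan addon layer; processChunk and ProcessBytes are hypothetical names:

```cpp
#include <nan.h>
#include <node_buffer.h>

// Hypothetical per-chunk processor: incremental JSON validation / hashing
// would go here, operating on the bytes without copying them.
static void ProcessBytes(const char* data, size_t len) {
  (void) data;
  (void) len;
}

// Called from a JS Transform's _transform(chunk, enc, cb) with a Buffer.
NAN_METHOD(ProcessChunk) {
  if (info.Length() < 1 || !node::Buffer::HasInstance(info[0]))
    return Nan::ThrowTypeError("expected a Buffer");
  // Operates directly on the Buffer's backing store; no extra allocation.
  ProcessBytes(node::Buffer::Data(info[0]), node::Buffer::Length(info[0]));
}

NAN_MODULE_INIT(Init) {
  Nan::Set(target, Nan::New("processChunk").ToLocalChecked(),
           Nan::GetFunction(Nan::New<v8::FunctionTemplate>(ProcessChunk))
               .ToLocalChecked());
}

NODE_MODULE(chunk_processor, Init)
```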

@indutny
Member Author

indutny commented Oct 6, 2015

Thanks for collaborating!

@Fishrock123
Contributor

See: nodejs/node-eps#2

@bnoordhuis
Member

Closing on the assumption that nodejs/node-eps#2 supersedes this issue. There's not been much movement here in the last four months either.
