Streams pain points #89

Closed
chrisdickinson opened this issue Dec 5, 2014 · 66 comments

@chrisdickinson
Contributor

This is a place to enumerate pain points / problems with streams as they exist presently. Please focus on problems, clarifying comments and questions, or statements of commiseration for specific problems. Proposed solutions should be posted as separate issues that link back to this issue. Proposed solutions should have enough example / backing code to have an informed discussion about the tradeoffs, but are not expected to be production ready / fully working.

If the pain point is "rip out streams from core" -- that can be discussed over here. :)

I'll start:

  • Every stream cares about byte-level buffering. This necessitates an "objectMode" flag which, when omitted, causes all sorts of gotchas for stream users. I am of the opinion that this mechanism for determining when and when not to buffer causes more problems than it's worth and should be replaced. Streams' awareness of their contents reduces their utility.
  • Resource-backed streams (every stream in Node core) have to implement their own semantics for destroy, close, etc; and they have to ensure that their sequence of events matches all others over time.
  • null is a reserved value in streams -- streams cannot meaningfully pass this value along without bugs. This causes problems for folks migrating from earlier versions of streams, and in situations where null is "in-alphabet" (see the sketch at the end of this list).
  • The ergonomics of the streams API lead to the prevalence of packages like through and through2 (which by itself is fine), but it would be nice if the APIs were directly usable "out of the box", so to speak.
  • There's no spec for streams, so it's hard to build alternatives that are guaranteed to work.
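A minimal sketch of the null problem from the third bullet (hypothetical object-mode values, not production code):

var Readable = require('stream').Readable;

var r = new Readable({ objectMode: true });
r._read = function () {}; // no-op source; we push manually below

r.on('data', console.log);    // only { value: 1 } ever arrives
r.on('error', console.error); // fires with "stream.push() after EOF"

r.push({ value: 1 }); // fine: delivered as data
r.push(null);         // meant as data? too bad: this signals end-of-stream
r.push({ value: 2 }); // too late: the stream already ended
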
@tracker1

tracker1 commented Dec 5, 2014

1 - Bringing the functionality of event-stream (which includes through2 and the like) into core streams would be really nice.

2 - Being able to easily create a generic pass-through stream that you can write to would also be nice.

3 - A first-class error method (Stream.prototype.error) would be helpful for binding/passing, vs stream.emit.bind(stream, 'error').

4 - Having methods internally bound to the stream instance would be a bonus as well, so you could simply pass stream.write instead of stream.write.bind(stream).


The following pseudo-example illustrates what I mean here, and something that would be closer to what I'd like to see.

mylib.foo = function getDataRowStream(query, params) {
  var stream = Stream.create(true /*object mode*/);  // 2 above, simple create of through stream
  queryDataAndPipeToStream(query, params, stream);
  return stream;
}

function queryDataAndPipeToStream(query, params, stream) {
  getDatabaseConnection()
    .then(function(conn){
      var req = new dal.Request(conn);
      req.stream = true;
      req.on('row', stream.write /*methods bound to stream instance*/)
      req.on('error', stream.error /*emits 'error' event*/)
      req.on('complete', stream.end);
      req.query(query, params)
    })
    .catch(stream.error);
}

@chrisdickinson
Contributor Author

(EDIT: this was originally quoting the comment above, which was since edited)

I think the disconnect for me is that going from something like creating a connection to a database (where a promise is a great fit) to something where a stream makes sense (returning rows of data) is a bit weird.

I might be misinterpreting this quote, so forgive me if I'm just blathering here:

Promises are great for representing single async operations that happen exactly once, pass or fail -- they are a chainable primitive for those sorts of operations. Node's primitive for that class of operation is the Node-style callback -- which is, notably, not inherently chainable. Streams (collectively including readables, writables, duplexes, and transforms) represent a series of zero or more data events over time, that may eventually end or fail.
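
To make that concrete, a minimal sketch of the two primitives (assuming file.txt exists):

var fs = require('fs');

// One-shot operation, exactly once, pass or fail: Node-style callback
fs.stat('file.txt', function (err, stats) { /* one result or one error */ });

// Zero or more data events over time, then end or failure: a stream
fs.createReadStream('file.txt')
  .on('data', function (chunk) { /* zero or more of these */ })
  .on('end', function () { /* may eventually end... */ })
  .on('error', function (err) { /* ...or fail */ });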

@chrisdickinson
Contributor Author

Some more pain points:

  1. There are two .pipe functions -- Stream.prototype.pipe and ReadableStream.prototype.pipe. This is confusing for end-users and causes bugs.
  2. Streams are based on event emitters, but listening to certain events (readable and data) has implicit side effects (sketched below).
  3. Streams documentation is currently a mix of reference, implementation guide, usage guide, and advanced usage -- it's hard to follow and link into from other docs, or consume as a single document.
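
A quick sketch of that second point -- merely attaching a 'data' listener flips a streams2 stream into flowing mode:

var fs = require('fs');

var r = fs.createReadStream(__filename);
// r starts out paused; nothing has been read yet.

r.on('data', function (chunk) {
  // Attaching this listener was not a passive subscription: it switched
  // the stream into flowing mode as a side effect.
});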

@aredridel
Contributor

Those last bits are SO TRUE to my experience.

@tracker1

tracker1 commented Dec 5, 2014

@chrisdickinson I removed the part you quoted... I was merely expressing a thought in my head that it would be nice if there were a cleaner way to chain against a Promise that resolved to a Stream. It's not in the spec, but if a Promise was also an EventEmitter that would emit against the resolved item, it could double as a ReadableStream...


Another minor pain point is that Writable streams don't emit end, they emit finish... which means that when my final chain goes from a through stream to a writable IO stream, my end/finish event handler needs to change. It would be nice if writable streams emitted end as well as finish after the final flushed output.
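
A sketch of what that looks like in practice (assuming file.txt exists):

var fs = require('fs');
var zlib = require('zlib');

var gzip = zlib.createGzip();
var out = fs.createWriteStream('file.txt.gz');

fs.createReadStream('file.txt').pipe(gzip).pipe(out);

// Same "we're done" intent, different event name per stream type:
gzip.on('end', onDone);   // the transform's readable side emits 'end'
out.on('finish', onDone); // the plain writable emits 'finish'

function onDone() { console.log('stage complete'); }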

@jonathanong
Contributor

a lot of my grievances are already issues in node: https://github.com/joyent/node/issues/created_by/jonathanong

a major issue for web framework maintainers is leaks: https://github.com/jshttp/on-finished#example. requiring us to use https://github.com/stream-utils/destroy and https://github.com/jshttp/on-finished is a symptom of a broken stream implementation, imo.
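
For context, the pattern those modules enforce looks roughly like this (a sketch following the on-finished README; huge.file is hypothetical):

var fs = require('fs');
var http = require('http');
var onFinished = require('on-finished');
var destroy = require('destroy');

http.createServer(function (req, res) {
  var stream = fs.createReadStream('huge.file');
  stream.pipe(res);
  // If the client disconnects mid-response, the file descriptor leaks
  // unless we tear the source stream down ourselves:
  onFinished(res, function () {
    destroy(stream);
  });
}).listen(3000);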

but personally, i'd rather have readable-stream be separated from core, have all the stream development progress be moved into the WHATWG Stream specification, and allow users to use that as the stream implementation (assuming it fixes these bugs).

@mjackson
Contributor

mjackson commented Dec 8, 2014

There's no spec for streams

This is definitely the main pain point for me. That's it. If there were a formal streams spec that everyone could test their own little implementations against for conformance, that would fix a LOT of issues.

@chrisdickinson I'm sure you've seen https://github.com/promises-aplus/promises-tests. I'm tempted to do something like that for streams. Think it would help us all get on the same page?

A first pass could just spec out the existing implementation in node, trying to catch as many corner cases as possible. Then, anyone who wanted could make a competing streams implementation, similar to how when.js, rsvp, bluebird, etc. compete on speed and features.

@chrisdickinson
Contributor Author

re: @mjackson:

This is definitely the main pain point for me. That's it. If there were a formal streams spec that everyone could run their own little implementations on to test for conformance that would fix a LOT of issues.

The other side of this is: if we spec streams as they exist currently, the difficulty of changing/improving streams goes up. Are we comfortable with how streams currently work, and are we willing to support that for the long haul? I agree that the eventual goal of streams should be a spec, but on the other hand, I feel like committing to what we have now may be premature.

EDIT: That said, I'd totally welcome a stream spec suite so we can see clearly what behaviors we're dealing with!


re: @jonathanong:

but personally, i'd rather have readable-stream be separated from core, have all the stream development progress be moved into the WHATWG Stream specification, and allow users to use that as the stream implementation (assuming it fixes these bugs).

The first part of that is under debate over here; I've made a big list of concerns with that approach as well. Re: the second half, the problem with using the WHATWG spec is that it's built around a core primitive that Node doesn't use -- namely, Promises.

Going through your linked issues, it seems like the lion's share of problems have to do with resource or handle-backed streams and how they interact (or, rather, don't interact as expected) with the pipe mechanism -- I totally agree that that behavior needs to be shored up; maybe as something that builds on top of vanilla streams, rather than baked into the base Readable and Writable classes.

@jmar777
Contributor

jmar777 commented Dec 8, 2014

Copying these over from nodejs/roadmap#1:

  1. The lack of error-forwarding via pipe() removes much of its elegance. The docs like to make things look simple, as in this example:
var r = fs.createReadStream('file.txt');
var z = zlib.createGzip();
var w = fs.createWriteStream('file.txt.gz');
r.pipe(z).pipe(w);

Beautiful, except you can't just add a single error handler on the end... you need one for each stream. And you could actually get errors from all three of them, so you can't just forward them to a callback or anything sane like that... it just gets gnarly way too fast. (The workaround is sketched at the end of this comment.)

  2. This is much broader, but the whole readable/writable dichotomy is confusing in a way that spiders out into virtually all stream implementations too. I think there are enough examples out there of "read only" stream APIs to make a compelling case for that (I even think @creationix experimented with wrapping node streams like that at one point).

  3. Object mode. I sort of get why it ended up in there, but if there's a specific point in time where streams jumped the shark...

I don't think any of those are actionable so long as node compatibility is a requirement, but I guess that's a different topic.
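
For the first item, the workaround today looks something like this (a sketch; real code would also have to clean up the partial file.txt.gz on failure):

var fs = require('fs');
var zlib = require('zlib');

var r = fs.createReadStream('file.txt');
var z = zlib.createGzip();
var w = fs.createWriteStream('file.txt.gz');

r.pipe(z).pipe(w);

// pipe() forwards data but not errors, so every stream in the chain
// needs its own handler:
[r, z, w].forEach(function (s) {
  s.on('error', onError);
});

function onError(err) {
  console.error('pipeline failed:', err.message);
}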

@gaearon

gaearon commented Dec 8, 2014

Coming from an Rx background, I was unpleasantly surprised by how unintuitive piping is. Most examples of downloading a file (arguably the simplest use case for streams and piping) that I found in blogs and on StackOverflow would get error handling, or flushing, or the finish/close/disposal sequence wrong. Seriously. (Oh noes.)

IMO Node's pipe is somewhat like jQuery's “promise”-mutating then in this respect: broken by design due to the absence of ownership and reader/writer separation in the API. In contrast, Rx Observables separate consuming from creating, like real Promises do, so there is no room for that whole class of mistakes.

Rx also has first-class disposables: you have to return a “disposable” from the Observable constructor, and there are several built-in utility disposables for simpler composition.

I could be wrong though! Pardon me if I'm missing something obvious about how streams should be implemented.

@sonewman
Contributor

sonewman commented Dec 9, 2014

I think the problem with error forwarding is that if you had a long chain of piped streams, it would be hard to determine where the error actually occurred and deal with it in an appropriate manner. As far as I know (though streams are similar in nature to a promise with unlimited resolves of data before a final end resolve, if you squish them into a promise paradigm), the idea is more about each individual stream taking ownership of its responsibilities. That actually gives the implementer much more control, albeit with extra code and edge cases which they themselves are required to deal with.

@sonewman
Contributor

sonewman commented Dec 9, 2014

Also, I get the point about the end event on the readable side and the finish event on the writable side, but what about transform streams? The finish event fires when .end is called on the writable side, as it should, even if a consumer of the readable side has not finished reading it. Only once that is done will it emit the end event. I think it is important to differentiate between the two events, even though it might seem a little confusing semantically.
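
The ordering is easy to observe on a PassThrough (a minimal sketch):

var PassThrough = require('stream').PassThrough;

var t = new PassThrough();
t.on('finish', function () { console.log('finish: write side is done'); });
t.on('end', function () { console.log('end: read side fully consumed'); });

t.end('hello'); // flushes the write side, so 'finish' fires
t.resume();     // drains the readable side; only then does 'end' fire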

@joepie91
Contributor

joepie91 commented Dec 9, 2014

From an application developer point of view, these are my main concerns with streams2:

  1. There is a pipe event on Writable streams, but not on Readable streams. It can be useful to be notified when something starts being piped elsewhere, and it's not obvious how to be made aware of this. Overriding the _read or pipe methods is a hack at best, especially with third-party streams.
  2. Apparently it's only possible to set objectMode to true for both input and output, or for neither. To set one of the two to object mode but not the other, you need a custom constructor. This should really be easier to do.

Aside from those two points, I haven't really found any issues with streams so far, from my point of view. I'm actually finding them to be rather practical to work with.

@sonewman
Contributor

sonewman commented Dec 9, 2014

Having a pipe event on a duplex/transform/readable & writable stream would be confusing. How would you know whether it was piping or being piped if you had the same event?

Overriding the pipe method is not too difficult, if you absolutely need that:

var pipe = Readable.prototype.pipe
Readable.prototype.pipe = function (dest, options) {
  this.destinations = this.destinations || [] // track active pipe targets
  this.destinations.push(dest)

  dest.on('finish', function () {
    var i = this.destinations.indexOf(dest)
    ~i && this.destinations.splice(i, 1)
  }.bind(this))

  return pipe.call(this, dest, options)
}

With regards to question 2, if you needed to pipe a transform object stream into a buffer there is no reason you can't do this in your transform:

var Transform = require('stream').Transform
var inherits = require('util').inherits

function ObjToBufferStream() {
  Transform.call(this, { objectMode: true })
}
inherits(ObjToBufferStream, Transform)

ObjToBufferStream.prototype._transform = function (data, enc, next) {
  try {
     next(null, new Buffer(JSON.stringify(data)))
  } catch (err) {
     next(err)
  }
}

var objToBufferStream = new ObjToBufferStream()
objToBufferStream.pipe(process.stdout)

objToBufferStream.end({ obj: true })

@sonewman
Contributor

@chrisdickinson I am curious, after hearing the discussion in the TC about potentially moving streams development out of core and introducing it as a bundled dependency: what sort of API would you give streams if you could start from scratch? Would you go for a promise-backed approach such as @domenic's https://github.com/whatwg/streams?

@aredridel
Contributor

You can in fact set objectMode separately for the readable vs the writable side, but it's ugly: this._readableState.objectMode = true
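
For what it's worth, newer Node versions expose this as per-side constructor options, which avoids poking at internals (a sketch, assuming simplified construction is available):

var Duplex = require('stream').Duplex;

var d = new Duplex({
  writableObjectMode: true,  // write side accepts objects
  readableObjectMode: false, // read side emits buffers/strings
  write: function (obj, enc, cb) {
    this.push(JSON.stringify(obj) + '\n');
    cb();
  },
  read: function () {} // pushes happen from write() above
});

d.pipe(process.stdout);
d.end({ hello: 'world' });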

@sonewman
Contributor

@aredridel this is true, albeit the opposite of my example.

But this would only really matter for a duplex stream; in any case you are still going to need to implement the internal _read() method and use push() to accommodate the format of your data.

If you were implementing a transform stream and it was just set to objectMode, that doesn't mean you can't write buffers to it. I think determining objectMode for one side or the other is a non-issue.

No matter what is implemented, the hardest part is the conversion between formats (e.g. buffers to objects), as data across the wire is not always deterministic and could very well run over the boundaries of a chunk.

This is an issue no matter what encoding you have, unless you are using only JavaScript objects directly (and then you can't really manage memory effectively).

My example is contrived but the only important thing is the part:

next(null, JSON.stringify(data))

all the rest is purely contextual.

Personally, when I see properties like _readableState, _writableState, or _transformState I consider them implementation detail and not API, unless documented otherwise (such as in the case of the _read, _write, and _transform methods of the streams inheritance API).

As far as I am concerned, technically implementation detail is free to change without breaking semver!

@bodokaiser

While implementing a WebSocket parser using stream.Transform I always wished for some feature where you could reliably push data back to read it the next time. This comes in handy when you need four bytes in a row to extract data and don't want to manage the buffer yourself. I think #unshift was once intended to do this but it never worked for me.

Besides that, I would like to have streams work much more as expected.
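
The usual userland workaround is to keep the unconsumed tail in an internal buffer rather than pushing it back into the stream. A sketch, assuming a made-up frame format with a 4-byte big-endian length prefix:

var Transform = require('stream').Transform;
var inherits = require('util').inherits;

function FrameParser() {
  Transform.call(this);
  this._buf = new Buffer(0); // bytes carried over between chunks
}
inherits(FrameParser, Transform);

FrameParser.prototype._transform = function (chunk, enc, next) {
  this._buf = Buffer.concat([this._buf, chunk]);
  // Emit every complete frame; a partial frame waits for the next chunk.
  while (this._buf.length >= 4) {
    var len = this._buf.readUInt32BE(0);
    if (this._buf.length < 4 + len) break;
    this.push(this._buf.slice(4, 4 + len));
    this._buf = this._buf.slice(4 + len);
  }
  next();
};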

@sonewman
Contributor

@bodokaiser This is an interesting idea... so you are describing a way of putting some data back onto the write buffer and applying back pressure. Unshift would not act in that way, because it is on the Readable side, so it would only put it at the start of the readableState buffer.

This is definitely food for thought!

@bodokaiser

@sonewman yes, indeed. I always wondered why this method was missing, as streams should be chunk-able.

@sonewman
Contributor

I think this would be tricky to implement with what we have currently. For the purposes of this explanation I will call this method this.pushBack (this being the stream.Writable, of course).

We currently create a new WriteReq when adding chunks to the _writableState buffer, and this contains the encoding and the writecb supplied to .write(data, encoding, function () { /* this callback */ }). The question is: if we were to call this.pushBack(chunk), would the callback fire? Technically some of the data would have been handled.

This could lead to confusion (and inconsistent behaviour) if it was decided that the callback should be called multiple times, or alternatively if it was only called on the first handling and not for the subsequent chunk.

The other problem is that when a chunk is handled, its encoding and callback are transferred onto the _writableState before the _write method is called. When we call next in _write or _transform it eventually calls state.onwrite, which in turn calls the callback associated with that chunk, either on the nextTick or immediately (if the writable end is being uncorked and the buffer cleared, or it's being written to and there is anything previously backed up in the buffer).

(stream.Transform calls this via callback inception, in case following the source causes mind bending! 😄)

The above scenario would mean that this.pushBack() would need to be called synchronously within the _write or _writev method. Otherwise the encoding and callback could be lost when the handling of the next chunk happens synchronously, or before the point in the nextTick at which the callback was added (which would likely occur if we used process.nextTick in _write or _writev and then called next). This could cause some very strange behaviour.

I think we would need to decide whether this functionality is definitely something we really want to add to stream.Writable, as it would most likely depend on some larger internal restructuring. In addition we would need to watch closely to ensure that engineering this does not hurt stream performance.

@ORESoftware
Contributor

ORESoftware commented Jan 15, 2017

@aredridel I saw your comment about callbacks in readables - I have always been confused about why the read method on readables does not use a callback whereas the write method on writables has one. Just for kicks I started writing an alternative stream API based on having the read method take a callback, like you mention - assuming that's what you were getting at. Any reason why read doesn't take a callback in the current Node streams API? It makes no sense to me.

@calvinmetcalf
Contributor

@ORESoftware noms might be what you're looking for.

@ORESoftware
Contributor

ORESoftware commented Jan 16, 2017

@calvinmetcalf thanks, will check it out; after much research, I think the standard API will work, it's just not very intuitive or consistent. You can do this:

function getReadableStream(fn) {
  return new stream.Readable({
    objectMode: true,
    read: function (n) {
      fn(n, (err, chunk) => {
        // this.push is only called after the callback has fired
        this.push(chunk);
      });
    }
  });
}

but I wish they had designed the API like the following instead; it would have been so much more intuitive:

function getReadableStream(fn) {
  return new stream.Readable({
    objectMode: true,
    read: (n, cb) => { // <<<< read should have taken a callback, but it doesn't!
      fn(n, (err, chunk) => {
        if (err) return cb(err); // whatever, you get the idea
        cb(null, chunk);
      });
    }
  });
}

after many many weeks of subpar understanding, I think I get it, but you never know when another curveball will get thrown at you in Node land. IMO streams should always have been some userland thing...

However, core could add a callback to readable.read(), so maybe one day that will happen without breaking backwards compatibility...
