
Unclear semantics of write() in WritableStreamInterface #79

Closed
codedokode opened this issue Mar 12, 2017 · 5 comments · Fixed by #92

Comments

@codedokode

I have read the description of WritableStreamInterface here: https://github.com/reactphp/stream#write and it seems very unclear. I don't really understand the meaning of the return value (true/false) of write(), or how I should use it to make sure that all data is written and that I am not writing into a closed or broken stream.

Let me quote the docs:

A successful write MUST be confirmed with a boolean true, which means that either the data was written (flushed) immediately or is buffered and scheduled for a future write.
...
If the internal buffer is full after adding $data, then write() SHOULD return false, indicating that the caller should stop sending data until the buffer drains.

These sentences contradict each other: the first says that write() must return true if the data was saved into a buffer, but the second says that write() can return false in this case. Should I check write()'s return value? Should I consider a false return value an error?

Another thing I didn't understand is why write() to a non-writable or closed stream (which is a programmer's mistake) is silently ignored instead of throwing an exception. Doesn't this make finding such mistakes harder?

Also, I didn't understand how the code is separated between the Stream and Buffer classes. They both hold a reference to the underlying stream, they both emit events, they both have a writable property... wouldn't it be easier to make them a single class?

Also, the isWritable() method doesn't know anything about the underlying stream:

$loop = Factory::create();
$fd = fopen('./file.txt', 'r');
$stream = new Stream($fd, $loop);
var_dump($stream->isWritable()); // prints true even though the file is opened read-only

As I understand it, isWritable() only checks whether the stream has not been closed, so maybe a better name would be isOpened()?

@clue
Member

clue commented Mar 12, 2017

Thanks for raising this very interesting issue 👍

The documentation has only recently been added via #72, so I very much appreciate this kind of feedback, which I hope we can use to make this documentation more obvious in the future 👍

A successful write MUST be confirmed with a boolean true, which means that either the data was written (flushed) immediately or is buffered and scheduled for a future write.
...
If the internal buffer is full after adding $data, then write() SHOULD return false, indicating that the caller should stop sending data until the buffer drains.

These sentences contradict each other: the first says that write() must return true if the data was saved into a buffer, but the second says that write() can return false in this case. Should I check write()'s return value? Should I consider a false return value an error?

I agree that this isn't perfectly clear from the docs, but here's how this works for the Stream and its Buffer:

The Buffer has an internal soft-limit (usually 64 KiB) which defines how much data it will accept. Thus, any write() call will simply add data to this buffer and will return true if the stored data is below this soft-limit. Once this limit is reached, it will return false but store the data in its buffer anyway. There's currently no hard-limit, so you can in fact continue pushing data until you run out of memory, which is the reason why you SHOULD stop sending to this instance. Flushing this internal buffer is actually handled by the EventLoop, which may be able to flush either the complete buffer in one go or possibly only a small part of it. Once the buffer is below the soft-limit again, it will emit a drain event, which means that you may continue writing to the buffer (and write() will return true again).
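The soft-limit behavior described above can be sketched with a small toy model (this is not the actual React\Stream\Buffer implementation; the ToyBuffer class and its tiny 10-byte limit are made up here purely for illustration):

```php
<?php

// Toy model of the soft-limit semantics described above: write()
// always stores the data, and the return value only signals whether
// the caller should pause until the buffer drains.
class ToyBuffer
{
    private $data = '';
    private $softLimit;

    public function __construct($softLimit = 65536)
    {
        $this->softLimit = $softLimit;
    }

    public function write($chunk)
    {
        $this->data .= $chunk; // no hard limit: data is stored anyway
        return strlen($this->data) < $this->softLimit;
    }

    // Simulates the event loop flushing part of the buffer; a real
    // Buffer would emit a "drain" event once it is below the limit.
    public function flush($bytes)
    {
        $this->data = (string) substr($this->data, $bytes);
        return strlen($this->data) < $this->softLimit;
    }
}

$buf = new ToyBuffer(10);
var_dump($buf->write('12345')); // bool(true): below the soft limit
var_dump($buf->write('67890')); // bool(false): limit reached, data still stored
var_dump($buf->flush(8));       // bool(true): below the limit, "drain" would fire
```

The key point the sketch demonstrates is that false never means the data was dropped, only that the caller should wait for drain.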

I hope this helps 👍

Another thing I didn't understand is why write() to a non-writable or closed stream (which is programmer's mistake) is silently ignored instead of throwing an exception.

This is related to #37, so I hope you agree we keep this separated?

@codedokode
Author

Thank you for the response. It helped me to understand how this function works. Maybe we should change the docs so it would say something like this:

write($data) queues data for a future write. All data passed to write() is guaranteed to be written to the underlying stream, now or later, unless an error occurs (in which case an error event is emitted and the stream is closed).

The return value is a hint: true means the buffer still has free space for new data, and false means that the buffer has grown too large and writing more data may cause high memory consumption (but that data won't be lost). When the buffer is drained, a drain event will be emitted. If you are not worried about memory consumption, it is safe to ignore the return value.

And there still is a question about writing to a closed stream. Currently it returns false, which can be interpreted as meaning the data was queued but the buffer is now full. Wouldn't it be better to throw an exception in this case?

And after reading README again I see another thing that is not clear:

If a stream cannot handle writing (or flushing) the data, it SHOULD emit an error event and MAY close() the stream if it can not recover from this error.

It is unclear here: if there was an error on the stream but it has not been closed, is the data lost? Will the stream try to write it again? Will it discard the chunk of data that caused the error and continue writing other data? It would be better if there were some guarantees, for example: either all data is successfully written, or the stream is closed and an error is reported.

Regarding writing to a read-only stream, I agree that is a separate issue; I just didn't know it existed.

@clue
Member

clue commented Mar 14, 2017

With all this being said, keep in mind that this documentation applies to the generic WritableStreamInterface, not the concrete Stream or Buffer implementation. In fact, there are plenty of implementations of this interface which should all follow consistent semantics, without requiring the concept of a "buffer": http://packanalyst.com/class?q=React\Stream\WritableStreamInterface

Maybe we should change the docs so it would say something like this: […]

I think you're on to something here and I would love to review this more in-depth 👍 May I ask you to file this as a PR so we can look into getting this in? 👍

And there still is a question about writing to a closed stream. Currently it returns false so it can be interpreted as the data were queued but the buffer now is full. Would not it be better to throw an exception in this case?

The problem with an async code flow is that state isn't entirely under user control and relies on external factors, such as the remote side closing the connection, which can basically happen any time. I'm not sure how adding exceptions here would exactly be helpful?

The false return value should essentially be translated as "please stop writing such data currently", which may or may not be a permanent decision. Once isWritable() returns false, there's no way the stream will ever accept data again, so this can be used to check for that.
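The distinction described here can be sketched from the caller's side. The helper tryWrite() and its return strings are hypothetical, invented only to illustrate the semantics; $stream is assumed to implement React\Stream\WritableStreamInterface (which also provides once() via its event-emitter interface):

```php
<?php

// Hypothetical caller-side helper illustrating the semantics above:
// isWritable() === false is permanent, while write() === false is a
// temporary "please pause until drain" hint.
function tryWrite($stream, $data)
{
    if (!$stream->isWritable()) {
        // permanent: the stream is closed, further data would be discarded
        return 'closed';
    }

    if ($stream->write($data)) {
        return 'ok';
    }

    // temporary: the data was buffered anyway, wait for "drain"
    $stream->once('drain', function () {
        // safe to resume writing here
    });

    return 'paused';
}
```

A caller could then treat 'closed' as an error and 'paused' as a signal to throttle, rather than treating both false-ish conditions the same way.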

It is unclear here: if there was an error on the stream but it has not been closed, is the data lost? Will the stream try to write it again? Will it discard the chunk of data that caused the error and continue writing other data? It would be better if there were some guarantees, for example: either all data is successfully written, or the stream is closed and an error is reported.

I agree that it's not exactly clear here, so I'm all for improving this text. However, I'm not sure this is something we can guarantee, given the different implementations this interface can have and has. The most common implementations (TCP/IP etc.) will close the stream and discard all data. Any suggestions here?

@codedokode
Author

codedokode commented Mar 18, 2017

Yes, I will try to make a PR with changes to the README later, to make the documentation clearer, but I still have some questions and more ideas.

Let me explain why I created this issue.

Let's say I want to use streams in a project (actually I already use them). For example, I write a function that asynchronously writes data to a stream passed as an argument. I understand that any I/O operation can fail, but I want my code to do the best it can. Here is what I want from my code:

  • either write all data to the underlying stream completely and get a notification that the operation has succeeded,
  • or get a notification that an error has occurred and, if possible, the reason for the error.
  • I would like to receive either of the above notifications only once.

It would be nice if the documentation explained how to use these functions so that no data is lost and no event is missed, and maybe provided some guarantees. Without that, it is difficult to write code that works correctly: I would have to look into the code of every stream implementation to check that it works as I assume.

I think it might even be better to add small code snippets showing an example of such code that is 100% correct. I will try to write such an example:

function writeSomethingToStream(WritableStreamInterface $stream)
{
    $stream->on('error', function (\Exception $e) {
        // Nothing can be done, so terminate the program
        echo "Operation failed\n";
        throw $e;
    });

    $stream->on('end', function () {
        echo "Operation completed successfully\n";
    });

    // Now we can write data to the stream 
    // Here I write them synchronously for simplicity
    $stream->write('1');
    $stream->write('2');
    $stream->write('3');

    // Close stream 
    $stream->end();
}

I wonder, is this code correct? I think it might fail if we call $stream->close(); somewhere in the middle because I don't listen to the close event. How can I fix this? Should I listen to close instead of end?
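One possible shape for such a fix, just to make the question concrete: listen to close as well, and guard the notification with a flag so it fires only once. The helper writeAndNotifyOnce() and its $done callback are made up here, and treating every error as fatal is an assumption, not something the interface currently guarantees:

```php
<?php

// Hypothetical helper: report success or failure exactly once, for a
// $stream assumed to implement React\Stream\WritableStreamInterface.
function writeAndNotifyOnce($stream, $done)
{
    $notified = false;
    $notify = function ($ok) use (&$notified, $done) {
        if ($notified) {
            return; // never report twice
        }
        $notified = true;
        $done($ok);
    };

    // failure path: any error is treated as fatal here (an assumption)
    $stream->on('error', function ($e) use ($notify) {
        $notify(false);
    });

    // "close" always fires eventually; whether it means success depends
    // on whether we reached end() without an error first
    $ended = false;
    $stream->on('close', function () use ($notify, &$ended) {
        $notify($ended);
    });

    $stream->write('1');
    $stream->write('2');
    $stream->end('3'); // flush remaining data, then close
    $ended = true;
}
```

With this pattern an error fires the callback first and the later close event is ignored by the flag, while a clean close after end() reports success.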

And the error event can be emitted many times, but I need to distinguish between fatal and non-fatal errors somehow.

Also, I have a question about the pipe event. Is it necessary? As I understand it, piping is just reading data from one stream, writing it to another, and maybe passing along error notifications. Why is a special event needed for this case?

I have looked at the code for the Util::pipe() method here ( https://github.com/reactphp/stream/blob/master/src/Util.php#L16 ) and it doesn't look simple. And I see that there is no listener for the error event. Is that ok? Will an error reading or writing data be reported? Is it the caller's responsibility? The example here ( https://github.com/reactphp/stream/blob/master/examples/cat.php ) also doesn't have any error handling.
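Conceptually, piping reduces to forwarding data while respecting backpressure. The following is a deliberate simplification of what Util::pipe() does, not its actual code, and notably it also attaches no error listener, which is exactly the concern raised above:

```php
<?php

// Simplified conceptual sketch of piping: forward "data" chunks and
// use pause()/resume() to respect the destination's backpressure.
// $source and $dest are assumed to be React readable/writable streams.
function simplePipe($source, $dest)
{
    $source->on('data', function ($chunk) use ($source, $dest) {
        if ($dest->write($chunk) === false) {
            $source->pause(); // destination buffer is full
        }
    });

    $dest->on('drain', function () use ($source) {
        $source->resume(); // destination drained, keep reading
    });

    $source->on('end', function () use ($dest) {
        $dest->end(); // source finished: flush remaining data and close
    });
}
```

Seen this way, the pipe event carries no data flow of its own; it only announces that a source has been attached to a destination.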

(You might notice that I care a lot about error reporting. That is because hiding error messages makes debugging much more difficult, so I don't want to miss any.)

And I'd like to ask a question about the error event again. It is not guaranteed to fire only once and then close the stream. What would several error events on a stream mean? That some data was lost but the stream is still working? Or that there was an error but the stream will try to transfer the data again? And how should I handle this event? How do I distinguish between fatal and non-fatal errors?

The most common implementations (TCP/IP etc.) will close the stream and discard all data. Any suggestions here?

The easiest thing to do would be to require that any error on the stream close()s it. Maybe even add a method like closeWithError($e). Or maybe explain what several errors occurring on a stream mean, how one should handle them (should we abandon the stream, or can it recover from errors?), and whether data is lost in this case. Without an explanation it is difficult to understand how to handle this event.
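The proposed semantics could look something like this. closeWithError() is not part of the actual React\Stream API, and the StrictStream class here is invented solely to illustrate the "every error is fatal and reported exactly once" guarantee being suggested:

```php
<?php

// Sketch of the proposed closeWithError() semantics: emit "error"
// exactly once, then close the stream, so an error is always fatal.
class StrictStream
{
    private $h = array();
    private $closed = false;

    public function on($event, $cb)
    {
        $this->h[$event][] = $cb;
    }

    private function emit($event, $args = array())
    {
        foreach (isset($this->h[$event]) ? $this->h[$event] : array() as $cb) {
            call_user_func_array($cb, $args);
        }
    }

    public function closeWithError(\Exception $e)
    {
        if ($this->closed) {
            return; // never report an error twice
        }
        $this->emit('error', array($e));
        $this->close();
    }

    public function close()
    {
        if ($this->closed) {
            return;
        }
        $this->closed = true;
        $this->emit('close');
    }
}
```

Under this rule a consumer never has to wonder what a second error event means, because there cannot be one: error is always followed by close, and both fire at most once.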

And there still is a question about writing to a closed stream. Currently it returns false, which can be interpreted as meaning the data was queued but the buffer is now full. Wouldn't it be better to throw an exception in this case?

The problem with an async code flow is that state isn't entirely under user control and relies on external factors, such as the remote side closing the connection, which can basically happen any time. I'm not sure how adding exceptions here would exactly be helpful?

I agree that it is complicated. On the one hand we don't know when an error happens, but on the other hand, silently ignoring write()s to a closed stream allows some developer errors to go unnoticed. With synchronous code, fwrite() to a closed file descriptor would give an error.

@clue
Member

clue commented Mar 18, 2017

And the error event can be emitted many times, but I need to distinguish between fatal and non-fatal errors somehow.
[…]
And I'd like to ask a question about the error event again. It is not guaranteed to fire only once and then close the stream. What would several error events on a stream mean? That some data was lost but the stream is still working? Or that there was an error but the stream will try to transfer the data again? And how should I handle this event? How do I distinguish between fatal and non-fatal errors?

The most common implementations (TCP/IP etc.) will close the stream and discard all data. Any suggestions here?

The easiest thing to do would be to require that any error on the stream close()s it. Maybe even add a method like closeWithError($e). Or maybe explain what several errors occurring on a stream mean, how one should handle them (should we abandon the stream, or can it recover from errors?), and whether data is lost in this case. Without an explanation it is difficult to understand how to handle this event.

The documentation has only recently been added via #72, in an attempt to provide a consistent documentation for the existing behavior. I think you're on to something here and support that we may want to look into providing some stricter semantics here. This ticket here is quite lengthy and I don't suppose many will follow through, so may I ask you to file this as an RFC ticket or perhaps even better as a PR? 👍

I think it might even be better to add small code snippets showing an example of such code that is 100% correct.

I agree that adding examples here would help getting people into this new code flow and the expected events more easily. Can you file a (separate) PR that adds relevant examples? 👍

The problem with an async code flow is that state isn't entirely under user control and relies on external factors, such as the remote side closing the connection, which can basically happen any time. I'm not sure how adding exceptions here would exactly be helpful?

I agree that it is complicated. On the one hand we don't know when an error happens, but on the other hand, silently ignoring write()s to a closed stream allows some developer errors to go unnoticed. With synchronous code, fwrite() to a closed file descriptor would give an error.

I'm not sure I understand what you're suggesting we should do instead. IMHO we should document this as expected behavior and explicitly link to the isWritable() method.
