-
Notifications
You must be signed in to change notification settings - Fork 60
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add RemainingBytes utility to streaming decoder #397
Conversation
3bfccb1
to
6569348
Compare
Signed-off-by: Michael Andersen <[email protected]>
6569348
to
80b2495
Compare
Hi @immesys Thanks! I'll take a look this weekend. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for opening this PR! I really appreciate the helpful notes you provided here and also in code comments.
Although the need for RemainingBytes()
makes sense, use of returned io.Reader
from RemainingBytes()
can invalidate Decoder
, which can lead to undefined behavior as you already mentioned in your comments.
Maybe it would be simpler to add Decoder.Buffered()
like encoding/json
.
// Buffered returns a reader of the data remaining in the Decoder's buffer.
// The reader is valid until the next call to Decode.
func (dec *Decoder) Buffered() io.Reader
Adding Decoder.Buffered()
enables users to implement functions like RemainingBytes()
.
Thoughts?
Hmm. I think you are right that we should not make it easy for someone to accidentally shoot themselves in the foot. I think adding Buffered() would allow the caller to solve the problem, but it would be a little more complex on their side. There are probably two solutions we can pick from:
I think I am more in favor of the first option. In the second option, we would expect the caller to make their own MultiReader that prepends the result of Buffered onto the upstream reader, but it feels odd to make the caller use the reader they passed to the decoder again. The decoder still "owns" it and the caller would still get undefined behavior if they call decode again after using the upstream reader. This time it is as a result of their own code, so I agree it is better than the original proposal, but still doesn't feel very clean. How would you feel if I changed the RemainingBytes to "close" the decoder, so that you can't get undefined behavior? If you are ok with it, do you have a preference for what you would want to call that function? I would propose Close() except the signature wouldn't match that of io.Closer, so something like End() or Stop() might be more friendly. |
I'd like to keep this PR open (as option 1) for more consideration after v2.5.0 release. I agree with your reasoning for option 1 (and would like to give it more consideration) even though I prefer option 2 at this time. Each new function needs to be supported, maintained, and backwards compatible. So, more time and long-term consideration is required for alternative APIs that don't match If you'd like to open a new PR to add For v2.5.0 release, I need to first finish reviewing and merge PR #386. Please let me know if this works for you or if I've missed anything. Really appreciate your insights! |
I am back from vacation, and ready to pick this back up. Sorry for the delayed response. I'm happy to implement either option (or another alternative), whichever you feel comfortable with. I understand that picking up functionality is not "free" in that it must be maintained going forward, so I will defer to your decision in which API you think is the best when weighing how easy/safe it is to use, how maintainable it is, and how familiar it might be to users of other encoding libraries. It sounds like implementing Buffered() is your preference. Do you think returning a slice or a reader makes the most sense? Json returns a reader, but I believe our internal buffer is a slice that would be safe to return as long as the caller does not call decode again. I think my preference would be a reader. FWIW I took a look at a few other libraries that I consider "similar" in their IO pattern (xml, gzip, json, vmihailenco/msgpack and tar) and this pattern of returning a reader for the remaining bytes (as opposed to just the buffered bytes) isn't present anywhere else. Thus I think you are probably right to avoid making up a new pattern that users won't find familiar. What some of them do (xml, gzip) is detect if the upstream reader supports io.ByteReader and if it does, they don't read ahead. Msgpack has the same behavior but requires io.ByteScanner. Thus, for users that care about content after the decoded items, they can buffer it themselves, which implements ByteReader and ByteScanner. That way the upstream reader is always safe to use directly for any "remaining bytes" use cases. That sounds like it would impact performance to me, but I haven't tested. Buffered() matches json and msgpack, and as you note, does not preclude a different solution. Perhaps I can implement that, and also do some testing to see how expensive the "if reader is a ByteReader don't read-ahead" ends up being. I'll come back with my results and we can see if it's worth doing future work on it or advising users to use Buffered()? Thank you |
Hi @immesys, welcome back! Hope you had a nice vacation! Thank you for looking into this and your insights. 👍
Yeah, my preference is also func (*Decoder) Buffered() io.Reader
That sounds great! I think |
Closing this for now (#412 solves the immediate problem). |
Description
In many of our use cases, we have a CBOR header in front of content that is not CBOR (e.g. ciphertext). It is convenient to be able to parse the "first item" using CBOR without having additional framing (such as an out-of-band length prefix in front of the CBOR header). This functionality exists in some other serialization libraries, usually in the form of a "remaining bytes" return value. E.g.
A recent PR (#380) clarifies that Unmarshal and Valid are really intended for just one object, so the above APIs are not quite a good fit. Instead, I propose that we add a function to the Decoder that lets you "stop early" and return a reader that points to the remaining bytes. If there is nothing buffered, that's just the upstream reader. But in the case where we have some buffered data we read ahead, then we need to prepend that onto the resulting stream.
The implementation was relatively minor, so was I skipped proposing in an issue, but please feel free to request that the problem is solved in a different way and I will happily reimplement.
Checklist (for code PR only, ignore for docs PR)
Last line of each commit message should be in this format:
Signed-off-by: Firstname Lastname [email protected]
(see next section).
Certify the Developer's Certificate of Origin 1.1
the Developer Certificate of Origin 1.1.