No support for non-UTF8 parsing #5
Comments
So I'm inclined to say that the ergonomics of this are maybe up to users of this crate to handle. Then this crate can focus on providing a correct protocol implementation (using bytes wherever needed), and you can for example have This is kind of predicated on the notion that non-UTF-8 data will really only be a problem for message contents. Have you actually seen this be a problem for header data? It also relies a little bit on my perception that using the message parsing parts of IMAP should mostly be avoided in favor of handling message parsing in the client application... Some parts of IMAP really seem like a layering violation that way. |
@djc:
Yeah, it's the non-UTF-8 headers that are the issue (in part because not all servers support UTF-8 up-conversion). We could comb through the specs to find which fields are safe to always parse as unicode strings, and which aren't, but it seems better (at least to me) to just have the user choose how to interpret text-like For what it's worth, I completely agree with you that if headers were all valid unicode, then this library should just use |
@jonhoo: In my mind, we can just use bytes everywhere and add the UTF-8 API later. I think correctness and completeness of implementation is more important in this early stage than convenience. I'd like us to start filing issues for missing implementations and bugs, and I'll start working on them!
@sanmai-NL RFC 5738 is an extension, so that in my understanding a client should not have to support it. As for the boundary between this crate and something like |
This fixes djc#5 as proposed in that issue. Specifically, it introduces a trait, `FromByteResponse`, and provides a generic `parse_response` function that parses input as raw byte strings, and then maps them using some implementation of that trait. This in turn allows users to choose how they want to parse byte sequences in an ergonomic way. Under the hood, this is *slightly* less efficient than it could be. Specifically, it parses everything as `&[u8]` first, and then maps everything using calls to `FromByteResponse`. This works fine, but will cause unnecessary re-allocation of vectors inside a bunch of structs. The way to work around this is to have `nom` directly use a generic return value in all its parsers, but this unfortunately doesn't seem to be supported at the time of writing.
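To make the two-stage approach concrete, here is a rough sketch of "parse as `&[u8]`, then map". Only the names `FromByteResponse` and `parse_response` come from the PR description; the trait's method, the `Capabilities` type, and the `parse_raw` helper are assumptions made up for illustration, and the real signatures in the PR may well differ.

```rust
/// Hypothetical shape of the trait; the real `FromByteResponse` may differ.
pub trait FromByteResponse<'a> {
    fn from_bytes(bytes: &'a [u8]) -> Self;
}

/// Keep the raw bytes untouched.
impl<'a> FromByteResponse<'a> for &'a [u8] {
    fn from_bytes(bytes: &'a [u8]) -> Self {
        bytes
    }
}

/// Decode lossily into an owned string (invalid bytes become U+FFFD).
impl<'a> FromByteResponse<'a> for String {
    fn from_bytes(bytes: &'a [u8]) -> Self {
        String::from_utf8_lossy(bytes).into_owned()
    }
}

/// Illustrative response type, not one of the crate's real structs.
#[derive(Debug, PartialEq)]
pub struct Capabilities<T> {
    pub names: Vec<T>,
}

/// Stage 1: a stand-in for the nom parsers, which only yield byte slices.
fn parse_raw(input: &[u8]) -> Capabilities<&[u8]> {
    Capabilities {
        names: input
            .split(|b| *b == b' ')
            .filter(|word| !word.is_empty())
            .collect(),
    }
}

/// Stage 2: map every byte slice through the trait. Collecting into a
/// new `Vec<T>` is where the extra allocation mentioned above can occur.
pub fn parse_response<'a, T: FromByteResponse<'a>>(input: &'a [u8]) -> Capabilities<T> {
    let raw = parse_raw(input);
    Capabilities {
        names: raw.names.into_iter().map(T::from_bytes).collect(),
    }
}
```

In this sketch the caller selects the representation purely through the requested type, for example `let caps: Capabilities<String> = parse_response(input);`.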
Interested readers should also see PR #12, and the reason for its closing.
It is (unfortunately) not entirely uncommon for IMAP servers to not support UTF-8 (see mattnenterprise/rust-imap#54 for more), but `imap-proto` currently returns `&str` all over the place, which necessarily implies that it only works on UTF-8 responses. I think the right thing to do here (although it makes the types a bit less pleasant) is to parameterize all the types by a `T: TryFrom<&[u8]>`. That way, users can choose whether to parse into `str`, or simply keep things as `&[u8]` (or do some other decoding). Unfortunately, `TryFrom` is still nightly-only, and hasn't even landed yet for strings (rust-lang/rust#44916), so in the meantime, you'd probably just want to add a trait like this:

And then make the various types use this trait. For example, for `MailboxDatum`:
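A minimal sketch of what such a trait, and a type parameterized over it, might look like. The trait name `FromBytes`, the `MailboxName` struct, and the `mailbox_name` helper are all hypothetical and chosen for illustration; they are not the snippets from the original issue, and `MailboxName` is not the crate's actual `MailboxDatum`.

```rust
use std::convert::Infallible;
use std::str;

/// Hypothetical stand-in for `TryFrom<&[u8]>`: a fallible conversion
/// from a borrowed byte slice.
pub trait FromBytes<'a>: Sized {
    type Error;
    fn from_bytes(bytes: &'a [u8]) -> Result<Self, Self::Error>;
}

/// Keeping the raw bytes can never fail.
impl<'a> FromBytes<'a> for &'a [u8] {
    type Error = Infallible;
    fn from_bytes(bytes: &'a [u8]) -> Result<Self, Self::Error> {
        Ok(bytes)
    }
}

/// Strict UTF-8 decoding fails on anything that is not valid UTF-8.
impl<'a> FromBytes<'a> for &'a str {
    type Error = str::Utf8Error;
    fn from_bytes(bytes: &'a [u8]) -> Result<Self, Self::Error> {
        str::from_utf8(bytes)
    }
}

/// Illustrative type parameterized by its text representation.
#[derive(Debug, PartialEq)]
pub struct MailboxName<T> {
    pub name: T,
}

/// Build the type from raw bytes using whichever conversion `T` provides.
pub fn mailbox_name<'a, T: FromBytes<'a>>(raw: &'a [u8]) -> Result<MailboxName<T>, T::Error> {
    Ok(MailboxName {
        name: T::from_bytes(raw)?,
    })
}
```

With this shape, a caller that needs to survive non-UTF-8 servers can ask for `MailboxName<&[u8]>`, while a caller that wants strings asks for `MailboxName<&str>` and handles the `Utf8Error` itself.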