email.policy.EmailPolicy._fold() breaking multi-byte Unicode sequences #117313
Labels
3.12 (bugs and security fixes)
3.13 (bugs and security fixes)
topic-email
topic-unicode
type-bug (An unexpected behavior, bug, or error)
Comments
serhiy-storchaka added the topic-unicode, topic-email, type-bug (An unexpected behavior, bug, or error), 3.11 (only security fixes), 3.12 (bugs and security fixes) and 3.13 (bugs and security fixes) labels on Mar 28, 2024
serhiy-storchaka added a commit to serhiy-storchaka/cpython that referenced this issue on Mar 29, 2024:
…d line separators
Only treat '\n', '\r' and '\r\n' as line separators in re-folding the email messages. Preserve control characters '\v', '\f', '\x1c', '\x1d' and '\x1e' and Unicode line separators '\x85', '\u2028' and '\u2029' as is.
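The distinction drawn in the commit message above can be checked directly. The following is a minimal sketch, not code from the commit: the regular expression `CRLF_RE` and both helper functions are hypothetical names introduced for illustration. It compares `str.splitlines()` with a split that recognises only `'\n'`, `'\r'` and `'\r\n'`.

```python
import re

# Characters the commit message says should be preserved as is.
PRESERVED = ['\v', '\f', '\x1c', '\x1d', '\x1e', '\x85', '\u2028', '\u2029']
# Characters the commit message says should be treated as line separators.
SEPARATORS = ['\n', '\r', '\r\n']

# Hypothetical pattern for illustration: matches '\r\n', a lone '\r' or a lone '\n'.
CRLF_RE = re.compile(r'\r\n|\r|\n')


def splits_with_splitlines(ch):
    """Return True if str.splitlines() treats ch as a line boundary."""
    return len(f'a{ch}b'.splitlines()) > 1


def splits_with_crlf_re(ch):
    """Return True if the CR/LF-only pattern treats ch as a line boundary."""
    return len(CRLF_RE.split(f'a{ch}b')) > 1


for ch in PRESERVED + SEPARATORS:
    print(f'{ch!r:10} splitlines={splits_with_splitlines(ch)!s:5} '
          f'crlf_only={splits_with_crlf_re(ch)}')
```

`str.splitlines()` reports a boundary for every character in both lists, while the CR/LF-only pattern reports one only for the three entries in `SEPARATORS`.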
serhiy-storchaka added a commit that referenced this issue on Apr 17, 2024:
… separators (GH-117369)
Only treat '\n', '\r' and '\r\n' as line separators in re-folding the email messages. Preserve control characters '\v', '\f', '\x1c', '\x1d' and '\x1e' and Unicode line separators '\x85', '\u2028' and '\u2029' as is.
miss-islington pushed a commit to miss-islington/cpython that referenced this issue on Apr 17, 2024:
…d line separators (pythonGH-117369)
Only treat '\n', '\r' and '\r\n' as line separators in re-folding the email messages. Preserve control characters '\v', '\f', '\x1c', '\x1d' and '\x1e' and Unicode line separators '\x85', '\u2028' and '\u2029' as is. (cherry picked from commit aec1dac) Co-authored-by: Serhiy Storchaka
miss-islington pushed a commit to miss-islington/cpython that referenced this issue on Apr 17, 2024:
…d line separators (pythonGH-117369)
Only treat '\n', '\r' and '\r\n' as line separators in re-folding the email messages. Preserve control characters '\v', '\f', '\x1c', '\x1d' and '\x1e' and Unicode line separators '\x85', '\u2028' and '\u2029' as is. (cherry picked from commit aec1dac) Co-authored-by: Serhiy Storchaka
diegorusso pushed a commit to diegorusso/cpython that referenced this issue on Apr 17, 2024:
…d line separators (pythonGH-117369)
Only treat '\n', '\r' and '\r\n' as line separators in re-folding the email messages. Preserve control characters '\v', '\f', '\x1c', '\x1d' and '\x1e' and Unicode line separators '\x85', '\u2028' and '\u2029' as is.
serhiy-storchaka added a commit that referenced this issue on Apr 17, 2024:
…rd line separators (GH-117369) (GH-117971)
Only treat '\n', '\r' and '\r\n' as line separators in re-folding the email messages. Preserve control characters '\v', '\f', '\x1c', '\x1d' and '\x1e' and Unicode line separators '\x85', '\u2028' and '\u2029' as is. (cherry picked from commit aec1dac) Co-authored-by: Serhiy Storchaka
cpython/Lib/email/policy.py, line 208 in eefff68
I think it's problematic that the method `email.policy.EmailPolicy._fold()` relies on the generic `str`/`bytes` method `.splitlines()`, especially in an email-processing context where the "official" line ending is `\r\n`.

I'm one of many devs who also leniently recognise (regex) `[\r\n]+` as a line break in emails. But I have no idea why all the other ending characters from other contexts are also used in a specific mail-manipulation context.

On the surface, `.splitlines()` seems a simple way to cover the case of a header value itself containing line endings.

However, in cases where a header value may contain multi-byte Unicode sequences, this causes breakage, because characters such as `\x0C` (which may potentially be part of a sequence) instead get treated as legacy ASCII 'form-feed', and deemed to be a line ending. This then breaks the sequence, which in turn causes problems in the subsequent processing of the email message.

A specimen header (from real-world production traffic) which triggers this behaviour is:
Here, the `\x0C` is treated as a line-ending, so the trailing portion `b'\xd8/FTEP'` gets wrapped and indented on the next line.

To work around this in my networks, I've had to subclass `email.policy.EmailPolicy`, and override the method `._fold()` to instead split only on CR/LFs, via

Can the maintainers of this class please advise with their thoughts?
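A CR/LF-only split of the kind described above could be sketched roughly as follows. This is not the reporter's code: `split_crlf_only` and `_CRLF_ONLY` are hypothetical names, and the sample value is made up for illustration. Wiring such a helper into an overridden `_fold()` is left out here, since that method is private and its body may differ between Python versions.

```python
import re

# Hypothetical helper, not the reporter's code: recognise only CRLF, CR and LF.
_CRLF_ONLY = re.compile(r'\r\n|\r|\n')


def split_crlf_only(value):
    """Split a header value on '\\r\\n', '\\r' or '\\n' and nothing else.

    Like str.splitlines(), the empty string that would follow a trailing
    line ending is dropped.
    """
    lines = _CRLF_ONLY.split(value)
    if lines and lines[-1] == '':
        lines.pop()
    return lines


# Made-up value containing a form feed and one genuine CRLF line ending.
value = 'first part\x0csecond part\r\n continuation'
print(value.splitlines())      # the form feed is treated as a line boundary
print(split_crlf_only(value))  # the form feed stays inside the first piece
```

With this sample value, `str.splitlines()` yields three pieces and `split_crlf_only()` yields two, because only the `\r\n` counts as a line ending in the second case.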
Given that RFC822 and related standards specify that the "official" line ending is `\r\n`, is there any reason to catch everything else that may also be considered in other string contexts to constitute a line ending?

Linked PRs