email.policy.default - gotcha with re-using parsed headers with embedded newlines #121650
Comments
On further investigation, a plain string with a trailing newline has this issue:
So the "re-use parsed header" is not part of the issue. The problem might be the newline detection in Lines 131 to 148 in dc03ce7
A single element list is returned by
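A small sketch of how a trailing newline can slip past a line-count check. It assumes the detection counts lines with `str.splitlines()`, which is an assumption made for illustration; `looks_multiline` and `has_line_break` are hypothetical helpers, not part of the `email` package.

```python
def looks_multiline(value: str) -> bool:
    """Hypothetical check: flag a value only if it splits into two or more lines."""
    return len(value.splitlines()) > 1


def has_line_break(value: str) -> bool:
    """Stricter hypothetical check: flag any CR or LF anywhere in the value."""
    return "\r" in value or "\n" in value


embedded = "A subject\nBcc: someone@example.com"
trailing = "A subject\n"

# An embedded newline produces two elements, so the line-count check fires.
print(embedded.splitlines())      # ['A subject', 'Bcc: someone@example.com']
print(looks_multiline(embedded))  # True

# A trailing newline produces a single-element list, so the check stays silent.
print(trailing.splitlines())      # ['A subject']
print(looks_multiline(trailing))  # False

# Scanning for the characters themselves catches both cases.
print(has_line_break(trailing))   # True
```

Checking for the characters directly, instead of counting split lines, is one possible way to close the gap for plain strings; it does not cover newlines hidden inside RFC 2047 encoded-words.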
This is a bug (I was able to reproduce this on the CPython main branch), and looks like a minor security problem, considering this:

For example, I could see someone developing an app that does something like this:

```python
from email.message import EmailMessage


def email_notification(name: str):
    msg = EmailMessage()
    msg.set_content("This is an automatic notification blah blah blah...")
    msg["Subject"] = (
        f"{name} sent you a message!"
    )
    # smtp_server is assumed to be an already-connected smtplib.SMTP instance
    smtp_server.send_message(msg)
```

If a user set their name to something like

Furthermore, you could use this to inject extra message headers.
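As a defensive sketch for application code like the notification example, user-supplied text can be rejected before it reaches a header. `safe_header_value` and `build_notification` are hypothetical names introduced here for illustration; this is one possible application-level guard, not the fix made in CPython.

```python
from email.message import EmailMessage


def safe_header_value(value: str) -> str:
    """Hypothetical guard: refuse any CR or LF in user-supplied header text."""
    if "\r" in value or "\n" in value:
        raise ValueError("header value must not contain CR or LF")
    return value


def build_notification(name: str) -> EmailMessage:
    """Build the message; sending it is left to the caller."""
    msg = EmailMessage()
    msg.set_content("This is an automatic notification.")
    # Validate the untrusted part before it is interpolated into the header.
    msg["Subject"] = f"{safe_header_value(name)} sent you a message!"
    return msg


print(build_notification("Alice")["Subject"])

try:
    build_notification("Mallory\n")
except ValueError as exc:
    print("rejected:", exc)
```

This guard only looks at literal CR and LF characters. A newline smuggled in through an RFC 2047 encoded-word, as described elsewhere in this thread, would need separate handling.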
It seems to be a bug, or two even.

```python
import email.message
from email.policy import default

msg = email.message.EmailMessage(policy=default)
msg['Subject'] = 'A 💩 subject\nBcc: [email protected]'
print(str(msg))
```

The above throws a `ValueError("Header values may not contain linefeed or carriage return characters")`, as expected. However, the following does not, and inserts an extra newline, thus invalidating some headers:

```python
msg = email.message.EmailMessage(policy=default)
msg['Subject'] = 'A 💩 subject\n'
msg.set_content('This is 💩 the body of the message.\n')
print(str(msg))
```

And by using a UTF-8-encoded newline, it even inserts an extra header:

```python
msg = email.message.EmailMessage(policy=default)
msg['Subject'] = 'A 💩 subject=?UTF-8?Q?=0A?=Bcc: [email protected]'
msg.set_content('This is 💩 the body of the message.\n')
print(str(msg))
```

So, I think two things have to be solved:

@encukou: I'll try to fix both during (or after) the EuroPython sprint, ok?
Thanks @basbloemsaat. Feel free to pick a better title for this issue (or suggest one if I need to change it), or re-file for the individual issues.
I'm pretty sure this is a security problem, as you can inject extra headers. @Eclips4 what do you think, and could you add the security label?
I would like to hear @serhiy-storchaka's opinion on this.
…H-122233)

## Encode header parts that contain newlines

Per RFC 2047:

> [...] these encoding schemes allow the
> encoding of arbitrary octet values, mail readers that implement this
> decoding should also ensure that display of the decoded data on the
> recipient's terminal will not cause unwanted side-effects

It seems that the "quoted-word" scheme is a valid way to include a newline character in a header value, just like we already allow undecodable bytes or control characters. They do need to be properly quoted when serialized to text, though.

## Verify that email headers are well-formed

This should fail for custom fold() implementations that aren't careful about newlines.

Co-authored-by: Bas Bloemsaat <[email protected]>
Co-authored-by: Serhiy Storchaka <[email protected]>
…ound (pythonGH-122233) GH-GH- Encode header parts that contain newlines Per RFC 2047: > [...] these encoding schemes allow the > encoding of arbitrary octet values, mail readers that implement this > decoding should also ensure that display of the decoded data on the > recipient's terminal will not cause unwanted side-effects It seems that the "quoted-word" scheme is a valid way to include a newline character in a header value, just like we already allow undecodable bytes or control characters. They do need to be properly quoted when serialized to text, though. GH-GH- Verify that email headers are well-formed This should fail for custom fold() implementations that aren't careful about newlines. (cherry picked from commit 0976339) Co-authored-by: Petr Viktorin <[email protected]> Co-authored-by: Bas Bloemsaat <[email protected]> Co-authored-by: Serhiy Storchaka <[email protected]>
…ound (pythonGH-122233) - Encode header parts that contain newlines Per RFC 2047: > [...] these encoding schemes allow the > encoding of arbitrary octet values, mail readers that implement this > decoding should also ensure that display of the decoded data on the > recipient's terminal will not cause unwanted side-effects It seems that the "quoted-word" scheme is a valid way to include a newline character in a header value, just like we already allow undecodable bytes or control characters. They do need to be properly quoted when serialized to text, though. - Verify that email headers are well-formed This should fail for custom fold() implementations that aren't careful about newlines. Co-authored-by: Bas Bloemsaat <[email protected]> Co-authored-by: Serhiy Storchaka <[email protected]> (cherry picked from commit 0976339)
…s are sound (pythonGH-122233) Per RFC 2047: > [...] these encoding schemes allow the > encoding of arbitrary octet values, mail readers that implement this > decoding should also ensure that display of the decoded data on the > recipient's terminal will not cause unwanted side-effects It seems that the "quoted-word" scheme is a valid way to include a newline character in a header value, just like we already allow undecodable bytes or control characters. They do need to be properly quoted when serialized to text, though. This should fail for custom fold() implementations that aren't careful about newlines. (cherry picked from commit 0976339) Co-authored-by: Petr Viktorin <[email protected]> Co-authored-by: Bas Bloemsaat <[email protected]> Co-authored-by: Serhiy Storchaka <[email protected]>
…sound (GH-122233) (#122484) gh-121650: Encode newlines in headers, and verify headers are sound (GH-122233) GH-GH- Encode header parts that contain newlines Per RFC 2047: > [...] these encoding schemes allow the > encoding of arbitrary octet values, mail readers that implement this > decoding should also ensure that display of the decoded data on the > recipient's terminal will not cause unwanted side-effects It seems that the "quoted-word" scheme is a valid way to include a newline character in a header value, just like we already allow undecodable bytes or control characters. They do need to be properly quoted when serialized to text, though. GH-GH- Verify that email headers are well-formed This should fail for custom fold() implementations that aren't careful about newlines. (cherry picked from commit 0976339) Co-authored-by: Petr Viktorin <[email protected]> Co-authored-by: Bas Bloemsaat <[email protected]> Co-authored-by: Serhiy Storchaka <[email protected]>
…sound (GH-122233) (#122599) * gh-121650: Encode newlines in headers, and verify headers are sound (GH-122233) - Encode header parts that contain newlines Per RFC 2047: > [...] these encoding schemes allow the > encoding of arbitrary octet values, mail readers that implement this > decoding should also ensure that display of the decoded data on the > recipient's terminal will not cause unwanted side-effects It seems that the "quoted-word" scheme is a valid way to include a newline character in a header value, just like we already allow undecodable bytes or control characters. They do need to be properly quoted when serialized to text, though. - Verify that email headers are well-formed This should fail for custom fold() implementations that aren't careful about newlines. Co-authored-by: Bas Bloemsaat <[email protected]> Co-authored-by: Serhiy Storchaka <[email protected]> (cherry picked from commit 0976339) * Document changes as made in 3.12.5
#121284 turns out to be a variation of this, where refolding a parsed RFC 2047 encoded-word can leak 'specials' characters into structured headers without proper quoting/encoding. The security issue is not quite as severe as letting newlines leak in, but unquoted specials can allow manipulation of the message sender and recipients.
…s are sound pythongh-121650: Encode newlines in headers, and verify headers are sound (pythonGH-122233) Encode header parts that contain newlines Per RFC 2047: > [...] these encoding schemes allow the > encoding of arbitrary octet values, mail readers that implement this > decoding should also ensure that display of the decoded data on the > recipient's terminal will not cause unwanted side-effects It seems that the "quoted-word" scheme is a valid way to include a newline character in a header value, just like we already allow undecodable bytes or control characters. They do need to be properly quoted when serialized to text, though. Verify that email headers are well-formed This should fail for custom fold() implementations that aren't careful about newlines. (cherry picked from commit 0976339) Co-authored-by: Petr Viktorin <[email protected]> Co-authored-by: Bas Bloemsaat <[email protected]> Co-authored-by: Serhiy Storchaka <[email protected]>
…s are sound Encode header parts that contain newlines Per RFC 2047: > [...] these encoding schemes allow the > encoding of arbitrary octet values, mail readers that implement this > decoding should also ensure that display of the decoded data on the > recipient's terminal will not cause unwanted side-effects It seems that the "quoted-word" scheme is a valid way to include a newline character in a header value, just like we already allow undecodable bytes or control characters. They do need to be properly quoted when serialized to text, though. Verify that email headers are well-formed This should fail for custom fold() implementations that aren't careful about newlines. (cherry picked from commit 0976339) Co-authored-by: Petr Viktorin <[email protected]> Co-authored-by: Bas Bloemsaat <[email protected]> Co-authored-by: Serhiy Storchaka <[email protected]>
headers are sound (pythonGH-122233) Per RFC 2047: > [...] these encoding schemes allow the > encoding of arbitrary octet values, mail readers that implement this > decoding should also ensure that display of the decoded data on the > recipient's terminal will not cause unwanted side-effects It seems that the "quoted-word" scheme is a valid way to include a newline character in a header value, just like we already allow undecodable bytes or control characters. They do need to be properly quoted when serialized to text, though. This should fail for custom fold() implementations that aren't careful about newlines. (cherry picked from commit 0976339) Co-authored-by: Petr Viktorin <[email protected]> Co-authored-by: Bas Bloemsaat <[email protected]> Co-authored-by: Serhiy Storchaka <[email protected]> This patch also contains modified commit cherry picked from c5bba85. This commit was backported to simplify the backport of the other commit fixing CVE. The only modification is a removal of one test case which tests multiple changes in Python 3.7 and it wasn't working properly with Python 3.6 where we backported only one change. Co-authored-by: bsiem <[email protected]>
headers are sound (pythonGH-122233) Per RFC 2047: > [...] these encoding schemes allow the > encoding of arbitrary octet values, mail readers that implement this > decoding should also ensure that display of the decoded data on the > recipient's terminal will not cause unwanted side-effects It seems that the "quoted-word" scheme is a valid way to include a newline character in a header value, just like we already allow undecodable bytes or control characters. They do need to be properly quoted when serialized to text, though. This should fail for custom fold() implementations that aren't careful about newlines. (cherry picked from commit 0976339) This patch also contains modified commit cherry picked from c5bba85. This commit was backported to simplify the backport of the other commit fixing CVE. The only modification is a removal of one test case which tests multiple changes in Python 3.7 and it wasn't working properly with Python 3.6 where we backported only one change. Co-authored-by: Petr Viktorin <[email protected]> Co-authored-by: Bas Bloemsaat <[email protected]> Co-authored-by: Serhiy Storchaka <[email protected]> Co-authored-by: bsiem <[email protected]>
The :mod:`~email.generator` will now refuse to serialize (write) headers that are improperly folded or delimited, such that they would be parsed as multiple headers or joined with adjacent data. If you need to turn this safety feature off, set `~email.policy.Policy.verify_generated_headers`. Per RFC 2047: > [...] these encoding schemes allow the > encoding of arbitrary octet values, mail readers that implement this > decoding should also ensure that display of the decoded data on the > recipient's terminal will not cause unwanted side-effects It seems that the "quoted-word" scheme is a valid way to include a newline character in a header value, just like we already allow undecodable bytes or control characters. They do need to be properly quoted when serialized to text, though. Fixes: gh#python#121650 Fixes: bsc#1228780 (CVE-2024-6923) From-PR: gh#python/cpython!122233 Co-authored-by: Serhiy Storchaka <[email protected]> Co-authored-by: Bas Bloemsaat <[email protected]> Co-authored-by: Petr Viktorin <[email protected]> Co-authored-by: Jakub Stasiak <[email protected]> Patch: CVE-2024-6923-email-hdr-inject.patch
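A hedged sketch of how the `verify_generated_headers` policy setting named above might be probed and switched off. The attribute only exists on Python builds that include this change, so the code checks for it first; turning the check off removes the protection described in this thread and is shown only to illustrate the setting.

```python
from email import policy

# Probe for the attribute instead of assuming a particular Python version.
supported = hasattr(policy.default, "verify_generated_headers")
print("verify_generated_headers supported:", supported)

if supported:
    # clone() returns a new policy object; policy.default itself is left unchanged.
    lenient = policy.default.clone(verify_generated_headers=False)
    print(lenient.verify_generated_headers)  # False
    print(policy.default.verify_generated_headers)
```

The cloned policy can then be passed wherever a policy is accepted, for example `EmailMessage(policy=lenient)`.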
…ound (GH-122233) (#122611) Per RFC 2047: > [...] these encoding schemes allow the > encoding of arbitrary octet values, mail readers that implement this > decoding should also ensure that display of the decoded data on the > recipient's terminal will not cause unwanted side-effects It seems that the "quoted-word" scheme is a valid way to include a newline character in a header value, just like we already allow undecodable bytes or control characters. They do need to be properly quoted when serialized to text, though. This should fail for custom fold() implementations that aren't careful about newlines. (cherry picked from commit 0976339) Co-authored-by: Petr Viktorin <[email protected]> Co-authored-by: Bas Bloemsaat <[email protected]> Co-authored-by: Serhiy Storchaka <[email protected]>
…sound (GH-122233) (#122608) Per RFC 2047: > [...] these encoding schemes allow the > encoding of arbitrary octet values, mail readers that implement this > decoding should also ensure that display of the decoded data on the > recipient's terminal will not cause unwanted side-effects It seems that the "quoted-word" scheme is a valid way to include a newline character in a header value, just like we already allow undecodable bytes or control characters. They do need to be properly quoted when serialized to text, though. Verify that email headers are well-formed. This should fail for custom fold() implementations that aren't careful about newlines. (cherry picked from commit 0976339) Co-authored-by: Petr Viktorin <[email protected]> Co-authored-by: Bas Bloemsaat <[email protected]> Co-authored-by: Serhiy Storchaka <[email protected]>
…sound (GH-122233) (#122609) Per RFC 2047: > [...] these encoding schemes allow the > encoding of arbitrary octet values, mail readers that implement this > decoding should also ensure that display of the decoded data on the > recipient's terminal will not cause unwanted side-effects It seems that the "quoted-word" scheme is a valid way to include a newline character in a header value, just like we already allow undecodable bytes or control characters. They do need to be properly quoted when serialized to text, though. This should fail for custom fold() implementations that aren't careful about newlines. (cherry picked from commit 0976339) Co-authored-by: Petr Viktorin <[email protected]> Co-authored-by: Bas Bloemsaat <[email protected]> Co-authored-by: Serhiy Storchaka <[email protected]>
…ound (GH-122233) (#122610) Per RFC 2047: > [...] these encoding schemes allow the > encoding of arbitrary octet values, mail readers that implement this > decoding should also ensure that display of the decoded data on the > recipient's terminal will not cause unwanted side-effects It seems that the "quoted-word" scheme is a valid way to include a newline character in a header value, just like we already allow undecodable bytes or control characters. They do need to be properly quoted when serialized to text, though. This should fail for custom fold() implementations that aren't careful about newlines. (cherry picked from commit 0976339) Co-authored-by: Petr Viktorin <[email protected]> Co-authored-by: Bas Bloemsaat <[email protected]> Co-authored-by: Serhiy Storchaka <[email protected]>
Thank you @jwhitlock for the report, and @basbloemsaat for the initial fix!
Bug report
Bug description:
I'm not sure if this is a bug, feature request, or user error. I'm happy to re-file once I know which.
If a parsed email header contains a correctly quoted newline, setting an email header to that value will include a newline.
Output is:
An email parser will interpret the newline as the start of the message. In this case, the `Content-Type` and other MIME headers will not be processed, and the email is treated as plain text. In other cases, required headers like `To` may not be processed and the email will not be delivered.

I'd expect an error on setting the value, an error on serializing the `EmailMessage` to a string, the subject to retain the original encoding, or the newline to be quoted in the serialized version.

Now that we know the behavior, we can process the headers (embed or strip trailing newlines). However, you may see this is a bug, a needed feature, or missing documentation.
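A minimal sketch of the "strip newlines before re-use" workaround mentioned above. The incoming message, the addresses, and the `single_line` helper are all made up for illustration; the idea is to copy the subject as a plain one-line string instead of re-using the parsed header object.

```python
from email import message_from_string
from email.message import EmailMessage
from email.policy import default


def single_line(value) -> str:
    """Hypothetical helper: collapse CR/LF and whitespace runs into single spaces."""
    return " ".join(str(value).split())


# Made-up incoming message; the Subject ends in an encoded-word newline (=0A).
raw = (
    "To: user@example.com\n"
    "Subject: Wishlist =?UTF-8?Q?from_a_friend=0A?=\n"
    "\n"
    "Hello\n"
)
incoming = message_from_string(raw, policy=default)

forwarded = EmailMessage(policy=default)
forwarded["To"] = "real-address@example.com"
# Assign a cleaned plain string, not the parsed header object.
forwarded["Subject"] = single_line(incoming["Subject"])
forwarded.set_content("Hello\n")

print(forwarded.as_string())
```

Because the assigned value is an ordinary string without line breaks, the MIME headers added by `set_content()` stay in the header block of the serialized message.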
More info:
`subject`'s type is an `email.headerregistry._UniqueUnstructuredHeader`. It has a `name`, so it is assigned without checking (`email.policy.EmailPolicy.header_store_parse()`).

The `_parse_tree`, returned by `email._header_value_parser.get_unstructured()`, is:

A user encountered this for our email relaying service https://relay.firefox.com (mozilla/fx-private-relay#4841). An incoming email to a service address is matched to a user. We re-write the email headers and forward the email to the user's "real" address.
A real email has this subject header:
This is from a European website https://www.alloverpiercings.com. You can create a wishlist and send it to an email address. The subject appears correctly encoded to me, to allow for non-ASCII usernames, with the unfortunate embedded newline. When forwarding this email, using something similar to the code above (but with more header modifications and additions), the embedded newline is turned into a real newline. The rest of the email headers are treated as part of the body. Since the `Content-Type` and other MIME headers are not processed as headers, the email is treated as a plain text email.

CPython versions tested on:
3.11, 3.12
Operating systems tested on:
macOS
Linked PRs