Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-115398: Expose Expat >=2.6.0 reparse deferral API (CVE-2023-52425) #115623

Merged
merged 39 commits into from
Feb 29, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
39 commits
Select commit Hold shift + click to select a range
7cebe78
pyexpat: Implement methods pyexpat.xmlparser.(Get|Set)ReparseDeferral…
hartwork Feb 17, 2024
c70fbae
etree: Implement method xml.etree.ElementTree.XMLParser.flush (Python…
hartwork Feb 17, 2024
dfca819
pyexpat: Make SetReparseDeferralEnabled available via PyExpat_CAPI
hartwork Feb 7, 2024
4baab67
etree: Implement method xml.etree.ElementTree.XMLParser.flush (C vers…
hartwork Feb 18, 2024
7928942
etree: Implement method xml.etree.ElementTree.XMLPullParser.flush
hartwork Feb 17, 2024
e5e4033
etree: Use XMLPullParser.flush to fix XMLPullParserTest for Expat 2.6.0
hartwork Feb 17, 2024
bc6e1a7
sax: Implement method xml.sax.expatreader.ExpatParser.flush
hartwork Feb 18, 2024
b737f03
sax: Test method xml.sax.expatreader.ExpatParser.flush
hartwork Feb 18, 2024
850e46d
Document new CVE-2023-52425 Expat API (reparse deferral)
hartwork Feb 18, 2024
7002024
_elementtree.c: Document how we know that reparse deferral is enabled
hartwork Feb 21, 2024
2132dfe
sax: Fix xml.sax.expatreader.ExpatParser.flush
hartwork Feb 21, 2024
3d02dfe
etree: Fix xml.etree.ElementTree.XMLParser.flush (Python version)
hartwork Feb 21, 2024
fdd2fac
etree: Fix typo "deferall"
hartwork Feb 21, 2024
dbbd98c
pyexpat: Cover (Get|Set)ReparseDeferralEnabled by tests
hartwork Feb 24, 2024
3b6ea39
sax: Extend xml.sax.expatreader.ExpatParser.flush test coverage
hartwork Feb 24, 2024
5c1cfb7
Doc/whatsnew/3.13.rst: Mention new Expat reparse deferral API
hartwork Feb 24, 2024
a9c666e
pyexpat: Document methods pyexpat.xmlparser.(Get|Set)ReparseDeferralE…
hartwork Feb 24, 2024
35099e3
etree: Document method xml.etree.ElementTree.XMLParser.flush
hartwork Feb 24, 2024
1496e83
etree: Document method xml.etree.ElementTree.XMLPullParser.flush
hartwork Feb 24, 2024
a6927ff
etree: Make docs point to xml.etree.ElementTree.XMLPullParser.flush
hartwork Feb 24, 2024
c5b2159
pyexpat: Move security warning into SetReparseDeferralEnabled docs
hartwork Feb 24, 2024
082bcc1
pyexpat|etree: Mark new Expat API as added in 3.13 in docs
hartwork Feb 24, 2024
f0577e7
pyexpat|sax: Do not be silent about tests skipped for Expat <2.6.0
hartwork Feb 24, 2024
f589908
pyexpat: Simplify test ReparseDeferralTest.test_getter_setter_round_trip
hartwork Feb 24, 2024
4915045
Promote xml.parsers.expat.xmlparser instead of pyexpat.xmlparser
hartwork Feb 24, 2024
d0ed243
etree: Cover method xml.etree.ElementTree.XMLPullParser.flush
hartwork Feb 24, 2024
62e4fd7
pyexpat: Cut whitespace from ReparseDeferralTest.test_getter_setter_r…
hartwork Feb 24, 2024
b0058d5
pyexpat: Drop ReparseDeferralTest.test_getter_initial_value
hartwork Feb 24, 2024
1f70c09
pyexpat: Break a long line for PEP 8
hartwork Feb 24, 2024
a77de0f
Doc/whatsnew/3.13.rst: Do not create a link into undocumented class
hartwork Feb 24, 2024
2f07457
etree: Make XMLPullParserTest._feed only flush when needed
hartwork Feb 24, 2024
4b49de9
etree: Fix XMLPullParserTest.test_flush_[..] for C version
hartwork Feb 24, 2024
3c960a6
etree: Break a long line for PEP 8
hartwork Feb 24, 2024
4855bb9
etree: Make test_flush_reparse_deferral_disabled less exclusive
hartwork Feb 24, 2024
b6a84b2
etree|sax: Simplify .flush implementations
hartwork Feb 24, 2024
0faa19e
etree: Resolve "is_python" in favor of "ET is pyET"
hartwork Feb 24, 2024
40743a6
etree: Fix emphasis syntax for "immediate" in docs
hartwork Feb 24, 2024
a473299
pypexpat: Replace "none" with "NULL" to be correct
hartwork Feb 24, 2024
a6baa0b
pyexpat: Indent warning about xmlparser.SetReparseDeferralEnabled
hartwork Feb 24, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
31 changes: 31 additions & 0 deletions Doc/library/pyexpat.rst
Original file line number Diff line number Diff line change
Expand Up @@ -196,7 +196,38 @@
:exc:`ExpatError` to be raised with the :attr:`code` attribute set to
``errors.codes[errors.XML_ERROR_CANT_CHANGE_FEATURE_ONCE_PARSING]``.

.. method:: xmlparser.SetReparseDeferralEnabled(enabled)

.. warning::

Calling ``SetReparseDeferralEnabled(False)`` has security implications,
as detailed below; please make sure to understand these consequences
prior to using the ``SetReparseDeferralEnabled`` method.

Expat 2.6.0 introduced a security mechanism called "reparse deferral"
where instead of causing denial of service through quadratic runtime
from reparsing large tokens, reparsing of unfinished tokens is now delayed
by default until a sufficient amount of input is reached.
Due to this delay, registered handlers may — depending of the sizing of
input chunks pushed to Expat — no longer be called right after pushing new
input to the parser. Where immediate feedback and taking over responsiblity
of protecting against denial of service from large tokens are both wanted,
calling ``SetReparseDeferralEnabled(False)`` disables reparse deferral
for the current Expat parser instance, temporarily or altogether.
Calling ``SetReparseDeferralEnabled(True)`` allows re-enabling reparse
deferral.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add versionadded directives for all new methods. Since they are purposed to be backported, I think that you can use .. versionadded:: 3.12.3. In backports to 3.11 and earlier it will be changed.

Copy link
Contributor Author

@hartwork hartwork Feb 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@serhiy-storchaka I already had that on the radar for later today, will do. I'm unsure how 3.12 should be special here, I'd assume we start out with 3.13 and then adjust during backports for all other 3.x branches. Could you help me understand how 3.12.3 would help over 3.13 when targeting main here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@serhiy-storchaka PS: 4x .. versionadded:: 3.13 added now for a start

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feature will be supported in the continuous interval "3.12.3-" of Python releases. If the user sees versionadded:: 3.12.3, they can understand that in can be used in 3.12 (unless they have not updated it) without checking the 3.12 documentation. It is also a hint that it could be backported in 3,11 and older versions. If the user sees versionadded:: 3.13, there is no reason to suppose that this feature exist in older releases.

It may be only my personal opinion, but I think that this variant is better. But 3.13 is a good start, and we can discuss this later.

.. versionadded:: 3.13

.. method:: xmlparser.GetReparseDeferralEnabled()

Returns whether reparse deferral is currently enabled for the given
Expat parser instance.

.. versionadded:: 3.13


:class:`xmlparser` objects have the following attributes:

Check warning on line 230 in Doc/library/pyexpat.rst

View workflow job for this annotation

GitHub Actions / Docs / Docs

py:class reference target not found: xmlparser


.. attribute:: xmlparser.buffer_size
Expand Down
29 changes: 29 additions & 0 deletions Doc/library/xml.etree.elementtree.rst
Original file line number Diff line number Diff line change
Expand Up @@ -166,6 +166,11 @@ data but would still like to have incremental parsing capabilities, take a look
at :func:`iterparse`. It can be useful when you're reading a large XML document
and don't want to hold it wholly in memory.

Where *immediate* feedback through events is wanted, calling method
:meth:`XMLPullParser.flush` can help reduce delay;
please make sure to study the related security notes.


Finding interesting elements
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Expand Down Expand Up @@ -1387,6 +1392,19 @@ XMLParser Objects

Feeds data to the parser. *data* is encoded data.


.. method:: flush()

Triggers parsing of any previously fed unparsed data, which can be
used to ensure more immediate feedback, in particular with Expat >=2.6.0.
The implementation of :meth:`flush` temporarily disables reparse deferral
with Expat (if currently enabled) and triggers a reparse.
Disabling reparse deferral has security consequences; please see
:meth:`xml.parsers.expat.xmlparser.SetReparseDeferralEnabled` for details.

.. versionadded:: 3.13


:meth:`XMLParser.feed` calls *target*\'s ``start(tag, attrs_dict)`` method
for each opening tag, its ``end(tag)`` method for each closing tag, and data
is processed by method ``data(data)``. For further supported callback
Expand Down Expand Up @@ -1448,6 +1466,17 @@ XMLPullParser Objects

Feed the given bytes data to the parser.

.. method:: flush()

Triggers parsing of any previously fed unparsed data, which can be
used to ensure more immediate feedback, in particular with Expat >=2.6.0.
The implementation of :meth:`flush` temporarily disables reparse deferral
with Expat (if currently enabled) and triggers a reparse.
Disabling reparse deferral has security consequences; please see
:meth:`xml.parsers.expat.xmlparser.SetReparseDeferralEnabled` for details.

.. versionadded:: 3.13

.. method:: close()

Signal the parser that the data stream is terminated. Unlike
Expand Down
11 changes: 11 additions & 0 deletions Doc/whatsnew/3.13.rst
Original file line number Diff line number Diff line change
Expand Up @@ -174,6 +174,17 @@ Other Language Changes

(Contributed by Victor Stinner in :gh:`114570`.)

* Allow controlling Expat >=2.6.0 reparse deferral (CVE-2023-52425)
by adding five new methods:

* :meth:`xml.etree.ElementTree.XMLParser.flush`
* :meth:`xml.etree.ElementTree.XMLPullParser.flush`
* :meth:`xml.parsers.expat.xmlparser.GetReparseDeferralEnabled`
* :meth:`xml.parsers.expat.xmlparser.SetReparseDeferralEnabled`
* :meth:`!xml.sax.expatreader.ExpatParser.flush`

(Contributed by Sebastian Pipping in :gh:`115623`.)


New Modules
===========
Expand Down
4 changes: 3 additions & 1 deletion Include/pyexpat.h
Original file line number Diff line number Diff line change
Expand Up @@ -48,8 +48,10 @@ struct PyExpat_CAPI
enum XML_Status (*SetEncoding)(XML_Parser parser, const XML_Char *encoding);
int (*DefaultUnknownEncodingHandler)(
void *encodingHandlerData, const XML_Char *name, XML_Encoding *info);
/* might be none for expat < 2.1.0 */
/* might be NULL for expat < 2.1.0 */
int (*SetHashSalt)(XML_Parser parser, unsigned long hash_salt);
/* might be NULL for expat < 2.6.0 */
XML_Bool (*SetReparseDeferralEnabled)(XML_Parser parser, XML_Bool enabled);
/* always add new stuff to the end! */
};

54 changes: 54 additions & 0 deletions Lib/test/test_pyexpat.py
Original file line number Diff line number Diff line change
Expand Up @@ -755,5 +755,59 @@ def resolve_entity(context, base, system_id, public_id):
self.assertEqual(handler_call_args, [("bar", "baz")])


class ReparseDeferralTest(unittest.TestCase):
def test_getter_setter_round_trip(self):
parser = expat.ParserCreate()
enabled = (expat.version_info >= (2, 6, 0))

hartwork marked this conversation as resolved.
Show resolved Hide resolved
self.assertIs(parser.GetReparseDeferralEnabled(), enabled)
parser.SetReparseDeferralEnabled(False)
self.assertIs(parser.GetReparseDeferralEnabled(), False)
parser.SetReparseDeferralEnabled(True)
self.assertIs(parser.GetReparseDeferralEnabled(), enabled)

def test_reparse_deferral_enabled(self):
if expat.version_info < (2, 6, 0):
self.skipTest(f'Expat {expat.version_info} does not '
'support reparse deferral')

started = []

def start_element(name, _):
started.append(name)

parser = expat.ParserCreate()
parser.StartElementHandler = start_element
self.assertTrue(parser.GetReparseDeferralEnabled())

for chunk in (b'<doc', b'/>'):
parser.Parse(chunk, False)

# The key test: Have handlers already fired? Expecting: no.
self.assertEqual(started, [])

parser.Parse(b'', True)

self.assertEqual(started, ['doc'])

def test_reparse_deferral_disabled(self):
started = []

def start_element(name, _):
started.append(name)

parser = expat.ParserCreate()
parser.StartElementHandler = start_element
if expat.version_info >= (2, 6, 0):
parser.SetReparseDeferralEnabled(False)
self.assertFalse(parser.GetReparseDeferralEnabled())

for chunk in (b'<doc', b'/>'):
parser.Parse(chunk, False)

# The key test: Have handlers already fired? Expecting: yes.
self.assertEqual(started, ['doc'])


if __name__ == "__main__":
unittest.main()
51 changes: 51 additions & 0 deletions Lib/test/test_sax.py
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,7 @@
from io import BytesIO, StringIO
import codecs
import os.path
import pyexpat
import shutil
import sys
from urllib.error import URLError
Expand Down Expand Up @@ -1214,6 +1215,56 @@ def test_expat_incremental_reset(self):

self.assertEqual(result.getvalue(), start + b"<doc>text</doc>")

def test_flush_reparse_deferral_enabled(self):
if pyexpat.version_info < (2, 6, 0):
self.skipTest(f'Expat {pyexpat.version_info} does not support reparse deferral')

result = BytesIO()
xmlgen = XMLGenerator(result)
parser = create_parser()
parser.setContentHandler(xmlgen)

for chunk in ("<doc", ">"):
parser.feed(chunk)

self.assertEqual(result.getvalue(), start) # i.e. no elements started
self.assertTrue(parser._parser.GetReparseDeferralEnabled())

parser.flush()

self.assertTrue(parser._parser.GetReparseDeferralEnabled())
self.assertEqual(result.getvalue(), start + b"<doc>")

parser.feed("</doc>")
parser.close()

self.assertEqual(result.getvalue(), start + b"<doc></doc>")

def test_flush_reparse_deferral_disabled(self):
result = BytesIO()
xmlgen = XMLGenerator(result)
parser = create_parser()
parser.setContentHandler(xmlgen)

for chunk in ("<doc", ">"):
parser.feed(chunk)

if pyexpat.version_info >= (2, 6, 0):
parser._parser.SetReparseDeferralEnabled(False)

self.assertEqual(result.getvalue(), start) # i.e. no elements started
self.assertFalse(parser._parser.GetReparseDeferralEnabled())

parser.flush()

self.assertFalse(parser._parser.GetReparseDeferralEnabled())
self.assertEqual(result.getvalue(), start + b"<doc>")

parser.feed("</doc>")
parser.close()

self.assertEqual(result.getvalue(), start + b"<doc></doc>")

# ===== Locator support

def test_expat_locator_noinfo(self):
Expand Down
79 changes: 63 additions & 16 deletions Lib/test/test_xml_etree.py
Original file line number Diff line number Diff line change
Expand Up @@ -121,10 +121,6 @@
</foo>
"""

fails_with_expat_2_6_0 = (unittest.expectedFailure
if pyexpat.version_info >= (2, 6, 0) else
lambda test: test)

def checkwarnings(*filters, quiet=False):
def decorator(test):
def newtest(*args, **kwargs):
Expand Down Expand Up @@ -1462,12 +1458,14 @@ def test_attlist_default(self):

class XMLPullParserTest(unittest.TestCase):

def _feed(self, parser, data, chunk_size=None):
def _feed(self, parser, data, chunk_size=None, flush=False):
if chunk_size is None:
parser.feed(data)
else:
for i in range(0, len(data), chunk_size):
parser.feed(data[i:i+chunk_size])
if flush:
parser.flush()

def assert_events(self, parser, expected, max_events=None):
self.assertEqual(
Expand All @@ -1485,34 +1483,32 @@ def assert_event_tags(self, parser, expected, max_events=None):
self.assertEqual([(action, elem.tag) for action, elem in events],
expected)

def test_simple_xml(self, chunk_size=None):
def test_simple_xml(self, chunk_size=None, flush=False):
parser = ET.XMLPullParser()
self.assert_event_tags(parser, [])
self._feed(parser, "<!-- comment -->\n", chunk_size)
self._feed(parser, "<!-- comment -->\n", chunk_size, flush)
self.assert_event_tags(parser, [])
self._feed(parser,
"<root>\n <element key='value'>text</element",
chunk_size)
chunk_size, flush)
self.assert_event_tags(parser, [])
self._feed(parser, ">\n", chunk_size)
self._feed(parser, ">\n", chunk_size, flush)
self.assert_event_tags(parser, [('end', 'element')])
self._feed(parser, "<element>text</element>tail\n", chunk_size)
self._feed(parser, "<empty-element/>\n", chunk_size)
self._feed(parser, "<element>text</element>tail\n", chunk_size, flush)
self._feed(parser, "<empty-element/>\n", chunk_size, flush)
self.assert_event_tags(parser, [
('end', 'element'),
('end', 'empty-element'),
])
self._feed(parser, "</root>\n", chunk_size)
self._feed(parser, "</root>\n", chunk_size, flush)
self.assert_event_tags(parser, [('end', 'root')])
self.assertIsNone(parser.close())

@fails_with_expat_2_6_0
def test_simple_xml_chunk_1(self):
self.test_simple_xml(chunk_size=1)
self.test_simple_xml(chunk_size=1, flush=True)

@fails_with_expat_2_6_0
def test_simple_xml_chunk_5(self):
self.test_simple_xml(chunk_size=5)
self.test_simple_xml(chunk_size=5, flush=True)

def test_simple_xml_chunk_22(self):
self.test_simple_xml(chunk_size=22)
Expand Down Expand Up @@ -1711,6 +1707,57 @@ def test_unknown_event(self):
with self.assertRaises(ValueError):
ET.XMLPullParser(events=('start', 'end', 'bogus'))

def test_flush_reparse_deferral_enabled(self):
if pyexpat.version_info < (2, 6, 0):
self.skipTest(f'Expat {pyexpat.version_info} does not '
'support reparse deferral')

parser = ET.XMLPullParser(events=('start', 'end'))

for chunk in ("<doc", ">"):
parser.feed(chunk)

self.assert_event_tags(parser, []) # i.e. no elements started
if ET is pyET:
self.assertTrue(parser._parser._parser.GetReparseDeferralEnabled())

parser.flush()

self.assert_event_tags(parser, [('start', 'doc')])
if ET is pyET:
self.assertTrue(parser._parser._parser.GetReparseDeferralEnabled())

parser.feed("</doc>")
parser.close()

self.assert_event_tags(parser, [('end', 'doc')])

def test_flush_reparse_deferral_disabled(self):
parser = ET.XMLPullParser(events=('start', 'end'))

for chunk in ("<doc", ">"):
parser.feed(chunk)

if pyexpat.version_info >= (2, 6, 0):
if not ET is pyET:
self.skipTest(f'XMLParser.(Get|Set)ReparseDeferralEnabled '
'methods not available in C')
parser._parser._parser.SetReparseDeferralEnabled(False)

self.assert_event_tags(parser, []) # i.e. no elements started
if ET is pyET:
self.assertFalse(parser._parser._parser.GetReparseDeferralEnabled())

parser.flush()

self.assert_event_tags(parser, [('start', 'doc')])
if ET is pyET:
self.assertFalse(parser._parser._parser.GetReparseDeferralEnabled())

parser.feed("</doc>")
parser.close()

self.assert_event_tags(parser, [('end', 'doc')])

#
# xinclude tests (samples from appendix C of the xinclude specification)
Expand Down
14 changes: 14 additions & 0 deletions Lib/xml/etree/ElementTree.py
Original file line number Diff line number Diff line change
Expand Up @@ -1320,6 +1320,11 @@ def read_events(self):
else:
yield event

def flush(self):
if self._parser is None:
raise ValueError("flush() called after end of stream")
self._parser.flush()


def XML(text, parser=None):
"""Parse XML document from string constant.
Expand Down Expand Up @@ -1726,6 +1731,15 @@ def close(self):
del self.parser, self._parser
del self.target, self._target

def flush(self):
was_enabled = self.parser.GetReparseDeferralEnabled()
try:
self.parser.SetReparseDeferralEnabled(False)
self.parser.Parse(b"", False)
except self._error as v:
self._raiseerror(v)
finally:
self.parser.SetReparseDeferralEnabled(was_enabled)

# --------------------------------------------------------------------
# C14N 2.0
Expand Down
Loading
Loading