-
-
Notifications
You must be signed in to change notification settings - Fork 2.1k
don't mix unicode strings with utf8-in-byte-strings #3470
Conversation
otherwise we explode with: ``` Traceback (most recent call last): File /usr/lib/python2.7/logging/handlers.py, line 78, in emit logging.FileHandler.emit(self, record) File /usr/lib/python2.7/logging/__init__.py, line 950, in emit StreamHandler.emit(self, record) File /usr/lib/python2.7/logging/__init__.py, line 887, in emit self.handleError(record) File /usr/lib/python2.7/logging/__init__.py, line 810, in handleError None, sys.stderr) File /usr/lib/python2.7/traceback.py, line 124, in print_exception _print(file, 'Traceback (most recent call last):') File /usr/lib/python2.7/traceback.py, line 13, in _print file.write(str+terminator) File /home/matrix/.synapse/local/lib/python2.7/site-packages/twisted/logger/_io.py, line 170, in write self.log.emit(self.level, format=u{log_io}, log_io=line) File /home/matrix/.synapse/local/lib/python2.7/site-packages/twisted/logger/_logger.py, line 144, in emit self.observer(event) File /home/matrix/.synapse/local/lib/python2.7/site-packages/twisted/logger/_observer.py, line 136, in __call__ errorLogger = self._errorLoggerForObserver(brokenObserver) File /home/matrix/.synapse/local/lib/python2.7/site-packages/twisted/logger/_observer.py, line 156, in _errorLoggerForObserver if obs is not observer File /home/matrix/.synapse/local/lib/python2.7/site-packages/twisted/logger/_observer.py, line 81, in __init__ self.log = Logger(observer=self) File /home/matrix/.synapse/local/lib/python2.7/site-packages/twisted/logger/_logger.py, line 64, in __init__ namespace = self._namespaceFromCallingContext() File /home/matrix/.synapse/local/lib/python2.7/site-packages/twisted/logger/_logger.py, line 42, in _namespaceFromCallingContext return currentframe(2).f_globals[__name__] File /home/matrix/.synapse/local/lib/python2.7/site-packages/twisted/python/compat.py, line 93, in currentframe for x in range(n + 1): RuntimeError: maximum recursion depth exceeded while calling a Python object Logged from file site.py, line 129 File /usr/lib/python2.7/logging/__init__.py, line 859, in emit msg = self.format(record) File /usr/lib/python2.7/logging/__init__.py, line 732, in format return fmt.format(record) File /usr/lib/python2.7/logging/__init__.py, line 471, in format record.message = record.getMessage() File /usr/lib/python2.7/logging/__init__.py, line 335, in getMessage msg = msg % self.args UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 4: ordinal not in range(128) Logged from file site.py, line 129 ``` ...where the logger apparently recurses whilst trying to log the error, hitting the maximum recursion depth and killing everything badly.
synapse/http/site.py
Outdated
# c.f. https://github.com/matrix-org/synapse/issues/3471 | ||
user_agent = self.get_user_agent() | ||
if user_agent is not None: | ||
user_agent = user_agent.decode("utf-8") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how does this behave when the UA is not valid utf-8?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it throws:
>>> a="\xa9"
>>> a.decode("utf-8")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/Cellar/python@2/2.7.15/Frameworks/Python.framework/Versions/2.7/lib/python2.7/encodings/utf_8.py", line 16, in decode
return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xa9 in position 0: invalid start byte
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like this is a better bet:
>>> a.decode("utf-8", "replace")
u'\ufffd'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
[it is surprising to me that |
i think it's bytes when it describes a server (as it comes from the raw Auth header) versus unicode when it describes a client. but i didn't dig too deep, especially as i don't understand the circumstances where py2 upgrades byte to unicode strings. |
weeps |
relatedly: continuing to do this sort of hackery without writing tests will just doom us to repeat the same lessons over and over again. |
otherwise we explode with:
...where the logger apparently recurses whilst trying to log the error, hitting the
maximum recursion depth and killing everything badly.