WIP requirements_common.txt: Upgrade Genshi==0.7.1 #1679

Closed

@cclauss cclauss commented Dec 6, 2018

https://pypi.org/project/Genshi/ is currently [failing our build process](https://travis-ci.org/internetarchive/openlibrary/jobs/464153243#L786) on Python 3 because of a syntax error. Upgrading to this Python 3 compatible version of Genshi was attempted before in #1454 but had to be reverted because our Python 2 tests failed on the upgrade.
@tfmorris tfmorris changed the title from "requirements_common.txt: Upgrade Genshi==0.7.1" to "WIP requirements_common.txt: Upgrade Genshi==0.7.1" Dec 6, 2018
@tfmorris
Contributor

tfmorris commented Dec 6, 2018

I looked at this as part of the last mega review and my initial impression was that the tests just need to be updated. Genshi is now returning a full HTML doc, instead of just the surrounding HTML snippet.

If the full page is being sanitized in the production code, this shouldn't be an issue, but if it's just sanitizing a snippet and then trying to integrate it into the rest of the page, the code may need to be modified. IOW, it may be more than just a test update, but I hope not.

Volunteer needed!

@tfmorris
Contributor

The extra html and body tags were a red herring. They were getting added by BeautifulSoup on the error fallback path. The root problem is that the default encoding got changed from utf-8 to None: https://genshi.edgewall.org/wiki/Documentation/upgrade.html#upgrading-from-genshi-0-6-x-to-genshi-0-7-x

Bytes vs strings is a particularly aggravating part of the Python 3 porting and I don't know how much work has been done in this area. The tests here do assert(helpers.sanitize("foo") == "foo"), which throws an exception because "foo" represents bytes in Python 2 and there's no encoding given.

Options are:

  1. Restore the old default and add encoding = 'utf-8' on the calls.
  2. Change the test suite to use Unicode strings, i.e. assert(helpers.sanitize(u"foo") == u"foo")

The first is easy and isolated, but may mask problems elsewhere in the system.
The second is a little more work and will get the tests to pass, but may expose problems elsewhere in the system at run time if there are code paths which try to pass bytes instead of strings.
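
To make the trade-off concrete, here is a minimal stdlib-only sketch of the two options. `parse_markup` is a hypothetical stand-in for a parser whose default encoding is None; it is not Genshi's code, and `sanitize_with_default` and `sanitize_text_only` are illustrative names rather than the real helpers.sanitize.

```python
def parse_markup(text, encoding=None):
    # Hypothetical stand-in for a parser whose default encoding is None:
    # bytes are accepted only when the caller names an encoding.
    if isinstance(text, bytes):
        if encoding is None:
            raise TypeError("bytes input needs an explicit encoding")
        text = text.decode(encoding)
    return text


def sanitize_with_default(html):
    # Option 1: restore the old behaviour by always passing utf-8.
    return parse_markup(html, encoding="utf-8")


def sanitize_text_only(html):
    # Option 2: no encoding is supplied, so callers must pass text.
    return parse_markup(html)
```

With option 1 both `b"foo"` and `"foo"` are accepted, which is why it can hide callers that still pass bytes. With option 2 a bytes argument fails immediately, which is the run-time exposure described above.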

For the second case, we probably also want to change the exception handling so that it doesn't swallow Unicode decoding errors.
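
One possible shape for that exception handling, as a sketch only: decode before entering the try block, so that only parse failures reach the fallback path. `strict_parse` and the escaping fallback are hypothetical placeholders, not the project's actual parser or its BeautifulSoup fallback.

```python
def strict_parse(html):
    # Hypothetical strict parser: rejects markup with unbalanced brackets.
    if html.count("<") != html.count(">"):
        raise ValueError("malformed markup")
    return html


def sanitize(html, encoding="utf-8"):
    if isinstance(html, bytes):
        # Decoding happens outside the try block, so a
        # UnicodeDecodeError propagates to the caller.
        html = html.decode(encoding)
    try:
        return strict_parse(html)
    except ValueError:
        # Only parse failures take the fallback path (placeholder escaping).
        return html.replace("<", "&lt;").replace(">", "&gt;")
```

Because UnicodeDecodeError is a subclass of ValueError, decoding inside the try block would silently send bad bytes down the fallback path; keeping the decode outside avoids that.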

Opinions?

@cclauss
Contributor Author

cclauss commented Dec 20, 2018

Closed in favor of #1721
