Adding book page has problems with Unicode ÄÖÜäöü or ß #3704

Closed
bitnapper opened this issue Aug 14, 2020 · 12 comments · Fixed by #4131
Labels

  • Lead: @hornc. Issues overseen by Charles (Staff: Data Engineering Lead) [managed]
  • Priority: 2. Important, as time permits. [managed]
  • Theme: Internationalization. Making OpenLibrary work for both foreign-language users and books. [managed]
  • Type: Bug. Something isn't working. [managed]

Comments

@bitnapper

Adding books containing a German ß in the title, or ÄÖÜäöü or ß in the author name, results in an error.

Evidence / Screenshot (if possible)

/opt/openlibrary/deploys/openlibrary/61096f2/openlibrary/templates/books/edit.html: error in processing template: UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6' in position 9: ordinal not in range(128) (falling back to default template)

Relevant url?

https://openlibrary.org/books/add

Steps to Reproduce

  1. https://openlibrary.org/books/add
  2. add title 'Der Große Weltatlas'

Details

  • Logged in: Y
  • Browser type/version? Firefox 79
  • Operating system? OSX
  • Environment (prod/dev/local)? prod
@bitnapper bitnapper added Needs: Triage This issue needs triage. The team needs to decide who should own it, what to do, by when. [managed] Type: Bug Something isn't working. [managed] labels Aug 14, 2020
@bitnapper
Author

Trying to edit the book later seems to produce the following error:

https://openlibrary.org/books/OL28738926M/Das_gro%C3%9Fe_Buch_der_Dinosaurier/edit

/opt/openlibrary/deploys/openlibrary/61096f2/openlibrary/templates/books/edit.html: error in processing template: UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6' in position 23: ordinal not in range(128) (falling back to default template)

@LeadSongDog

LeadSongDog commented Aug 14, 2020

This is a recent variation on an old problem, also discussed en passant at #2231. As a temporary workaround, the correct (non-ASCII) spelling can be moved to the author’s a.k.a.s and the primary author name respelled in plain ASCII. Once this is done, the editions and works become accessible.
@tabshaikh @cclauss Does this ring any bells?

@bitnapper
Author

@LeadSongDog This workaround helps with creating new books but OL28738926M can't be edited at all.

@cclauss
Contributor

cclauss commented Aug 14, 2020

This kind of incompatibility should disappear in the coming months when we switch to Python 3, because all str are Unicode in current versions of Python. It would be cool if someone could write some Python test cases that fail for this issue so that, once they pass, we know we have made useful progress. Even a list of URLs would help, especially if we could get some of those books loaded into dev/staging.
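
A minimal sketch of what such test cases could look like, using only strings reported in this thread. Both `ascii_render` and `utf8_render` are hypothetical stand-ins for whatever Open Library code handles the title or author name; they are not the project's real functions. The idea is only that `failing_samples` lists every sample for a code path that encodes to ASCII and lists none once the path is Unicode-safe.

```python
# Titles and names taken from reports in this thread.
SAMPLES = [
    "Der Große Weltatlas",
    "Das große Buch der Dinosaurier",
    "Isaías Rojas Peña",
]


def failing_samples(render):
    """Return the samples for which `render` raises a Unicode error."""
    failures = []
    for text in SAMPLES:
        try:
            render(text)
        except UnicodeError:  # covers UnicodeEncodeError and UnicodeDecodeError
            failures.append(text)
    return failures


def ascii_render(text):
    """Hypothetical stand-in that mimics the bug: encode as ASCII."""
    return text.encode("ascii")


def utf8_render(text):
    """Hypothetical stand-in for a fixed code path: encode as UTF-8."""
    return text.encode("utf-8")


if __name__ == "__main__":
    print("ASCII path fails for:", failing_samples(ascii_render))
    print("UTF-8 path fails for:", failing_samples(utf8_render))
```

Swapping the stand-ins for calls into the real add-book and edit-book code would turn this into a regression test for the issue.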

@xayhewalo xayhewalo added Lead: @mekarpeles Issues overseen by Mek (Staff: Program Lead) [managed] Priority: 2 Important, as time permits. [managed] Theme: Internationalization Making OpenLibrary work for both foreign-language users and books. [managed] Theme: Upgrade to Python 3 and removed Needs: Triage This issue needs triage. The team needs to decide who should own it, what to do, by when. [managed] labels Aug 18, 2020
@dcapillae
Contributor

dcapillae commented Aug 23, 2020

I found this bug in names with "á", "é", "í", "ó", and "ú" (letters with accents). Also with the letter "ñ" in Spanish. Example: Isaías Rojas Peña (including "í" and "ñ").

@cclauss
Contributor

cclauss commented Aug 23, 2020

Can we replicate this on http://staging.openlibrary.org which is now running on Python 3?

@cclauss
Contributor

cclauss commented Aug 25, 2020

https://github.com/internetarchive/openlibrary/blob/master/openlibrary/plugins/upstream/addbook.py

This also presents a problem on Python 3.8.5 because we are attempting str.encode('ascii'), so I will try to fix it in a way that works for both...

Traceback (most recent call last):
  File "/home/openlibrary/.pyenv/versions/3.8.5/lib/python3.8/site-packages/gunicorn/workers/sync.py", line 135, in handle
    self.handle_request(listener, req, client, addr)
  File "/home/openlibrary/.pyenv/versions/3.8.5/lib/python3.8/site-packages/gunicorn/workers/sync.py", line 182, in handle_request
    resp.write(item)
  File "/home/openlibrary/.pyenv/versions/3.8.5/lib/python3.8/site-packages/gunicorn/http/wsgi.py", line 333, in write
    self.send_headers()
  File "/home/openlibrary/.pyenv/versions/3.8.5/lib/python3.8/site-packages/gunicorn/http/wsgi.py", line 329, in send_headers
    util.write(self.sock, util.to_bytestring(header_str, "ascii"))
  File "/home/openlibrary/.pyenv/versions/3.8.5/lib/python3.8/site-packages/gunicorn/util.py", line 507, in to_bytestring
    return value.encode(encoding)
UnicodeEncodeError: 'ascii' codec can't encode character '\xdf' in position 248: ordinal not in range(128)
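
The traceback above shows gunicorn encoding the response headers as ASCII, and '\xdf' is ß. Below is a minimal sketch of one way to keep a header value ASCII-safe, under the assumption that the offending header carries a URL path containing the book title (the thread does not confirm which header it is). `ascii_safe_path` is a hypothetical helper, not part of Open Library or gunicorn; it percent-encodes the path with the standard library.

```python
from urllib.parse import quote, unquote


def ascii_safe_path(path: str) -> str:
    """Percent-encode non-ASCII characters as UTF-8, keeping '/' intact."""
    return quote(path, safe="/")


if __name__ == "__main__":
    raw = "/books/OL28738926M/Das_große_Buch_der_Dinosaurier/edit"

    # The raw path cannot be written into an ASCII-only header.
    try:
        raw.encode("ascii")
    except UnicodeEncodeError as exc:
        print("raw path fails:", exc)

    safe = ascii_safe_path(raw)
    safe.encode("ascii")  # no exception: every character is now ASCII
    print(safe)
    print(unquote(safe) == raw)  # the original text is recoverable
```

The encoded form matches the edit URL quoted earlier in this thread, which suggests browsers and the site already use UTF-8 percent-encoding for these paths.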

@cclauss cclauss changed the title Adding book page has problems with german ÄÖÜäöü or ß Adding book page has problems with Unicode ÄÖÜäöü or ß Aug 25, 2020
@LeadSongDog

@cclauss I got an internal server error on staging.openlibrary.org adding that:
[screenshot]

Changing to grosse instead worked:
[screenshot]

@cclauss
Contributor

cclauss commented Aug 25, 2020

Correct... This is about Unicode characters vs. ASCII characters. Any non-ASCII character (even an emoji) will cause a problem on either Py2 or Py3.
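
For reference, a short sketch that decodes the escape codes appearing in the error messages in this thread. It uses only the standard library; `describe` is a helper name made up for this illustration. Each reported character has a code point above 127, which is why the 'ascii' codec rejects it.

```python
import unicodedata

# Escape codes copied from the UnicodeEncodeError messages in this thread.
REPORTED = ["\xf6", "\xdf", "\xe0"]


def describe(ch):
    """Return the character, its code point, and its Unicode name."""
    return ch, ord(ch), unicodedata.name(ch)


if __name__ == "__main__":
    for ch in REPORTED:
        char, code_point, name = describe(ch)
        # ASCII covers code points 0..127 only.
        print(f"{char!r} U+{code_point:04X} {name} ascii={code_point < 128}")
```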

@rodrigoescandon

Hi, I'm getting a similar error when I try to edit this book:

https://openlibrary.org/books/OL9174734M/Biblioteca_Vasconcelos_Library

The error:

/opt/openlibrary/deploys/openlibrary/2b017b5/openlibrary/templates/books/edit.html: error in processing template: UnicodeEncodeError: 'ascii' codec can't encode character u'\xe0' in position 11: ordinal not in range(128) (falling back to default template)

@cclauss
Contributor

cclauss commented Aug 27, 2020

class OLIndexer() has a method normalize_edition_title() that seems to be converting Unicode titles into ASCII titles. ;-(
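
To illustrate why converting Unicode titles to ASCII is a concern, here is a sketch of a common ASCII-folding technique. This is an assumption about the general approach, not the actual body of normalize_edition_title(), which is not shown in this thread. `fold_to_ascii` is a hypothetical name.

```python
import unicodedata


def fold_to_ascii(text: str) -> str:
    """Decompose accented characters, then drop anything outside ASCII."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


if __name__ == "__main__":
    for title in ["Der Große Weltatlas", "Isaías Rojas Peña"]:
        print(title, "->", fold_to_ascii(title))
```

Accented letters lose their accents, and characters with no decomposition, such as ß, are dropped entirely, so folding of this kind is lossy and suits search keys better than displayed titles.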

@bitnapper
Author

The problem also occurs when trying to edit an existing book where the author's name contains certain characters:

@cclauss cclauss added Lead: @hornc Issues overseen by Charles (Staff: Data Engineering Lead) [managed] and removed Lead: @mekarpeles Issues overseen by Mek (Staff: Program Lead) [managed] labels Oct 3, 2020