Display RFC 2047-encoded author names correctly in the sidebar of a package page #16496

Open
joaopalmeiro opened this issue Aug 16, 2024 · 3 comments
Labels: blocked (Issues we can't or shouldn't get to yet), feature request

Comments

@joaopalmeiro

Hi! 👋

What's the problem this feature will solve?

When a project follows the pyproject.toml specification and uses a build backend such as PDM-Backend (which relies on the pyproject-metadata package), an author whose name contains non-ASCII characters (e.g., João Palmeiro) and who has an email address is written to the Author-email core metadata field as =?utf-8?q?Jo=C3=A3o_Palmeiro?= <[email protected]>.

The Author-email core metadata field is used to populate the Author sidebar field for a package page.

The pyproject-metadata package uses the email.utils.formataddr() function to process the values of the authors field of the pyproject.toml file. When a name contains non-ASCII characters, this function encodes it following RFC 2047 (the default charset is utf-8), and it is this encoded value (e.g., =?utf-8?q?Jo=C3=A3o_Palmeiro?=) that is written to metadata files like PKG-INFO:

Metadata-Version: 2.1
Name: template-python-pdm-package
Version: 0.0.0
Summary: Opinionated Python + PDM template for new packages.
Author-Email: =?utf-8?q?Jo=C3=A3o_Palmeiro?= <[email protected]>
...
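
To see where the encoded value comes from, the behaviour described above can be reproduced with the standard library alone. This is a minimal sketch, not the pyproject-metadata code itself, and the addresses (joao@example.com, foo@example.com) are placeholders chosen for illustration:

```python
from email.utils import formataddr

# A display name with non-ASCII characters is RFC 2047-encoded
# (formataddr() uses the utf-8 charset by default).
encoded = formataddr(("João Palmeiro", "joao@example.com"))
print(encoded)

# A pure-ASCII display name is passed through without an encoded-word.
plain = formataddr(("Foo Bar", "foo@example.com"))
print(plain)
```

The first value is an ASCII-only string whose name part is an RFC 2047 encoded-word, which is the form that ends up in the Author-email field.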

As a concrete example, check the FastAPI package, please:

[Screenshot: Author sidebar field on the FastAPI package page]

Instead of Sebastián Ramírez, the author's name appears as =?utf-8?q?Sebasti=C3=A1n_Ram=C3=ADrez?=.

In my opinion, Warehouse/PyPI should decode RFC 2047-encoded author names: the specification refers to RFC-822, and using the email.utils.formataddr() function or the pyproject-metadata package in build backends (current or future ones) is a valid approach. That way, authors' names are displayed in the Author sidebar field with the characters used in the pyproject.toml file, independently of the build backend.

Describe the solution you'd like

Instead of =?utf-8?q?Jo=C3=A3o_Palmeiro?=, I would like to see João Palmeiro in the Author sidebar field on a package page regardless of the build backend used (given that this is not an issue when using Hatchling, for example).

So, I propose the following changes (or similar ones) to the format_email filter and its unit test:

+ from email.header import decode_header, make_header


def format_email(metadata_email: str) -> tuple[str, str]:
    """
    Return the name and email address from a metadata RFC-822 string.
+   RFC 2047-encoded names are supported and decoded accordingly.
    Use Jinja's `first` and `last` to access each part in a template.
    TODO: Support more than one email address, per RFC-822.
    """
    emails = []
    for name, email in getaddresses([metadata_email]):
+       name = str(make_header(decode_header(name)))
        if "@" not in email:
            return name, ""
        emails.append((name, email))
    return emails[0][0], emails[0][1]

@pytest.mark.parametrize(
    ("meta_email", "expected_name", "expected_email"),
    [
        ("not-an-email-address", "", ""),
        ("[email protected]", "", "[email protected]"),
        ('"Foo Bar" <[email protected]>', "Foo Bar", "[email protected]"),
+       ('=?utf-8?q?Jo=C3=A3o_Bar?= <[email protected]>', "João Bar", "[email protected]"),
    ],
)
def test_format_email(meta_email, expected_name, expected_email):
    name, email = filters.format_email(meta_email)
    assert name == expected_name
    assert email == expected_email
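
For anyone who wants to try the decoding step outside of Warehouse, here is a self-contained sketch of the same idea. The function name split_author_email and the example addresses are hypothetical and used only for illustration. This is not the actual Warehouse filter, and it only looks at the first address in the field:

```python
from email.header import decode_header, make_header
from email.utils import getaddresses


def split_author_email(metadata_email: str) -> tuple[str, str]:
    """Return (display name, address) for the first address in the field."""
    for name, address in getaddresses([metadata_email]):
        # decode_header() splits the name into (text, charset) chunks and
        # make_header() reassembles them; str() renders the decoded text.
        name = str(make_header(decode_header(name)))
        if "@" not in address:
            return name, ""
        return name, address
    # Nothing could be parsed from the field.
    return "", ""


print(split_author_email("=?utf-8?q?Jo=C3=A3o_Palmeiro?= <joao@example.com>"))
print(split_author_email('"Foo Bar" <foo@example.com>'))
```

Names without encoded-words pass through make_header(decode_header(...)) unchanged, so existing metadata written by backends such as Hatchling should not be affected by the extra decoding step.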

Let me know what you think and if I can open a PR. Thanks!


joaopalmeiro added the "feature request" and "requires triaging" (maintainers need to do initial inspection of issue) labels on Aug 16, 2024
@di
Member

di commented Aug 16, 2024

I think we need the PEP to be clearly updated in order to move forward here, but this makes sense to me!

di removed the "requires triaging" (maintainers need to do initial inspection of issue) label on Aug 16, 2024
@joaopalmeiro
Author

Thanks for the feedback, @di!

Btw, do you mean PEP 621 and the authors/maintainers section? Anything I can do to help?

@di
Member

di commented Aug 19, 2024

The standard for these fields has been created and updated over several PEPs; see https://packaging.python.org/en/latest/specifications/core-metadata/#history. PEP 621 only concerns itself with the pyproject.toml format and relies on the previously defined PEPs for the requirements for these specific fields.

I'm not actually sure what the right path forward would be here. I think this is probably too small to be its own PEP, but the discussion at https://discuss.python.org/t/core-metadata-email-fields-unicode/7421/9 seems to be unresolved as well. Helping come to a resolution in that thread would probably be a good first step.

In the meantime, I'm going to mark this issue as blocked until there's an agreed-upon path forward here!

di added the "blocked" (Issues we can't or shouldn't get to yet) label on Aug 19, 2024