Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Synchronize with CPython #56

Closed
SnoopJ opened this issue Sep 17, 2022 · 8 comments
Closed

Synchronize with CPython #56

SnoopJ opened this issue Sep 17, 2022 · 8 comments

Comments

@SnoopJ
Copy link
Collaborator

SnoopJ commented Sep 17, 2022

As mentioned in #55, unicodedata.c and other files extracted from CPython are pretty out of sync with the upstream. Filing this issue so that there's a coherent place for questions/discussion that might crop up in pursuit of that.

See also #39 which covers some of the same ground, but is itself out of date.

@SnoopJ SnoopJ changed the title Update unicodedata.c Synchronize with CPython Sep 17, 2022
@SnoopJ
Copy link
Collaborator Author

SnoopJ commented Sep 17, 2022

I've gotten a start on this, but as I've been working on it, I realize that unicodedata2 does not explicitly declare which Python versions are supported. I'd assume it's the current versions of Python in general, but it would be helpful to have such a declaration.

The thing that got me looking for supported-versions info is the introduction of the _PyCFunction_CAST macro, which appears only on the 3.11 branch and seems safe to leave off for the sake of 3.7, 3.8, 3.9, 3.10 from this issue's changes (or maybe it's worth porting that definition into unicodedata2 to smooth things out for future updates). Not sure if there are going to be other version specificity issues, will update here if there.

Edit: Unless I hear otherwise, I'm going to proceed assuming the target Python is >3.7, but will continue to update if I learn things about version specificity as I go.

@SnoopJ
Copy link
Collaborator Author

SnoopJ commented Sep 18, 2022

More information about version sensitivity: the code generated by newer versions of Argument Clinic (clinic, responsible for much of the difference churn) uses some symbols that aren't defined before 3.8 (and Py_ssize_clean_t for which PyPy needs a typedef) , but nothing too tricky to handle.

It seems worth defining a header to hold compatibility code of this sort to streamline comparison against the upstream (i.e. only the #include will show up in a diff against CPython), I've done this on my working branch and things look good, but if this is too opinionated of a change, let me know and I can revise my approach.

For now, I'm using envlist = py{36,37,38,39,310}, pypy{37,38,39} in testing my changes, but won't fuss over it if something breaks for 3.6.

@SnoopJ
Copy link
Collaborator Author

SnoopJ commented Sep 19, 2022

Looks like PyPy is missing some of the symbols necessary for the is_normalized() function (added in CPython 3.8), and I've had a little trouble trying to add those to the compatibility shims without making a mess. Seems like PyPy's implementation is done in Python and takes the slow path of calling normalize() on the input and then comparing the result to the input.

Not sure what the best option is in this case, possibly putting the CPython implementation behind an #ifndef PYPY_VERSION and falling back on behavior equivalent to PyPy's normalize-and-test strategy for that runtime. Alternatively, this feature could be left off this synchronization issue and filed separately.

@anthrotype
Copy link
Member

assuming the target Python is >3.7

yes, that's correct.

defining a header to hold compatibility code

yeah, that's a good idea, please do that.

putting the CPython implementation behind an #ifndef PYPY_VERSION and falling back...

sounds good.

Thanks a lot for working on this!!

@SnoopJ
Copy link
Collaborator Author

SnoopJ commented Sep 23, 2022

defining a header to hold compatibility code

yeah, that's a good idea, please do that.

putting the CPython implementation behind an #ifndef PYPY_VERSION and falling back...

sounds good.

I mentioned this on the PR (#58) I just opened, but it seems that the Argument Clinic changes are a little too much to handle this way (or maybe I didn't go through enough rounds of "find that missing symbol and add it").

There shouldn't be any trouble sticking with the old clinic style for the time being, but it's something to keep an eye out for as 3.7 and 3.8 reach their EOL.

@anthrotype
Copy link
Member

@SnoopJ can this be closed?

@SnoopJ
Copy link
Collaborator Author

SnoopJ commented Oct 28, 2022

There's going to be some more stuff to do when 3.7 slips into obsolescence, but yep this can be closed now

@SnoopJ SnoopJ closed this as completed Oct 28, 2022
@SnoopJ
Copy link
Collaborator Author

SnoopJ commented May 20, 2023

I recently became aware of the pythoncapi-compat project, which might be helpful for future work on the C API usage of unicodedata2 in place of the _unicodedata2_compat.h shim.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants