Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch to Cython language level 3 #3344

Merged
merged 2 commits into from
Dec 3, 2022

Conversation

pabs3
Copy link
Contributor

@pabs3 pabs3 commented May 5, 2022

Python 2 is not supported by gensim, so
switching to language level 3 should be fine.

Silences a warning from Cython:

FutureWarning: Cython directive 'language_level' not set, using 2 for now (Py2). This will change in a later release!

Python 2 is not supported by gensim, so
switching to language level 3 should be fine.

Silences a warning from Cython:

FutureWarning: Cython directive 'language_level' not set, using 2 for now (Py2). This will change in a later release!
@piskvorky
Copy link
Owner

piskvorky commented May 5, 2022

Thanks. Where is that warning shown? If some extensions are being built with an incorrect language level, that sounds dangerous.

CI is failing though.

@piskvorky piskvorky added this to the Next release milestone May 5, 2022
@piskvorky piskvorky requested a review from mpenkov May 5, 2022 05:44
@pabs3
Copy link
Contributor Author

pabs3 commented May 5, 2022

The warning is shown during the build of the Debian package of gensim (amd64 log). Side note: the log also shows various GCC warnings, some test warnings and a warning about setup.py being deprecated within the Python community.

Unfortunately based on the CI results for this PR, it looks like the Cython code in gensim isn't yet compatible with Cython language version 3, so I'm going to mark this PR as draft and work on fixing the issues as I get time.

@pabs3 pabs3 marked this pull request as draft May 5, 2022 06:05
@pabs3
Copy link
Contributor Author

pabs3 commented May 5, 2022

Hmm, maybe I should instead mark those files as language level 2 for now, to silence the warnings.

@piskvorky
Copy link
Owner

language_level is about the level of Python (2 vs 3), right? Not Cython, which works with either.

So we should have language_level=3 everywhere in Gensim. Unless I'm missing something.

@pabs3
Copy link
Contributor Author

pabs3 commented May 5, 2022 via email

@piskvorky
Copy link
Owner

piskvorky commented May 5, 2022

Hmm, the CI errors complain about syntax stuff in the Cython code like gensim/models/fasttext_inner.pyx:187:9: 'REAL_t' is not a type identifier. Which is unrelated to py2/3 IMO, weird. Will need a deeper look.

@piskvorky
Copy link
Owner

piskvorky commented May 5, 2022

The problem seems to stem from this code:

https://github.com/RaRe-Technologies/gensim/blob/5a19dd0407577001bcc177166582da7a5a4e5f70/gensim/models/word2vec_inner.pxd#L125-L126

which seems indeed syntactically broken:

cdef init_w2v_config(Word2VecConfig *c, model, alpha, compute_loss, _work, _neu1=*)
                                                                                 ^
------------------------------------------------------------

gensim/models/word2vec_inner.pxd:126:82: Expected an identifier or literal

The code comes from 3c3506d in #2127.

Or is it some special Cython syntax for PXD? I don't get it. CC @persiyanov @mpenkov – any idea why Cython compilation with language_level=3 fails?

@piskvorky
Copy link
Owner

piskvorky commented May 5, 2022

My hypothesis: error is related to the compilation order of Gensim extensions.

  1. fasttext_inner.pyx depends on word2vec_inner.pyx
  2. But with language_level=3, the fasttext_inner.pyx gets cythonized first => failure.
  3. Under language_level=2, word2vec_inner.pyx is cythonized first and everything works.

Checking my hypothesis now :)

@piskvorky
Copy link
Owner

piskvorky commented May 5, 2022

OK I think I figured it out. The issue is with relative imports (from word2vec_inner import …) in these pyx/pdx files. After changing them to absolute (from gensim.models.word2vec_inner import …) the compilation works even with language_level=3.

@pabs3 I'll push the fix to your branch, OK?

@piskvorky
Copy link
Owner

piskvorky commented May 5, 2022

Hm, I get a permission error when I try to push to your branch directly (usually possible for maintainers across forks but maybe you disabled this).

And github doesn't let me open a PR against your fork (doesn't even list it??). So can you please apply 0e70635 yourself @pabs3? Cheers.

@pabs3
Copy link
Contributor Author

pabs3 commented May 5, 2022 via email

@mpenkov mpenkov marked this pull request as ready for review December 3, 2022 14:11
@mpenkov
Copy link
Collaborator

mpenkov commented Dec 3, 2022

Thank you @pabs3 !

@mpenkov mpenkov merged commit b6ea788 into piskvorky:develop Dec 3, 2022
@pabs3 pabs3 deleted the cleanup-cython-language_level branch December 3, 2022 22:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants