BUG: pandas testsuite on 2.2.x with numpy 2 fails on numexpr #58548

Open
3 tasks done
bnavigator opened this issue May 3, 2024 · 5 comments
Labels
Bug Needs Triage Issue that has not been reviewed by a pandas team member

@bnavigator
Contributor

bnavigator commented May 3, 2024

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

# pip install 'numpy>=2.0.0rc1' 'pandas==2.2.2' hypothesis pytest numexpr
import pandas
pandas.test(extra_args=["-k", "test_eval and numexpr", "-l", "--tb=long"])

Issue Description

I'm currently testing numpy 2 on the openSUSE Python ecosystem. I noticed that the pandas test suite fails when numpy 2.0.0rc1 is installed instead of 1.26.4.

See also pydata/numexpr#483

============================================================================================= FAILURES =============================================================================================
___________________________________________________________ TestTypeCasting.test_binop_typecasting[numexpr-python-left_right0-float64-/] ___________________________________________________________

self = <pandas.tests.computation.test_eval.TestTypeCasting object at 0x7fdda001f250>, engine = 'numexpr', parser = 'python', op = '/', dt = <class 'numpy.float64'>, left_right = ('df', '3')

    @pytest.mark.parametrize("op", ["+", "-", "*", "**", "/"])
    # maybe someday... numexpr has too many upcasting rules now
    # chain(*(np.core.sctypes[x] for x in ['uint', 'int', 'float']))
    @pytest.mark.parametrize("dt", [np.float32, np.float64])
    @pytest.mark.parametrize("left_right", [("df", "3"), ("3", "df")])
    def test_binop_typecasting(self, engine, parser, op, dt, left_right):
        df = DataFrame(np.random.default_rng(2).standard_normal((5, 3)), dtype=dt)
        left, right = left_right
        s = f"{left} {op} {right}"
>       res = pd.eval(s, engine=engine, parser=parser)

df         =           0         1         2
0  0.189053 -0.522748 -0.413064
1 -2.441467  1.799707  1.144166
2 -0.325423  0.773807  0.281211
3 -0.553823  0.977567 -0.310557
4 -0.328824 -0.792147  0.454958
dt         = <class 'numpy.float64'>
engine     = 'numexpr'
left       = 'df'
left_right = ('df', '3')
op         = '/'
parser     = 'python'
right      = '3'
s          = 'df / 3'
self       = <pandas.tests.computation.test_eval.TestTypeCasting object at 0x7fdda001f250>

pdtestenv/lib64/python3.11/site-packages/pandas/tests/computation/test_eval.py:756:

...

>           ret = eng_inst.evaluate()

eng        = <class 'pandas.core.computation.engines.NumExprEngine'>
eng_inst   = <pandas.core.computation.engines.NumExprEngine object at 0x7fdd41459490>
engine     = 'numexpr'
env        = Scope(scope=['Timestamp',
 'datetime',
 'True',
 'False',
 'list',
 'tuple',
 'inf',
 'Inf',
 '__name__',
 '__doc__',
...t_set_inplace',
 'TestValidate',
 'self',
 'op',
 'dt',
 'left_right',
 'df',
 'left',
 'right',
 's']
, resolvers=[]
)
expr       = 'df / 3'
exprs      = ['df / 3']
first_expr = True
global_dict = None
inplace    = False
level      = 0
local_dict = None
multi_line = False
parsed_expr = (df) / (np.float64(3.0))
parser     = 'python'
resolvers  = ()
ret        = None
target     = None
target_modified = False

pdtestenv/lib64/python3.11/site-packages/pandas/core/computation/eval.py:357:

...

==================================================================================== short test summary info ======================================================================================
FAILED pdtestenv/lib64/python3.11/site-packages/pandas/tests/computation/test_eval.py::TestTypeCasting::test_binop_typecasting[numexpr-python-left_right0-float64-/] - ValueError: Expression (df) / (np.float64(3.0)) has forbidden control characters.
FAILED pdtestenv/lib64/python3.11/site-packages/pandas/tests/computation/test_eval.py::TestTypeCasting::test_binop_typecasting[numexpr-python-left_right1-float64-/] - ValueError: Expression (np.float64(3.0)) / (df) has forbidden control characters.
FAILED pdtestenv/lib64/python3.11/site-packages/pandas/tests/computation/test_eval.py::TestTypeCasting::test_binop_typecasting[numexpr-pandas-left_right0-float64-/] - ValueError: Expression (df) / (np.float64(3.0)) has forbidden control characters.
FAILED pdtestenv/lib64/python3.11/site-packages/pandas/tests/computation/test_eval.py::TestTypeCasting::test_binop_typecasting[numexpr-pandas-left_right1-float64-/] - ValueError: Expression (np.float64(3.0)) / (df) has forbidden control characters.
FAILED pdtestenv/lib64/python3.11/site-packages/pandas/tests/computation/test_eval.py::TestOperations::test_simple_arith_ops[numexpr-python] - ValueError: Expression (np.float64(1.0)) / (np.float64(1.0)) has forbidden control characters.
FAILED pdtestenv/lib64/python3.11/site-packages/pandas/tests/computation/test_eval.py::TestOperations::test_simple_arith_ops[numexpr-pandas] - ValueError: Expression (np.float64(1.0)) / (np.float64(1.0)) has forbidden control characters.
====================================================== 6 failed, 5432 passed, 62 skipped, 199038 deselected, 2 xpassed, 4 warnings in 32.69s =======================================================

Expected Behavior

Passing test suite, proper string evaluation

Installed Versions

INSTALLED VERSIONS
------------------
commit                : d9cdd2ee5a58015ef6f4d15c7226110c9aab8140
python                : 3.11.9.final.0
python-bits           : 64
OS                    : Linux
OS-release            : 6.8.7-1-default
Version               : #1 SMP PREEMPT_DYNAMIC Thu Apr 18 07:12:38 UTC 2024 (5c0cf23)
machine               : x86_64
processor             : x86_64
byteorder             : little
LC_ALL                : None
LANG                  : de_DE.UTF-8
LOCALE                : de_DE.UTF-8

pandas                : 2.2.2
numpy                 : 2.0.0rc1
pytz                  : 2024.1
dateutil              : 2.9.0.post0
setuptools            : 65.5.0
pip                   : 24.0
Cython                : None
pytest                : 8.2.0
hypothesis            : 6.100.2
sphinx                : None
blosc                 : None
feather               : None
xlsxwriter            : None
lxml.etree            : None
html5lib              : None
pymysql               : None
psycopg2              : None
jinja2                : None
IPython               : None
pandas_datareader     : None
adbc-driver-postgresql: None
adbc-driver-sqlite    : None
bs4                   : None
bottleneck            : None
dataframe-api-compat  : None
fastparquet           : None
fsspec                : None
gcsfs                 : None
matplotlib            : None
numba                 : None
numexpr               : 2.10.0
odfpy                 : None
openpyxl              : None
pandas_gbq            : None
pyarrow               : None
pyreadstat            : None
python-calamine       : None
pyxlsb                : None
s3fs                  : None
scipy                 : None
sqlalchemy            : None
tables                : None
tabulate              : None
xarray                : None
xlrd                  : None
zstandard             : None
tzdata                : 2024.1
qtpy                  : None
pyqt5                 : None
@bnavigator bnavigator added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels May 3, 2024
@bnavigator
Contributor Author

Same for the development version:

pip install 'numpy>=2.0.0rc1' hypothesis pytest numexpr
pip install --pre --extra-index https://pypi.anaconda.org/scientific-python-nightly-wheels/simple pandas
python  -c 'import pandas; pandas.test(extra_args=["-k", "test_eval and numexpr", "-l", "--tb=long"])'
running: pytest -k test_eval and numexpr -l --tb=long /var/tmp/pdtestmain/pdtestenv/lib64/python3.11/site-packages/pandas
=============================================================================================== test session starts ================================================================================================
platform linux -- Python 3.11.9, pytest-8.2.0, pluggy-1.5.0
rootdir: /var/tmp/pdtestmain/pdtestenv/lib64/python3.11/site-packages/pandas
configfile: pyproject.toml
plugins: hypothesis-6.100.2
collected 204134 items / 198621 deselected / 62 skipped / 5513 selected
...
============================================================================================= short test summary info ==============================================================================================
FAILED pdtestenv/lib64/python3.11/site-packages/pandas/tests/computation/test_eval.py::TestTypeCasting::test_binop_typecasting[numexpr-python-float-left_right0-/] - ValueError: Expression (df) / (np.float64(3.0)) has forbidden control characters.
FAILED pdtestenv/lib64/python3.11/site-packages/pandas/tests/computation/test_eval.py::TestTypeCasting::test_binop_typecasting[numexpr-python-float-left_right1-/] - ValueError: Expression (np.float64(3.0)) / (df) has forbidden control characters.
FAILED pdtestenv/lib64/python3.11/site-packages/pandas/tests/computation/test_eval.py::TestTypeCasting::test_binop_typecasting[numexpr-python-float64-left_right0-/] - ValueError: Expression (df) / (np.float64(3.0)) has forbidden control characters.
FAILED pdtestenv/lib64/python3.11/site-packages/pandas/tests/computation/test_eval.py::TestTypeCasting::test_binop_typecasting[numexpr-python-float64-left_right1-/] - ValueError: Expression (np.float64(3.0)) / (df) has forbidden control characters.
FAILED pdtestenv/lib64/python3.11/site-packages/pandas/tests/computation/test_eval.py::TestTypeCasting::test_binop_typecasting[numexpr-pandas-float-left_right0-/] - ValueError: Expression (df) / (np.float64(3.0)) has forbidden control characters.
FAILED pdtestenv/lib64/python3.11/site-packages/pandas/tests/computation/test_eval.py::TestTypeCasting::test_binop_typecasting[numexpr-pandas-float-left_right1-/] - ValueError: Expression (np.float64(3.0)) / (df) has forbidden control characters.
FAILED pdtestenv/lib64/python3.11/site-packages/pandas/tests/computation/test_eval.py::TestTypeCasting::test_binop_typecasting[numexpr-pandas-float64-left_right0-/] - ValueError: Expression (df) / (np.float64(3.0)) has forbidden control characters.
FAILED pdtestenv/lib64/python3.11/site-packages/pandas/tests/computation/test_eval.py::TestTypeCasting::test_binop_typecasting[numexpr-pandas-float64-left_right1-/] - ValueError: Expression (np.float64(3.0)) / (df) has forbidden control characters.
FAILED pdtestenv/lib64/python3.11/site-packages/pandas/tests/computation/test_eval.py::TestOperations::test_simple_arith_ops[numexpr-python] - ValueError: Expression (np.float64(1.0)) / (np.float64(1.0)) has forbidden control characters.
FAILED pdtestenv/lib64/python3.11/site-packages/pandas/tests/computation/test_eval.py::TestOperations::test_simple_arith_ops[numexpr-pandas] - ValueError: Expression (np.float64(1.0)) / (np.float64(1.0)) has forbidden control characters.
============================================================== 10 failed, 5501 passed, 62 skipped, 198621 deselected, 2 xpassed, 4 warnings in 45.77s ==============================================================

INSTALLED VERSIONS
------------------
commit                : b4d3309e05e0afb7ee5bd671c2150d1e6eebbb88
python                : 3.11.9.final.0
python-bits           : 64
OS                    : Linux
OS-release            : 6.8.7-1-default
Version               : #1 SMP PREEMPT_DYNAMIC Thu Apr 18 07:12:38 UTC 2024 (5c0cf23)
machine               : x86_64
processor             : x86_64
byteorder             : little
LC_ALL                : None
LANG                  : de_DE.UTF-8
LOCALE                : de_DE.UTF-8

pandas                : 3.0.0.dev0+861.gb4d3309e05
numpy                 : 2.0.0rc1
pytz                  : 2024.1
dateutil              : 2.9.0.post0
setuptools            : 65.5.0
pip                   : 24.0
Cython                : None
pytest                : 8.2.0
hypothesis            : 6.100.2
sphinx                : None
blosc                 : None
feather               : None
xlsxwriter            : None
lxml.etree            : None
html5lib              : None
pymysql               : None
psycopg2              : None
jinja2                : None
IPython               : None
pandas_datareader     : None
adbc-driver-postgresql: None
adbc-driver-sqlite    : None
bs4                   : None
bottleneck            : None
fastparquet           : None
fsspec                : None
gcsfs                 : None
matplotlib            : None
numba                 : None
numexpr               : 2.10.0
odfpy                 : None
openpyxl              : None
pyarrow               : None
pyreadstat            : None
python-calamine       : None
pyxlsb                : None
s3fs                  : None
scipy                 : None
sqlalchemy            : None
tables                : None
tabulate              : None
xarray                : None
xlrd                  : None
zstandard             : None
tzdata                : 2024.1
qtpy                  : None
pyqt5                 : None

@bnavigator
Contributor Author

Still an issue with pandas-3.0.0.dev0+991.ga3e751c6b4.

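The cause is that numpy 2 gives every scalar type an explicit type repr (np.float64(3.0) instead of 3.0), so the expression string handed to numexpr now contains an attribute access that its sanitizer rejects: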
https://numpy.org/devdocs/release/2.0.0-notes.html#representation-of-numpy-scalars-changed

s = '(df) / (np.float64(3.0))', types = {}, context = {'optimization': 'aggressive', 'truediv': False}, sanitize = True

    def stringToExpression(s, types, context, sanitize: bool=True):
        """Given a string, convert it to a tree of ExpressionNode's.
        """
        # sanitize the string for obvious attack vectors that NumExpr cannot
        # parse into its homebrew AST. This is to protect the call to `eval` below.
        # We forbid `;`, `:`. `[` and `__`, and attribute access via '.'.
        # We cannot ban `.real` or `.imag` however...
        # We also cannot ban `.\d*j`, where `\d*` is some digits (or none), e.g. 1.5j, 1.j
        if sanitize:
            no_whitespace = re.sub(r'\s+', '', s)
            skip_quotes = re.sub(r'(\'[^\']*\')', '', no_whitespace)
            if _blacklist_re.search(skip_quotes) is not None:
>               raise ValueError(f'Expression {s} has forbidden control characters.')
E               ValueError: Expression (df) / (np.float64(3.0)) has forbidden control characters.

context    = {'optimization': 'aggressive', 'truediv': False}
no_whitespace = '(df)/(np.float64(3.0))'
s          = '(df) / (np.float64(3.0))'
sanitize   = True
skip_quotes = '(df)/(np.float64(3.0))'
types      = {}

pandas_numexpr_numpy2/lib64/python3.11/site-packages/numexpr/necompiler.py:283: ValueError
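
The rejection in the traceback above can be reproduced in isolation. The following is a minimal sketch using only the standard library: the whitespace and quote stripping mirror the two re.sub calls shown in the traceback, but the pattern `_FORBIDDEN` is a simplified, hypothetical stand-in for numexpr's private `_blacklist_re` (its real definition is not shown in this thread), and `is_rejected` is an illustrative helper name. The numpy 1 style string is an assumption about what the expression looked like when scalars printed as bare numbers.

```python
import re

# Hypothetical, simplified stand-in for numexpr's private `_blacklist_re`.
# It flags `;`, `[`, `:`, `__`, and a dot followed by an identifier character
# (attribute access), while letting `.real`, `.imag`, and `1.j` through.
_FORBIDDEN = re.compile(r"[;\[:]|__|\.(?!(?:real|imag|j)\b)[A-Za-z_]")


def is_rejected(expr: str) -> bool:
    """Return True if the simplified sanitizer would reject `expr`."""
    # Same preprocessing as in the traceback: drop whitespace, then drop
    # single-quoted string literals before searching.
    no_whitespace = re.sub(r"\s+", "", expr)
    skip_quotes = re.sub(r"('[^']*')", "", no_whitespace)
    return _FORBIDDEN.search(skip_quotes) is not None


# Assumed numpy 1 style: the scalar prints as a bare number.
numpy1_style = "(df) / (3.0)"
# numpy 2 style, copied from the failing test output.
numpy2_style = "(df) / (np.float64(3.0))"

print(is_rejected(numpy1_style))  # the dot is followed by a digit
print(is_rejected(numpy2_style))  # `.float64` looks like attribute access
```

Under these assumptions only the numpy 2 style string is flagged, because `np.float64` contains a dot followed by a letter, while the dot in `3.0` is followed by a digit.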

@bnavigator
Contributor Author

numexpr does not seem to be installed in the Numpy dev CI runs, so these tests are skipped there.
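
To illustrate why a missing optional dependency can hide such failures: tests for an optional engine are usually guarded by an import check (pytest provides pytest.importorskip for this), so they are reported as skipped rather than failed when the package is absent. The sketch below uses only the standard library; the helper name `optional_dependency_available` is hypothetical and this is not pandas' actual skip logic.

```python
import importlib.util


def optional_dependency_available(name: str) -> bool:
    """Return True if a top-level module called `name` can be found for import."""
    return importlib.util.find_spec(name) is not None


# In an environment without numexpr, engine-specific tests guarded by a
# check like this would be skipped, so the failure never shows up in CI.
if optional_dependency_available("numexpr"):
    print("numexpr found: numexpr-engine tests would run")
else:
    print("numexpr missing: numexpr-engine tests would be skipped")
```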

@bnavigator
Contributor Author

No longer an issue with pandas-3.0.0.dev0+1178.g236d89b856

Now how to find the correct change to backport to 2.x ...

@bnavigator bnavigator changed the title BUG: pandas testsuite with numpy 2.0.0rc1 fails on numexpr BUG: pandas testsuite on 2.2.x with numpy 2 fails on numexpr Jul 12, 2024
@hawkinsp
Contributor

hawkinsp commented Aug 7, 2024

I don't know this code at all, but I think #59144 was the relevant change? Can we try backporting that?
