Make more difficult sanitize of the expression string before eval
robbmcleod committed Aug 6, 2023
1 parent 4b2d89c commit 00b035c
Showing 5 changed files with 91 additions and 23 deletions.
23 changes: 21 additions & 2 deletions ANNOUNCE.rst
@@ -4,7 +4,10 @@ Announcing NumExpr 2.8.5

Hi everyone,

**Under development.**
In 2.8.5 we have added a new function, `validate`, which checks an expression `ex`
for validity, for use where a program is parsing user input. This sort of usage
has security consequences, since `eval(ex)` is called, and so we now do some
string sanitization as described below.

Project documentation is available at:

@@ -13,7 +16,23 @@ http://numexpr.readthedocs.io/
Changes from 2.8.4 to 2.8.5
---------------------------

**Under development.**
* A `validate` function has been added. This function checks the inputs, returning
`None` on success or the caught exception on invalid inputs. This function was
added as numerous projects seem to be using NumExpr for parsing user inputs.
`re_evaluate` may be called directly following `validate`.
* A caveat to using NumExpr for parsing user inputs is that NumExpr
calls `eval` on the inputs. A regular expression is now applied to help sanitize
the input expression string, forbidding '__', ':', ';', and '['. Attribute access
is also banned except for '.r' for real and '.i' for imag.
* Thanks to timbrist for a fix to behavior of NumExpr with integers to negative
powers. NumExpr was pre-checking integer powers for negative values, which
was both inefficient and causing parsing errors in some situations. Now NumExpr
will simply return 0 as a result for such cases. While NumExpr generally tries
to follow NumPy behavior, performance is also critical.
* Thanks to peadar for some fixes to how NumExpr launches threads for embedded
applications.
* Thanks to de11n for making parsing of the `site.cfg` for MKL consistent among
all shared platforms.


What's Numexpr?
19 changes: 18 additions & 1 deletion RELEASE_NOTES.rst
@@ -5,7 +5,24 @@ Release notes for NumExpr 2.8 series
Changes from 2.8.4 to 2.8.5
---------------------------

**Under development.**
* A `validate` function has been added. This function checks the inputs, returning
`None` on success or the caught exception on invalid inputs. This function was
added as numerous projects seem to be using NumExpr for parsing user inputs.
`re_evaluate` may be called directly following `validate`.
* A caveat to using NumExpr for parsing user inputs is that NumExpr
calls `eval` on the inputs. A regular expression is now applied to help sanitize
the input expression string, forbidding '__', ':', ';', and '['. Attribute access
is also banned except for '.r' for real and '.i' for imag.
* Thanks to timbrist for a fix to behavior of NumExpr with integers to negative
powers. NumExpr was pre-checking integer powers for negative values, which
was both inefficient and causing parsing errors in some situations. Now NumExpr
will simply return 0 as a result for such cases. While NumExpr generally tries
to follow NumPy behavior, performance is also critical.
* Thanks to peadar for some fixes to how NumExpr launches threads for embedded
applications.
* Thanks to de11n for making parsing of the `site.cfg` for MKL consistent among
all shared platforms.


Changes from 2.8.3 to 2.8.4
---------------------------
27 changes: 17 additions & 10 deletions doc/user_guide.rst
@@ -1,7 +1,7 @@
NumExpr 2.0 User Guide
NumExpr 2.8 User Guide
======================

The :code:`numexpr` package supplies routines for the fast evaluation of
The NumExpr package supplies routines for the fast evaluation of
array expressions elementwise by using a vector-based virtual
machine.

@@ -11,23 +11,33 @@ Using it is simple::
    >>> import numexpr as ne
    >>> a = np.arange(10)
    >>> b = np.arange(0, 20, 2)
    >>> c = ne.evaluate("2*a+3*b")
    >>> c = ne.evaluate('2*a + 3*b')
    >>> c
    array([ 0, 8, 16, 24, 32, 40, 48, 56, 64, 72])


It is also possible to use NumExpr to validate an expression::

    >>> ne.validate('2*a + 3*b')

which returns `None` on success or the caught exception on invalid inputs.

NumExpr can also re-evaluate the last validated or evaluated expression with
`re_evaluate`::

    >>> b = np.arange(0, 40, 4)
    >>> ne.re_evaluate()

Building
--------

*NumExpr* requires Python_ 2.6 or greater, and NumPy_ 1.7 or greater. It is
*NumExpr* requires Python_ 3.7 or greater, and NumPy_ 1.13 or greater. It is
built in the standard Python way:

.. code-block:: bash
$ python setup.py build
$ python setup.py install
$ pip install .
You must have a C-compiler (i.e. MSVC on Windows and GCC on Linux) installed.
You must have a C-compiler (e.g. MSVC Build tools on Windows and GCC on Linux) installed.

Then change to a directory that is not the repository directory (e.g. `/tmp`) and
test :code:`numexpr` with:
@@ -268,9 +278,6 @@ General routines
* :code:`detect_number_of_cores()`: Detects the number of cores on a system.





Intel's VML specific support routines
-------------------------------------

27 changes: 19 additions & 8 deletions numexpr/necompiler.py
@@ -260,15 +260,17 @@ def __init__(self, astnode):
    def __str__(self):
        return 'Immediate(%d)' % (self.node.value,)

_forbidden_re = re.compile('[\;[\:]|__')

_forbidden_re = re.compile('[\;[\:]|__|\.[abcdefghjklmnopqstuvwxyzA-Z_]')
def stringToExpression(s, types, context):
    """Given a string, convert it to a tree of ExpressionNode's.
    """
    # sanitize the string for obvious attack vectors that NumExpr cannot
    # parse into its homebrew AST. This is to protect the call to `eval` below.
    # We forbid `;`, `:`. `[` and `__`
    # We would like to forbid `.` but it is both a reference and decimal point.
    if _forbidden_re.search(s) is not None:
    # We forbid `;`, `:`. `[` and `__`, and attribute access via '.'.
    # We cannot ban `.real` or `.imag` however...
    no_whitespace = re.sub(r'\s+', '', s)
    if _forbidden_re.search(no_whitespace) is not None:
        raise ValueError(f'Expression {s} has forbidden control characters.')

    old_ctx = expressions._context.get_current_context()
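
The sanitization step in this hunk can be tried in isolation. The sketch below is a standalone approximation, not the NumExpr code itself. The names `FORBIDDEN_RE`, `check_expression` and `is_allowed` are hypothetical, and the pattern is rewritten as a raw string with character ranges that are intended to be equivalent to the letter list in the commit (every letter except lowercase `i` and `r`).

```python
import re

# Intended to match the same inputs as the pattern in the commit:
# ';', ':' or '[', a double underscore, or a '.' followed by any letter
# or underscore other than lowercase 'i' and 'r'.
FORBIDDEN_RE = re.compile(r'[;:\[]|__|\.[a-hj-qs-zA-Z_]')


def check_expression(s: str) -> None:
    """Raise ValueError if `s` contains a forbidden token."""
    # Strip whitespace first so 'a . conjugate' is treated like 'a.conjugate'.
    no_whitespace = re.sub(r'\s+', '', s)
    if FORBIDDEN_RE.search(no_whitespace) is not None:
        raise ValueError(f'Expression {s} has forbidden control characters.')


def is_allowed(s: str) -> bool:
    """Return True if `s` passes the check, False if it is rejected."""
    try:
        check_expression(s)
    except ValueError:
        return False
    return True


if __name__ == '__main__':
    for expr in ['2*a + 3*b', 'a*2.', '2.+a', 'a.real', 'a.imag',
                 'import os;', 'os.cpucount()', 'a . conjugate()', 'a[0]']:
        print(f'{expr!r:20} allowed={is_allowed(expr)}')
```

The check only looks at the first character after the dot, so in this sketch any attribute name beginning with `r` or `i` passes, not only `.real` and `.imag`.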
@@ -766,7 +768,6 @@ def getArguments(names, local_dict=None, global_dict=None, _frame_depth: int=2):
_names_cache = CacheDict(256)
_numexpr_cache = CacheDict(256)
_numexpr_last = {}
_numexpr_sanity = set()
evaluate_lock = threading.Lock()

# MAYBE: decorate this function to add attributes instead of having the
@@ -828,6 +829,13 @@ def validate(ex: str,
_frame_depth: int
The calling frame depth. Unless you are a NumExpr developer you should
not set this value.
Note
----
Both `validate` and, by extension, `evaluate` call `eval(ex)`, which is
potentially dangerous on unsanitized inputs. As such, NumExpr does some
sanitization, banning the characters ':', ';' and '[', the dunder '__', and
attribute access to all but '.r' for real and '.i' for imag access to complex numbers.
"""
global _numexpr_last

@@ -857,8 +865,6 @@ def validate(ex: str,
        kwargs = {'out': out, 'order': order, 'casting': casting,
                  'ex_uses_vml': ex_uses_vml}
        _numexpr_last = dict(ex=compiled_ex, argnames=names, kwargs=kwargs)
        # with evaluate_lock:
        #     return compiled_ex(*arguments, **kwargs)
    except Exception as e:
        return e
    return None
@@ -918,7 +924,12 @@ def evaluate(ex: str,
The calling frame depth. Unless you are a NumExpr developer you should
not set this value.
Note
----
Both `validate` and, by extension, `evaluate` call `eval(ex)`, which is
potentially dangerous on unsanitized inputs. As such, NumExpr does some
sanitization, banning the characters ':', ';' and '[', the dunder '__', and
attribute access to all but '.r' for real and '.i' for imag access to complex numbers.
"""
# We could avoid code duplication if we called validate and then re_evaluate
# here, but then we have difficulties with the `sys.getframe(2)` call in
18 changes: 16 additions & 2 deletions numexpr/tests/test_numexpr.py
@@ -536,13 +536,27 @@ def test_forbidden_tokens(self):

        # Forbid semicolon
        try:
            evaluate('import os; os.cpu_count()')
            evaluate('import os;')
        except ValueError:
            pass
        else:
            self.fail()

        # I struggle to come up with cases for our ban on `'` and `"`
        # Attribute access
        try:
            evaluate('os.cpucount()')
        except ValueError:
            pass
        else:
            self.fail()

        # But decimal point must pass
        a = 3.0
        evaluate('a*2.')
        evaluate('2.+a')





