Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MultiLineBlockMixin causes stack overflow from recursive _get_assign_nodes() #786

Open
DavidCain opened this issue May 8, 2020 · 9 comments
Labels
Enhancement ✨ Improvement to a component topic-performance

Comments

@DavidCain
Copy link
Contributor

DavidCain commented May 8, 2020

Problem

The recursive nature of the _get_assign_nodes() method can lead to a RecursionError and break running pylint on a project that has a class with a large control structure in its method.

Workaround

A hackish means of avoiding this bug is to increase the stack size to some value that enables further recursion without exhausting the stack. It's documented in a comment, but in essence the idea is calling sys.setrecursionlimit(some_int) before running pylint or astroid). A more robust fix would be to avoid recursion altogether since this hack only increases the number of lines astroid can parse.

Affected versions

>>> from astroid import __pkginfo__; print(__pkginfo__.version)
2.4.1

I've tested this on every minor version since Astroid 2.1.0. Astroid 2.0.4 is not affected.

Since the release of Astroid 2.1.0, we've been experiencing stack overflow exceptions for a class with a very large chain of if/else statements.

How to reproduce:

The following Python file is enough to trigger a recursion error on any version of Astroid from 2.1.0 through the unreleased 2.5.0:

# file_with_large_if_elif.py
class TestClasse:
    def cause_astroid_recursion(self):
        if True:
            pass
        elif True:
            pass
        # Repeat as needed to overflow > sys.getrecursionlimit() on your system
        #
        # On my machine, 119 total `elif` statements is enough to trigger recursion
        elif True:
            pass
$ pylint file_with_large_if_elif.py
<full stacktrace truncated>
  File ".../astroid/mixins.py", line 153, in _get_assign_nodes
    return list(itertools.chain.from_iterable(children_assign_nodes))
  File ".../astroid/mixins.py", line 151, in <genexpr>
    for child_node in block
  File ".../astroid/decorators.py", line 33, in cached
    cache[func] = result = func(*args, **kwargs)
  File ".../astroid/mixins.py", line 153, in _get_assign_nodes
    return list(itertools.chain.from_iterable(children_assign_nodes))
  File ".../astroid/mixins.py", line 151, in <genexpr>
    for child_node in block
  File ".../astroid/decorators.py", line 33, in cached
    cache[func] = result = func(*args, **kwargs)
RecursionError: maximum recursion depth exceeded while calling a Python object

Source of the RecursionError

The current implementation of MultiLineBlockMixin makes a recursive call to _get_assign_nodes():

@decorators.cached
def _get_assign_nodes(self):
    children_assign_nodes = (
        child_node._get_assign_nodes()
        for block in self._multi_line_blocks
        for child_node in block
    )
    return list(itertools.chain.from_iterable(children_assign_nodes))

Possible fix

I think that a possible fix would be to switch from a recursive design, towards a stack-based solution. If we instead built a list of nodes as we go (say, with a while loop & a stack), we could avoid the overflow.

The generators used in _get_assign_nodes() are eagerly converted into a list anyway, so keeping a list of children then popping to get all assign nodes should have no negative performance effects.

A quick attempt on my own machine seems to work (though I've not written any new tests, or tested my changes against the test suite).

A less contrived reproduction

I stumbled upon this error by trying to run astroid on a class that's automatically produced by Apache Thrift.

Given an input file like so:

// proof_of_concept.thrift
struct LargeStruct {
    1: optional string argument_1;
    2: optional string argument_2;
    3: optional string argument_3;
    // ...
    116: optional string argument_116;
}

One can run thrift --gen py proof_of_concept.thrift to produce gen-py/proof_of_concept/ttypes.py

In the produced file is a large Python class - the many if/elif statements in read() method is what trips up astroid:

class LargeStruct(object):
    # <other methods truncated>
    def read(self, iprot):
        # (some code truncated) - the body of the `while` loop is what's relevant
        while True:
            (fname, ftype, fid) = iprot.readFieldBegin()
            if ftype == TType.STOP:
                break
            if fid == 1:
                if ftype == TType.STRING:
                    self.argument_1 = iprot.readString().decode('utf-8') if sys.version_info[0] == 2 else iprot.readString()
                else:
                    iprot.skip(ftype)
            elif fid == 2:
                if ftype == TType.STRING:
                    self.argument_2 = iprot.readString().decode('utf-8') if sys.version_info[0] == 2 else iprot.readString()
                else:
                    iprot.skip(ftype)
            #
            # <113 other `elif` branches truncated>
            #
            elif fid == 116:
                if ftype == TType.STRING:
                    self.argument_116 = iprot.readString().decode('utf-8') if sys.version_info[0] == 2 else iprot.readString()
                else:
                    iprot.skip(ftype)
            else:
                iprot.skip(ftype)
            iprot.readFieldEnd()
        iprot.readStructEnd()

Large Thrift structs are fairly common, so this can happen with some frequency.

Pull request?

I understand if parsing hundreds of control statements isn't something that astroid seeks to be able to do.

However, if you're open to refactoring out recursion, I'd be willing to write up a PR for this problem.

@PCManticore
Copy link
Contributor

Hey @DavidCain That's a great analysis, thank you. Moving to a stack based solution definitely works for us, so would be great if you can write up a PR for this!

@PCManticore PCManticore added Enhancement ✨ Improvement to a component topic-performance labels May 12, 2020
@DavidCain
Copy link
Contributor Author

Great! I'll whip up a PR then and reference this issue.

Thanks for reviewing.

It will probably take me until this weekend to find the time.

@DavidCain
Copy link
Contributor Author

DavidCain commented Jun 5, 2020

Sorry, haven't forgotten about this issue just yet, but it clearly took me more time than just a week to find the time. =) I'll hopefully have a PR out soon-ish.

@alexeiz
Copy link

alexeiz commented Jun 15, 2020

I get this RecursionError very consistently with pylint==2.5.2 and astroid==2.4.1. I'd appreciate any help resolving it because it makes pylint pretty much useless.

  File "/home/me/.local/share/virtualenvs/src-2P3pj8Xl/lib/python3.7/site-packages/astroid/scoped_nodes.py", line 642, in import_module
    return MANAGER.ast_from_module_name(absmodname)
  File "/home/me/.local/share/virtualenvs/src-2P3pj8Xl/lib/python3.7/site-packages/astroid/manager.py", line 189, in ast_from_module_name
    return self.ast_from_file(found_spec.location, modname, fallback=False)
  File "/home/me/.local/share/virtualenvs/src-2P3pj8Xl/lib/python3.7/site-packages/astroid/manager.py", line 98, in ast_from_file
    return AstroidBuilder(self).file_build(filepath, modname)
  File "/home/me/.local/share/virtualenvs/src-2P3pj8Xl/lib/python3.7/site-packages/astroid/builder.py", line 138, in file_build
    return self._post_build(module, encoding)
  File "/home/me/.local/share/virtualenvs/src-2P3pj8Xl/lib/python3.7/site-packages/astroid/builder.py", line 158, in _post_build
    self.delayed_assattr(delayed)
  File "/home/me/.local/share/virtualenvs/src-2P3pj8Xl/lib/python3.7/site-packages/astroid/builder.py", line 226, in delayed_assattr
    for inferred in node.expr.infer():
  File "/home/me/.local/share/virtualenvs/src-2P3pj8Xl/lib/python3.7/site-packages/astroid/decorators.py", line 132, in raise_if_nothing_inferred
    yield next(generator)
  File "/home/me/.local/share/virtualenvs/src-2P3pj8Xl/lib/python3.7/site-packages/astroid/decorators.py", line 96, in wrapped
    res = next(generator)
  File "/home/me/.local/share/virtualenvs/src-2P3pj8Xl/lib/python3.7/site-packages/astroid/bases.py", line 136, in _infer_stmts
    for inferred in stmt.infer(context=context):
  File "/home/me/.local/share/virtualenvs/src-2P3pj8Xl/lib/python3.7/site-packages/astroid/util.py", line 160, in limit_inference
    yield from islice(iterator, size)
  File "/home/me/.local/share/virtualenvs/src-2P3pj8Xl/lib/python3.7/site-packages/astroid/context.py", line 113, in cache_generator
    for result in generator:
  File "/home/me/.local/share/virtualenvs/src-2P3pj8Xl/lib/python3.7/site-packages/astroid/decorators.py", line 132, in raise_if_nothing_inferred
    yield next(generator)
  File "/home/me/.local/share/virtualenvs/src-2P3pj8Xl/lib/python3.7/site-packages/astroid/decorators.py", line 93, in wrapped
    generator = _func(node, context, **kwargs)
  File "/home/me/.local/share/virtualenvs/src-2P3pj8Xl/lib/python3.7/site-packages/astroid/inference.py", line 850, in infer_assign
    stmts = list(self.assigned_stmts(context=context))
  File "/home/me/.local/share/virtualenvs/src-2P3pj8Xl/lib/python3.7/site-packages/astroid/protocols.py", line 323, in _arguments_infer_argname
    functype = self.parent.type
  File "/home/me/.local/share/virtualenvs/src-2P3pj8Xl/lib/python3.7/site-packages/astroid/decorators.py", line 72, in __get__
    val = self.wrapped(inst)
  File "/home/me/.local/share/virtualenvs/src-2P3pj8Xl/lib/python3.7/site-packages/astroid/scoped_nodes.py", line 1461, in type
    for decorator in self.extra_decorators:
  File "/home/me/.local/share/virtualenvs/src-2P3pj8Xl/lib/python3.7/site-packages/astroid/decorators.py", line 72, in __get__
    val = self.wrapped(inst)
  File "/home/me/.local/share/virtualenvs/src-2P3pj8Xl/lib/python3.7/site-packages/astroid/scoped_nodes.py", line 1424, in extra_decorators
    for assign in frame._get_assign_nodes():
  File "/home/me/.local/share/virtualenvs/src-2P3pj8Xl/lib/python3.7/site-packages/astroid/decorators.py", line 34, in cached
    cache[func] = result = func(*args, **kwargs)
  File "/home/me/.local/share/virtualenvs/src-2P3pj8Xl/lib/python3.7/site-packages/astroid/scoped_nodes.py", line 2927, in _get_assign_nodes
    return list(itertools.chain.from_iterable(children_assign_nodes))
  File "/home/me/.local/share/virtualenvs/src-2P3pj8Xl/lib/python3.7/site-packages/astroid/scoped_nodes.py", line 2925, in <genexpr>
    child_node._get_assign_nodes() for child_node in self.body
  File "/home/me/.local/share/virtualenvs/src-2P3pj8Xl/lib/python3.7/site-packages/astroid/decorators.py", line 34, in cached
    cache[func] = result = func(*args, **kwargs)
  File "/home/me/.local/share/virtualenvs/src-2P3pj8Xl/lib/python3.7/site-packages/astroid/mixins.py", line 153, in _get_assign_nodes
    return list(itertools.chain.from_iterable(children_assign_nodes))
  File "/home/me/.local/share/virtualenvs/src-2P3pj8Xl/lib/python3.7/site-packages/astroid/mixins.py", line 151, in <genexpr>
    for child_node in block
  File "/home/me/.local/share/virtualenvs/src-2P3pj8Xl/lib/python3.7/site-packages/astroid/decorators.py", line 34, in cached
    cache[func] = result = func(*args, **kwargs)
  File "/home/me/.local/share/virtualenvs/src-2P3pj8Xl/lib/python3.7/site-packages/astroid/mixins.py", line 153, in _get_assign_nodes
... 900 more lines ...
  File "/home/me/.local/share/virtualenvs/src-2P3pj8Xl/lib/python3.7/site-packages/astroid/mixins.py", line 153, in _get_assign_nodes
    return list(itertools.chain.from_iterable(children_assign_nodes))
  File "/home/me/.local/share/virtualenvs/src-2P3pj8Xl/lib/python3.7/site-packages/astroid/mixins.py", line 151, in <genexpr>
    for child_node in block
  File "/home/me/.local/share/virtualenvs/src-2P3pj8Xl/lib/python3.7/site-packages/astroid/decorators.py", line 34, in cached
    cache[func] = result = func(*args, **kwargs)
  File "/home/me/.local/share/virtualenvs/src-2P3pj8Xl/lib/python3.7/site-packages/astroid/node_classes.py", line 1953, in _get_assign_nodes
    return [self] + list(self.value._get_assign_nodes())
RecursionError: maximum recursion depth exceeded while calling a Python object

@brycepg
Copy link
Contributor

brycepg commented Jun 19, 2020

Hi I'm having a hard time reproducing this error with thrift @DavidCain

Here are the commands I used:

Download your thrift code as poc.thrift

sudo dnf install thrift
thrift --gen py poc.thrift
pip install pylint==2.5.2 astroid==2.4.1
pip install thrift
pylint gen-py/
(repro-786) user@self astroid$ dnf list thrift
Last metadata expiration check: 0:10:15 ago on Thu Jun 18 22:58:03 2020.
Installed Packages
thrift.x86_64                                                       0.10.0-15.fc29                                                        @fedora
Available Packages
thrift.i686                                                         0.10.0-15.fc29                                                        fedora 
pylint 2.5.2
astroid 2.4.1
Python 3.7.7 (default, Mar 13 2020, 21:39:43) 
[GCC 9.2.1 20190827 (Red Hat 9.2.1-1)]
(repro-786) user@self astroid$ pip freeze
astroid==2.4.1
isort==4.3.21
lazy-object-proxy==1.4.3
mccabe==0.6.1
pylint==2.5.2
six==1.15.0
thrift==0.13.0
toml==0.10.1
typed-ast==1.4.1
wrapt==1.12.1

What versions of thrift are you using?

@DavidCain
Copy link
Contributor Author

Hi, Bryce. Pretty much any version of thrift should do, though I'm reproducing again this morning with 0.13.0. I'll also use 3.7.7 and the pylint + astroid versions you used.

(Note that my poc.thrift is an abridged version - you truly do need 116 separate options to generate Python code that triggers this). Let's do 200 for good measure, though, and automatically generate the full file.

$ echo 'struct LargeStruct {' > poc.thrift
$ for i in {1..200}; do; echo "  $i: optional string arg_$i;"; done >> poc.thrift
$ echo '}' >> poc.thrift
$ md5 poc.thrift
MD5 (poc.thrift) = 92d97c7a0e9d57c37d95e8adfe93fbfc
$ thrift --gen py poc.thrift
$ pylint gen-py
...
  File "/Users/david/.pyenv/versions/3.7.7/envs/venv-poc/lib/python3.7/site-packages/astroid/mixins.py", line 130, in <genexpr>
    return tuple(getattr(self, field) for field in self._multi_line_block_fields)
RecursionError: maximum recursion depth exceeded while calling a Python object
$ thrift --version
Thrift version 0.13.0
$ pip freeze
astroid==2.4.1
isort==4.3.21
lazy-object-proxy==1.4.3
mccabe==0.6.1
pylint==2.5.2
six==1.15.0
toml==0.10.1
typed-ast==1.4.1
wrapt==1.12.1
$ python --version
Python 3.7.7

@jbwdevries
Copy link

jbwdevries commented Oct 26, 2020

We're seeing a similar issue using method chaining. I have an example here: https://pastebin.com/uTyDLrDY

Removing one .method() call prevents the recursion error. So does sys.setrecursionlimit(1001).

(Obviously in our production case each method is different and does more than just increase a counter)

@DavidCain
Copy link
Contributor Author

Argh, sorry for leaving this issue unresolved for so long. I won't make any future promises about when I might tackle a PR, but maybe one day soon.

In the meantime, @jbwdevries - I can at least document the workaround we're using!

Temporary workaround: increase stack size

We've implemented a custom pylint checker. Because this checker is imported before astroid parses source code, it can modify the size of Python's stack, preventing overflow.

# checkers/__init__.py - in a custom checker package 
import sys

def register(linter):
    # HACK: Increase recursion limit to prevent stack overflow
    # See: https://github.com/PyCQA/astroid/issues/786
    sys.setrecursionlimit(3000)  # (Larger than `sys.getrecursionlimit()`)

    # (registration of custom checkers is omitted for brevity)

Then in the project's .pylintrc:

[MASTER]

load-plugins=
  # Load our custom checkers, which will increase the stack size only on pylint runs
  custom_checker_pkg.checkers,

(custom_checker_pkg must be available for import at the time of running pylint).

3000 is an arbitrary limit that should be tuned to your own needs. Too large of a stack will obviously create problems, and this doesn't solve the actual problem but merely increases the number of times that astroid can recurse).

Hopefully this is helpful.

@hippo91
Copy link
Contributor

hippo91 commented Oct 29, 2020

@DavidCain thanks for this little hack! And do not worry about your promise! We are all volunteers here and we do as much as we can within our free time! 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement ✨ Improvement to a component topic-performance
Projects
None yet
Development

No branches or pull requests

6 participants