Debugger causes stack overflow because the property 'sent_start' is infinitely recursive #1640
@cmckain Do you have any insight into the likely implications of this? I don't regularly use debuggers, and I've never used PyCharm, so I'm not sure whether this points to a deeper issue. If the crash is exposing a memory error in spaCy (e.g. a use-after-free or an out-of-bounds access), we're obviously very interested in that! But if we're just hitting a tight stack-size limit in this tool that we don't hit in regular execution, I don't think that's a problem we'd work on.
I have another problem with the non-English models of spaCy 2.0 and PyCharm's debugger. For some days I thought it was a PyCharm problem, but after reinstalling and even downgrading PyCharm I found that the source of the problem is loading the spaCy model. By chance, I left the debugger stopped on that line and discovered that it resumes and continues to the next line after 10-15 minutes!
My Environment
Although I haven't stepped through the assembly line by line yet, I would infer that the debugger is somehow forcing the program into an infinite loop or, perhaps, into recursion that calls the same function over and over again. I first noticed this problem when a deprecation warning kept printing until the program crashed. Oddly enough, the debugger works the first time you open a spaCy data structure, but the second attempt (on either a child structure or something else) causes a crash. Perhaps the debugger is holding a lock on some of the data, and the program keeps failing to get it back and eventually crashes?
I can confirm @ruiEnca's problem. Using the above code, here are my timed results with the debugger on and off:
I think I figured it out, @honnibal. The file "spaCy/spacy/tokens/token.pyx" defines the 'get' and 'set' functions of the property "sent_start". If the word is literally the beginning of the sentence, the getter returns False; otherwise it returns the value of "sent_start", which calls the 'get' of "sent_start", which returns the value of "sent_start", and so on. The stack overflows because the function never stops calling itself after the first word. For most people this isn't a problem, because they don't need to call a deprecated property, but the debugger does (since it lists all available properties). My earlier observation that the crash only occurred on the second listing was because I always tested the first word first and the second word second; when I removed the check so that "return self.sent_start" always ran, the program failed regardless of which word I chose to debug. My temporary solution was to change line 356 to "return True", although I'm not sure what it normally returned in the past. I would recommend removing that property sooner rather than later. As for @ruiEnca's issue, I would guess that the debugger is forcing some extra code to run that isn't normally run during startup, since it tries to load everything it can.
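A minimal pure-Python sketch of the pattern described above, to show why only a tool that reads every property trips over it. `BuggyToken`, `FixedToken`, `_is_sent_start` and `inspect_like_debugger` are hypothetical names, not spaCy's API, and `self.i == 0` is a simplified stand-in for "the word is the beginning of the sentence". The real class is compiled Cython, where the report shows a native stack overflow; in pure Python the same self-referencing getter raises `RecursionError` instead, which the sketch catches.

```python
class BuggyToken:
    """Hypothetical stand-in for a token whose deprecated property recurses."""

    def __init__(self, i):
        self.i = i  # position of the token (simplification of "sentence start")

    @property
    def sent_start(self):
        if self.i == 0:
            return False
        # Bug: this looks up the same property, so the getter calls itself forever
        return self.sent_start


class FixedToken:
    """Same property, but backed by a stored value instead of by itself."""

    def __init__(self, i, is_sent_start):
        self.i = i
        self._is_sent_start = is_sent_start  # hypothetical backing field

    @property
    def sent_start(self):
        return self._is_sent_start  # plain attribute read, no recursion


def inspect_like_debugger(obj):
    """Read every public attribute, roughly as a debugger's variables view does."""
    values = {}
    for name in dir(obj):
        if name.startswith("_"):
            continue
        try:
            values[name] = getattr(obj, name)
        except RecursionError:
            values[name] = "<RecursionError>"
    return values


if __name__ == "__main__":
    print(inspect_like_debugger(BuggyToken(0)))  # first word: getter returns False
    print(inspect_like_debugger(BuggyToken(1)))  # second word: getter recurses
    print(inspect_like_debugger(FixedToken(1, True)))
```

The first token inspects cleanly and the second one recurses, which matches the "first word -> second word -> crash" sequence in the original report; ordinary code that never reads the deprecated property is unaffected.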
@honnibal @cmckain I found the place where the debugger stops for several minutes while loading a non-English model. It is in the import_file function of compat.py, at line 119:
My Environment
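The code at line 119 is not reproduced in this thread. As a rough illustration only, here is a sketch assuming the helper imports a language-data file by path with `importlib.util` (the standard Python 3 way to do this; not verified against that spaCy release). `count_traced_lines` is a hypothetical stand-in for a debugger: PyCharm's pydevd-based debugger generally relies on Python's tracing hooks, so the sketch installs a counting hook with `sys.settrace` to show that every line executed during the import is reported to it.

```python
import importlib.util
import sys
import tempfile
from pathlib import Path


def import_file(name, loc):
    """Import the Python source file at `loc` as a module called `name`."""
    spec = importlib.util.spec_from_file_location(name, str(loc))
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the file's top-level code
    return module


def count_traced_lines(func, *args):
    """Call func(*args) under a counting trace hook (hypothetical debugger stand-in)."""
    counts = {"line": 0}

    def tracer(frame, event, arg):
        if event == "line":
            counts["line"] += 1
        return tracer  # keep receiving line events inside every new frame

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(previous)  # restore whatever hook was active before
    return result, counts["line"]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "demo_lang_data.py"
        path.write_text("WORDS = [w.upper() for w in ['a', 'b', 'c']]\n")
        module, n_lines = count_traced_lines(import_file, "demo_lang_data", path)
        print(module.WORDS, n_lines)
```

Because the hook is invoked for every executed line, any per-line work a debugger does scales with the amount of code the import runs. This is one possible explanation for the long pause, not a confirmed diagnosis.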
@cmckain Thanks!! I was on holiday for most of December, so I'm just getting back to this now. I've fixed the infinite loop; I meant to write
"Unhandled exception at 0x00007FFC12181517 (token.cp36-win_amd64.pyd) in python.exe: 0xC00000FD: Stack overflow (parameters: 0x0000000000000001, 0x0000000402603FF8)."
Iterating through a sentence causes a crash when PyCharm's debugger attempts to break after the first word (first word -> second word -> crash). Attached is the memory dump from Python after it crashed. If the dump with the heap would be useful, I can send it, but it is over 2 GB.
Sample Code
Your Environment
python2.zip