You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
The following code block induces a hung state in the NLP module (this is text from a real-world corpus, similar errors happen on long Java namespaces (e.g. org.apache.x.y.z).
Our inspection suggests the tokenizer is thrashing, perhaps owing to a regex of exploding complexity. We've routed around the damage with a context manager that uses signal.SIGALRM to timeout if spacy takes too long, but this issue was the source of much confusion as regards seemingly simple jobs that were running for extended periods of time.
The text was updated successfully, but these errors were encountered:
Your analysis was correct -- and the problem was indeed in the URL matching expression introduced in pull request #879, to address Issue #840.
The fix turns out to be very simple, and a bit interesting: all I had to do was switch to @mrabarnett 's regex module -- so I guess this is an interesting example of a pathological back-tracking behaviour "in the wild".
The following code block induces a hung state in the NLP module (this is text from a real-world corpus, similar errors happen on long Java namespaces (e.g. org.apache.x.y.z).
Our inspection suggests the tokenizer is thrashing, perhaps owing to a regex of exploding complexity. We've routed around the damage with a context manager that uses
signal.SIGALRM
to timeout ifspacy
takes too long, but this issue was the source of much confusion as regards seemingly simple jobs that were running for extended periods of time.The text was updated successfully, but these errors were encountered: