Infinite loop on re.findall #101
Comments
Hi - thanks for putting in the time to do some debugging! I can well believe this is a backtracking issue - both regexes have the problem that there is a chance for 'mis-matching' - for URLs it could be something weird like

The easy fix is probably to choose better regexes for these. I'll look at patching these today, and then we probably just have to wait and see if the infinite loop resurfaces. Just going to mention #86 as it may be related...
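For illustration only, here is a rough sketch of what "better regexes" could look like. These are hypothetical patterns, not the ones that were actually patched into clipster, and the names `URI_RE`, `EMAIL_RE` and `extract` are made up for this example. The idea is that each repeated character class excludes the delimiter that follows it, so the regex engine has far fewer ways to split the same text.

```python
import re

# Hypothetical replacements, NOT the patterns committed to the project.
# Scheme: a letter followed by letters, digits, '+', '.' or '-', then '://'.
URI_RE = re.compile(r'\b[A-Za-z][A-Za-z0-9+.-]*://\S+\b')

# Local part cannot contain '@'; domain labels cannot contain '@' or '.'.
EMAIL_RE = re.compile(r'\b[^\s@]+@[^\s@.]+(?:\.[^\s@.]+)+\b')


def extract(text):
    """Return (uris, emails) found in text using the sketch patterns."""
    return URI_RE.findall(text), EMAIL_RE.findall(text)


if __name__ == "__main__":
    sample = "See https://example.com/path and mail bob@example.org today."
    print(extract(sample))
```

These patterns are stricter than the originals, so they will miss some strings the old ones accepted. Whether that trade-off is acceptable depends on what the extraction feature is expected to catch.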
That's
Haven't run into it again, closing
Oh but I still have extraction disabled, so it means nothing. Guess no one else ran into it in the meantime :)
Sometimes clipster goes into 100% CPU usage. I attached gdb to the Python process and found that it was stuck in the call to `re.findall`. I don't know if this is a Python bug or an instance of catastrophic backtracking. I have `extract_patterns` disabled, so the only potential culprits are the regexes for URIs and emails, `r'\b\S+://\S+\b'` and `r'\b\S+\@\S+\.\S+\b'`. The latter looks like it could be problematic because `\S` matches `@` and `.`, but I couldn't find a pathological input.

I'm disabling `extract_uris` and `extract_emails` as a workaround.
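One way to probe the email regex for backtracking is to time `re.findall` on inputs of growing length. The sketch below is only a guess at a stress input (many `@`, no `.`, no whitespace). It is not confirmed to be what triggered the hang, and `time_findall` is a helper made up for this example. If the time grows much faster than the input length, that would point to backtracking rather than a Python bug.

```python
import re
import time

# Email pattern exactly as quoted in the report.
EMAIL_RE = re.compile(r'\b\S+\@\S+\.\S+\b')


def time_findall(pattern, text):
    """Run pattern.findall(text) once and return (matches, elapsed_seconds)."""
    start = time.perf_counter()
    matches = pattern.findall(text)
    return matches, time.perf_counter() - start


if __name__ == "__main__":
    for n in (50, 100, 200):
        # Hypothetical stress input: the regex can never match (no '.'),
        # but '\S+' has many ways to split the text around each '@'.
        text = "a@" * n
        matches, seconds = time_findall(EMAIL_RE, text)
        print(f"len={len(text):4d} matches={len(matches)} time={seconds:.4f}s")
```

The input sizes are kept small so the script finishes quickly. Larger values of `n` should be tried with care, since the run time may climb steeply.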