Edge case that takes too much time #50

Open
kvtoraman opened this issue Apr 12, 2021 · 1 comment
Describe the bug
Running pp.clean('http://google.com/..........................') takes far too long to return. This looks like a bug.

To Reproduce

Run pp.clean('http://google.com/..........................')

Expected behavior

It can return:

  • '..........................'
  • ''

Desktop:

  • OS: Linux
  • Python Version: 3.8.5
  • preprocessor version: 0.6.0

kvtoraman added the bug label Apr 12, 2021

guanqun-yang commented Aug 20, 2021

@s @kvtoraman The answer posted here could serve as a workaround: it skips any input whose runtime is too long. For example, take the edge case

http://google.com/..........................

The following code gives up on each input after 2 seconds (signal.SIGALRM is available on Unix only):

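import signal
import preprocessor as p

class TimeoutException(Exception):
    """Raised by the SIGALRM handler when a call runs too long."""

def timeout_handler(signum, frame):
    raise TimeoutException

text_list = ["http://google.com/..........................", "hello world :+1: "]

# Install the handler once; SIGALRM only works in the main thread.
signal.signal(signal.SIGALRM, timeout_handler)
for text in text_list:
    signal.alarm(2)  # deliver SIGALRM in 2 seconds
    try:
        text = p.clean(text)
    except TimeoutException:
        # text is still the raw input here, because the assignment never completed
        print(f"Could not handle {text!r}")
    finally:
        signal.alarm(0)  # cancel the pending alarm, even if another exception was raised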
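
The timeout workaround above could also be wrapped in a reusable helper so it does not have to be repeated around every call. This is only a sketch: time_limit and clean_with_timeout are hypothetical names, not part of preprocessor, and it assumes a Unix system with the call made from the main thread. The cleaning function is passed in as an argument, so the helper itself does not depend on the library.

```python
import signal
from contextlib import contextmanager


class TimeoutException(Exception):
    """Raised when the wrapped call exceeds its time budget."""


@contextmanager
def time_limit(seconds):
    """Raise TimeoutException if the body runs longer than `seconds`.

    Hypothetical helper; relies on SIGALRM (Unix only, main thread only).
    """
    def handler(signum, frame):
        raise TimeoutException

    previous = signal.signal(signal.SIGALRM, handler)
    signal.setitimer(signal.ITIMER_REAL, seconds)  # accepts fractional seconds
    try:
        yield
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # cancel the pending timer
        # Restore whatever handler was installed before we started.
        signal.signal(
            signal.SIGALRM,
            previous if previous is not None else signal.SIG_DFL,
        )


def clean_with_timeout(clean_fn, text, seconds=2.0, fallback=""):
    """Return clean_fn(text), or `fallback` if the call times out."""
    try:
        with time_limit(seconds):
            return clean_fn(text)
    except TimeoutException:
        return fallback
```

With the import from the snippet above, this would be called as clean_with_timeout(p.clean, text). The empty-string fallback matches one of the two return values suggested in the issue; pass fallback=text to keep the raw input instead.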
