Allow to cancel hs_scan*() #139

Open
rschu1ze opened this issue Feb 8, 2023 · 5 comments

rschu1ze commented Feb 8, 2023

We (ClickHouse) recently encountered some patterns that are extremely expensive to evaluate with vectorscan/hyperscan, for example bounded repeats such as "x{n,m}" (which are also documented as being expensive). As a mitigation, we now check patterns on a best-effort basis and reject those that are likely to be expensive.
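
A minimal sketch of what such a best-effort check could look like. This is not ClickHouse's actual check, which the thread does not describe. The function name `looks_expensive` and the threshold `MAX_REPEAT_BOUND` are hypothetical, and the heuristic is deliberately crude: it only looks for a bounded repeat with a large explicit bound and does not understand character classes.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical threshold: repeats with a bound above this are rejected. */
#define MAX_REPEAT_BOUND 1000UL

/* Parses a run of decimal digits at *p and advances *p past it.
 * Returns false if *p does not start with a digit. The value saturates
 * just above MAX_REPEAT_BOUND so it cannot overflow. */
static bool parse_number(const char **p, unsigned long *out)
{
    if (!isdigit((unsigned char)**p))
        return false;
    unsigned long v = 0;
    while (isdigit((unsigned char)**p)) {
        if (v <= MAX_REPEAT_BOUND)
            v = v * 10 + (unsigned long)(**p - '0');
        ++*p;
    }
    *out = v;
    return true;
}

/* Returns true if the pattern contains {n}, {n,} or {n,m} with an
 * explicit bound larger than MAX_REPEAT_BOUND. Escaped characters are
 * skipped; everything else about regex syntax is ignored. */
bool looks_expensive(const char *pattern)
{
    assert(pattern != NULL);
    for (const char *p = pattern; *p != '\0'; ++p) {
        if (*p == '\\' && p[1] != '\0') { ++p; continue; }  /* skip escape */
        if (*p != '{')
            continue;
        const char *q = p + 1;
        unsigned long lo = 0, hi = 0;
        if (!parse_number(&q, &lo))
            continue;                       /* not a repeat, e.g. "{abc" */
        hi = lo;
        if (*q == ',') {
            ++q;
            if (!parse_number(&q, &hi))
                hi = lo;                    /* "{n,}": only n is explicit */
        }
        if (*q != '}')
            continue;                       /* malformed, treat as literal */
        if (lo > MAX_REPEAT_BOUND || hi > MAX_REPEAT_BOUND)
            return true;
        p = q;                              /* resume after the '}' */
    }
    return false;
}
```

With this sketch, `looks_expensive("x{1,100000}")` is true and `looks_expensive("x{2,5}")` is false. A heuristic like this can only approximate the real cost, which is why the alternatives below are preferable.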

A better solution would be to either

  • add a new method to vectorscan/hyperscan that predicts runtime cost (a coarse "fast"/"slow" answer would be sufficient), or
  • (the preferred alternative) allow canceling the scan. The hs_scan_*() functions (*) take a callback that can stop the scan, but it is only called when a match is found. Ideally, a second callback could be provided that is called regularly (every N "steps", whatever that means in the context of vectorscan). I know that vectorscan attempts to stay API-compatible with hyperscan, so such a callback could be added as a new parameter with a default value.

EDIT: I just noticed that it is pattern compilation, i.e. hs_compile_multi(), that becomes slow (not the scan). A callback for canceling hs_compile_*() would be great.

(*) ClickHouse actually only uses block mode, not streaming or vector modes.
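
To illustrate the scan-side limitation described in the second bullet above, here is a toy model rather than the real library. It mirrors only one property of the hs_scan_*() match callback: a non-zero return value stops the scan. The names `toy_scan`, `toy_match_cb` and `stop_on_first_match` are made up for this sketch, and the "pattern" is just a single byte.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a match callback: a non-zero return asks the
 * scanner to stop. */
typedef int (*toy_match_cb)(size_t offset, void *ctx);

enum { TOY_SUCCESS = 0, TOY_TERMINATED = 1 };

/* Toy block-mode scan: reports every occurrence of `needle` in
 * data[0..len). The callback is the only place where the caller can
 * cancel, and it only runs when something matched. */
int toy_scan(const char *data, size_t len, char needle,
             toy_match_cb on_match, void *ctx)
{
    assert(data != NULL && on_match != NULL);
    for (size_t i = 0; i < len; ++i) {
        if (data[i] == needle && on_match(i, ctx) != 0)
            return TOY_TERMINATED;
    }
    return TOY_SUCCESS;
}

/* Counts how often it was called and requests a stop immediately. */
int stop_on_first_match(size_t offset, void *ctx)
{
    (void)offset;
    ++*(int *)ctx;
    return 1;
}
```

Scanning "aaXaaX" for 'X' with `stop_on_first_match` ends after one callback invocation. Scanning "aaaaaa" never invokes the callback at all, so the caller gets no opportunity to cancel, however long the scan takes.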


markos commented Mar 23, 2023

Hi @rschu1ze, we can provide the second method, but it will go into the next version. This one (5.4.9) needs to be released ASAP; it's already overdue.

markos self-assigned this Mar 23, 2023
markos added the enhancement (New feature or request) label Mar 23, 2023
markos added this to the 5.4.10 milestone Mar 23, 2023
rschu1ze (author) commented

That would be awesome, thanks :)

markos modified the milestones: 5.4.10, 5.4.11 Sep 5, 2023

markos commented Sep 5, 2023

We need to release 5.4.10 ASAP, so this has been moved to the next version. However, it will not take that long, as we have increased our resources on this project.


markos commented Nov 21, 2023

@rschu1ze we will begin development of this feature now. As explained in the Readme, due to the recent closed-sourcing of the original hyperscan project for versions >5.4, we will continue to keep compatibility with this version, but we will not pursue compatibility with later IPL hyperscan versions. This is actually a good thing for us, as it allows us to extend functionality without needing to chase the original project anymore.

Now, with regard to this problem, we intend to add a few more hs_scan_*_extended() functions that can do things the original API does not provide, without changing the original API.

We will start with adding another periodic callback function as you called it, with a user provided period. Is there anything else that you would like to add in this, now that we're still in the design phase?
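
One possible shape for such a periodic callback, sketched on a toy scanner rather than on vectorscan itself. Everything here is an assumption for illustration: the names `toy_scan_extended`, `ext_match_cb` and `ext_periodic_cb`, the parameter order, and the choice to measure the period in bytes scanned. The actual hs_scan_*_extended() design was still open at this point in the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback types; a non-zero return requests a stop. */
typedef int (*ext_match_cb)(size_t offset, void *ctx);
typedef int (*ext_periodic_cb)(size_t bytes_scanned, void *ctx);

enum { EXT_SUCCESS = 0, EXT_TERMINATED = 1 };

/* Toy scan with a second, periodic callback. `period` is the
 * user-provided interval, assumed here to be counted in bytes. */
int toy_scan_extended(const char *data, size_t len, char needle,
                      ext_match_cb on_match, ext_periodic_cb on_period,
                      size_t period, void *ctx)
{
    assert(data != NULL && period > 0);
    for (size_t i = 0; i < len; ++i) {
        if (on_match != NULL && data[i] == needle && on_match(i, ctx) != 0)
            return EXT_TERMINATED;
        /* After every `period` bytes, let the caller decide whether
         * to continue, whether or not anything has matched. */
        if (on_period != NULL && (i + 1) % period == 0 &&
            on_period(i + 1, ctx) != 0)
            return EXT_TERMINATED;
    }
    return EXT_SUCCESS;
}

/* Example periodic callback: cancel once a byte budget is used up. */
struct budget {
    size_t max_bytes;  /* stop once this many bytes were scanned */
    size_t last_seen;  /* progress reported by the last invocation */
};

int stop_when_over_budget(size_t bytes_scanned, void *ctx)
{
    struct budget *b = ctx;
    b->last_seen = bytes_scanned;
    return bytes_scanned >= b->max_bytes;
}
```

With a period of 4 and a budget of 8 bytes, a 16-byte input without any match is cancelled at the second periodic call. In practice the periodic callback would more likely compare against a wall-clock deadline or a query-cancellation flag than against a byte budget.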


rschu1ze commented Jan 7, 2024

@markos Sorry for not checking back earlier.

New functions hs_scan_*_extended() would be fine for us (and I understand your motivation of not breaking existing use cases). But we would also be fine with extending hs_scan_*() itself, e.g. in a new API-incompatible major version.

> We will start with adding another periodic callback function as you called it, with a user provided period.

Sounds good, looking forward to this. The only addition I would have is that pattern compilation is also prone to ReDoS attacks, meaning that a similar mechanism in hs_compile_*() would be helpful.

markos modified the milestones: 5.4.12, 5.5+ Mar 5, 2024