Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect processing of closing round bracket #33

Open
app13y opened this issue Mar 3, 2017 · 2 comments
Open

Incorrect processing of closing round bracket #33

app13y opened this issue Mar 3, 2017 · 2 comments

Comments

@app13y
Copy link

app13y commented Mar 3, 2017

Naive example

Regular expression (foo)(.*(2.)) does not match foo123.

Explanation

This is due the incorrect processing of the closing bracket in foo() function:

/* ... */
if (re[i] == ')') {
    int ind = info->brackets[info->num_brackets - 1].len == -1 
         ? info->num_brackets - 1 
         : depth;
    info->brackets[ind].len = (int) (&re[i] - info->brackets[ind].ptr);
    depth--;
    FAIL_IF(depth < 0, SLRE_UNBALANCED_BRACKETS);
    FAIL_IF(i > 0 && re[i - 1] == '(', SLRE_NO_MATCH);
}
/* ... */

How it is done

  • Last added capturing group is considered:
    • If it is not enclosed, encountered bracket matches opening bracket of this group.
    • Otherwise, encountered bracket matches opening bracket of group with index depth, which is incorrect in general.

In example shown above, last closing bracket will enclose capturing group with index 1, which is foo, and not capturing group .*(2.). Moreover, this issue can be exploited with a wide range of regular expressions; demonstrated one is merely a proof of concept constructed from one of your unit tests.

How it shall be done

Encountered closing bracket shall match last unmatched opening bracket. One shall search for last capturing group, which length is set to -1, and not just assume capturing group with index depth.

/* ... */
if (re[i] == ')') {
    int ind = search_for_unmatched_bracket(info);
    info->brackets[ind].len = (int) (&re[i] - info->brackets[ind].ptr);
    depth--;
    FAIL_IF(depth < 0, SLRE_UNBALANCED_BRACKETS);
    FAIL_IF(i > 0 && re[i - 1] == '(', SLRE_NO_MATCH);
}
/* ... */

Regards, Arseny.
@cpq
Copy link
Member

cpq commented Mar 6, 2017

Thank you!
Could you send a PR with respective snippet in the unit test please?

@NSP-0123456
Copy link

NSP-0123456 commented Oct 9, 2020

A simple way to implement search_for_unmatched_bracket function

is to:

  • loop over brackets from the last position nbBrackets -1 to the first
  • Search and stop for the first unmatched ( where len == - 1) return the current position.
  • if not found, return -1

The bracket matching works perfectly but expression capture get faulty for expression like (a(b))(c) as the ))( is not well interpreted during capture.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants