Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve Regex support for automatically making loops atomic #62449

Closed
stephentoub opened this issue Dec 6, 2021 · 1 comment · Fixed by #63518
Closed

Improve Regex support for automatically making loops atomic #62449

stephentoub opened this issue Dec 6, 2021 · 1 comment · Fixed by #63518
Assignees
Milestone

Comments

@stephentoub
Copy link
Member

stephentoub commented Dec 6, 2021

In .NET 5, we added some support for automatically making some loops atomic. For example, given the expression a+b+c+, we'll convert that into the equivalent of (?>a+)(?>b+)(?>c+): the c+ is at the end of the expression so nothing can backtrack into it, the b+ is always followed by a c and there's nothing b+ can give back that would match c, and similarly the a+ is always followed by a b and there's nothing a+ can give back that would match b. However, tweaking the expression slightly to a*b*c*, we'll convert that into the equivalent of a*b*(?>c*): while we still treat the c* as atomic, when analyzing the a* and b* loops and comparing them against what's guaranteed to come next, we see that the subsequent loop has a 0-lower-bound and give up, even though with a bit more analysis we could see that it's still valid to make all of these atomic. The more we can make atomic, the better, not just for performance but also for readability of code in the source generator.

@stephentoub stephentoub added this to the 7.0.0 milestone Dec 6, 2021
@ghost
Copy link

ghost commented Dec 6, 2021

Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions
See info in area-owners.md if you want to be subscribed.

Issue Details

In .NET 5, we added some support for automatically making some loops atomic. For example, given the expression a+b+c+, we'll convert that into the equivalent of (?>a+)(?>b+)(?>c+): the c+is at the end of the expression so nothing can backtrack into it, theb+is always followed by acand there's nothingb+can give back that would matchc, and similarly the a+is always followed by aband there's nothinga+can give back that would matchb. However, tweaking the expression slightly to abc*, we'll convert that into the equivalent of ab(?>c*): while we still treat the c*as atomic, when analyzing thea*andb*` loops and comparing them against what's guaranteed to come next, we see that the subsequent loop has a 0-lower-bound and give up, even though with a bit more analysis we could see that it's still valid to make all of these atomic. The more we can make atomic, the better, not just for performance but also for readability of code in the source generator.

Author: stephentoub
Assignees: -
Labels:

area-System.Text.RegularExpressions, tenet-performance

Milestone: 7.0.0

@dotnet-issue-labeler dotnet-issue-labeler bot added the untriaged New issue has not been triaged by the area owner label Dec 6, 2021
@stephentoub stephentoub self-assigned this Jan 3, 2022
@stephentoub stephentoub removed the untriaged New issue has not been triaged by the area owner label Jan 7, 2022
@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Jan 7, 2022
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Jan 19, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Feb 18, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant