Regex parser nits #88558
Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions
Issue details: trivial changes.
@@ -1158,8 +1158,6 @@ private void ScanBlank()
/// <summary>Scans \-style backreferences and character escapes</summary>
private RegexNode? ScanBasicBackslash(bool scanOnly)
{
    Debug.Assert(_pos < _pattern.Length, "The current reading position must not be at the end of the pattern");
Removed these asserts because there'll be a null ref a couple of lines down in each case; we don't put asserts in front of every possible null ref.
Not sure I understood the comment, but it looks to me like these are asserting the current reading position, not a null ref.
I should have said IndexOutOfRangeException.
> we don't put asserts in front of every possible null ref.
No, but we do put asserts in places that are meant to act as preconditions / contracts. These asserts make it really easy for a maintainer to see that these methods should only be called when there's pattern remaining. I don't think removing them was valuable.
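The precondition argument can be illustrated with a minimal C# sketch. TinyScanner and ScanEscapedChar are hypothetical stand-ins, not RegexParser's actual members; only the assert condition and message are copied from the hunk above. The sketch shows the difference between a documented contract and relying on the indexer to fail.

```csharp
using System.Diagnostics;

// Hypothetical miniature scanner: TinyScanner and ScanEscapedChar are invented
// for illustration; only the assert condition and message come from RegexParser.
public sealed class TinyScanner
{
    private readonly string _pattern;
    private int _pos;

    public TinyScanner(string pattern, int pos)
    {
        _pattern = pattern;
        _pos = pos;
    }

    /// <summary>Reads the character at the current position and advances past it.</summary>
    /// <remarks>Contract: callers must only invoke this while pattern text remains.</remarks>
    public char ScanEscapedChar()
    {
        // Precondition: documents the contract for maintainers and fails fast with a
        // descriptive message in Debug builds. Debug.Assert is marked
        // [Conditional("DEBUG")], so the call is omitted from Release builds.
        Debug.Assert(_pos < _pattern.Length, "The current reading position must not be at the end of the pattern");

        // Without the assert, a violation would still surface here, but only as an
        // IndexOutOfRangeException that says nothing about the intended contract.
        return _pattern[_pos++];
    }
}
```

In a Debug build, calling ScanEscapedChar at the end of the pattern fails at the assert with the contract spelled out; in a Release build the same call throws the IndexOutOfRangeException mentioned earlier in the thread.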
Updated from 446dc42 to 87951a3, then from 87951a3 to 5251fa4.
Test failures are unrelated.
src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexParser.cs
I'm not seeing it. Which line is the change to presize the dictionary?
Thanks for spotting that; it apparently got lost in a rebase:
How much of an overallocation is it for those? That is, we could probably avoid it for 100% of the real-world patterns if we made it … Said another way, what does the histogram look like for the number of strings involved in the various patterns?
https://gist.github.com/danmoseley/eeb38412d74c3eb22bc69472940b1f95
Note that the horizontal scale starts skipping values at around 60.
For something like this, it would be nice if the dictionary could start with on-stack buffers.
If the dictionary starts at zero and, when full, resizes to the next prime size (from its built-in primes table) that is at least 2x the current size, then it will have sizes 0, 3, 7, 17, 37, 89. 61% would fit in 3, 80% in 7, and 96% in 17. Only 5% are zero and would not allocate at all.
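The growth sequence in the comment above can be reproduced with a short C# sketch. It assumes the policy of .NET's internal HashHelpers: an empty Dictionary<TKey, TValue> allocates nothing, the first Add allocates GetPrime(0) slots, and when the entries array is full it resizes to GetPrime(2 * count). The primes array below is assumed to match the start of that internal table, which is why the step after 37 is 89 rather than 79 (the smallest true prime at or above 74). DictionaryGrowthSketch and GrowthSequence are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Sketch of Dictionary<TKey, TValue> slot-count growth, under the assumptions above.
public static class DictionaryGrowthSketch
{
    // Assumed prefix of HashHelpers' primes table; only small sizes matter here.
    private static readonly int[] s_primes = { 3, 7, 11, 17, 23, 29, 37, 47, 59, 71, 89, 107, 131, 163, 197 };

    // Smallest table entry >= min (mirrors the table lookup in HashHelpers.GetPrime).
    public static int GetPrime(int min)
    {
        foreach (int prime in s_primes)
        {
            if (prime >= min)
            {
                return prime;
            }
        }

        throw new ArgumentOutOfRangeException(nameof(min), "Beyond the table prefix used in this sketch.");
    }

    // Slot counts observed while adding `count` entries to a dictionary that was not
    // presized, starting from the zero-allocation state.
    public static List<int> GrowthSequence(int count)
    {
        var sizes = new List<int> { 0 };
        int size = 0;

        for (int added = 0; added < count; added++)
        {
            if (size == 0)
            {
                // First Add: allocate the smallest table size.
                size = GetPrime(0);
                sizes.Add(size);
            }
            else if (added == size)
            {
                // Entries array is full: resize, copying and rehashing every existing entry.
                size = GetPrime(2 * size);
                sizes.Add(size);
            }
        }

        return sizes;
    }
}
```

Under these assumptions GrowthSequence(89) yields 0, 3, 7, 17, 37, 89, and every step after the first allocation copies all existing entries, which is the CPU cost weighed against presizing. Presizing with a capacity of 15 instead allocates GetPrime(15) = 17 slots up front, however many strings the pattern actually has.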
Right. So presizing to 15 would overallocate for the majority.
I guess it depends on how you trade off a small transient allocation against the small CPU cost of copying. I don't think this is a hot path compared with actually running the interpreter. But it seemed that we had some hard data here, and it was easy to avoid that CPU cost in 90% of cases. I don't mind either way.
Thanks.
There's CPU cost associated with overallocation as well. I don't think presizing it to 15 is the right tradeoff.