-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make enum RegexParseError and RegexParseException public #38872
Comments
Tagging subscribers to this area: @eerhardt |
Thanks for writing this up.
Do you have evidence folks need to do this? If not, we might leave them out - that's up to the API review folks. For the enum,
Offset should note it's zero based. If we have the constructor it should probably verify RegexParseError isn't out of range, like FileStream does, but that's implementation. (And offset >= 0)
@CyrusNajmabadi any feedback given you've done something similar? specifically on the categories in the enum. |
I wasn't sure about the enum values, to leave them as is, or not. I don't particularly like them, I'll see if I can improve the names. The order of the enum shouldn't matter, I think? But yeah, it's a bit random. I'll go over your other suggestions too. |
I think this is close to being ready for review, save for the comments that Dan mentioned. Considering that it'd probably not get API reviewed in time for .NET 5, I'm going to mark it as Future, but I'd really like to see this get in for .NET 6. I'll just add that we'll need to document this class(and enum) when we make it public. |
I think the only reason would be to make it easier on anyone reading it in the docs. In Visual Studio, I think it orders the completion list by usage. BTW I was going to suggest to remove |
@danmosemsft The VS snippet for an exception has it as the default, heh. I was actually wondering, should I make exceptions in my library serializable? The recently added |
@SingleAccretion we do not recommend future use of BinaryFormatter unless you have no alternative. The primary reason is that it is vital you do not deserialize untrusted input, and it's too easy for system to evolve and end up feeding untrusted input. Eg., you have a game that saves state; users load state from their disk. Later, someone reports a bug and you ask for the saved game to reproduce the bug. @GrabYourPitchforks is working on all up serialization guidance with a focus on doing it securely. He's also devising a plan for helping the ecosystem progressively move off BinaryFormatter. These should be available on the dotnet/designs repo or similar place in due course. Meantime, we're not adding |
@GrabYourPitchforks we should probably ask them to change that, right? |
I see, so I will remove it then. I assume the |
@bartonjs should answer that. |
@danmosemsft, should the exception remain sealed? It doesn't seem sensible to leave it sealed, and u don't think it breaks anything if we open it up for inheritance. What do you think? |
If there's any chance the exception type could be used on .NET Framework, it should still be |
I don't know what the recommendation is on sealed. The API review can correct that if needed. |
@danmosemsft I've gone over your comments and updated the original text. I went over each enum value to check the actual conditions used to throw it. This was a bit surprising at times (as in: not in line with the expected meaning of the token), which I've tried to reflect in suggested new names for the enum cases. You comments were:
Done, removed. I have no evidence, was just a hunch.
For ease of comparison with the original, I've left the order as-is, but chose a naming scheme that should make some sense when ordered alphabetically. I'm fine if the final order would either be alphabetical, or somewhat grouped.
Done, I've tried to make same-meaning-same-wording and where it made sense, I chose clarity-over-brevity (in this case,
Done, including other names that benefited from better naming.
Done:
Done:
Done. While testing each condition one by one, I've come across a few oddities and an unexpected |
@pgovind, I'd argue it'd be a good, and relatively cheap improvement to the regex-experience to have this in for .NET 5 (cheap, because the code is already there, including relevant tests, except for some renaming). Of course, I cannot judge whether this makes the mark and I realize that .NET 5 is close to being frozen, but if it's somehow still possible to put it on the list, that would be really welcome :). |
Are there any we should fix - at the same time as exposing the API? We own the code after all 🙂 and we want the enum values to be useful. |
The 5.0 milestone means "must fix to ship it". This is what Microsoft developers are focusing on. It doesn't mean "not allowed in release" for another few weeks yet. It could get in if we could do API review (likely bottleneck) plus then a community member gets a PR up and we can merge it. |
Got it. I'll volunteer, it's a rather trivial change anyway. |
The new proposed names in the updated OP are sufficient. There can be some debate on whether we should improve certain errors (as in, raise the more specific one if we have enough information). The few I thought make that threshold are in the linked issue (#39075). But to be clear, the current behavior wrt to the error text is as one would expect, the clarification is on the choice of enum names (like: where an enum only ever is used within conditional alternation contexts, I've proposed a name that clarifies precisely that). I've checked the source code and tests for each of them to be sure this is the case, and added a small example of each error to help the reviewers. |
@danmosemsft, is there anything else that needs attention, or can it be marked |
Missed this, excuse me. I think it looks good -- would you mind breaking the diff into additions and subtractions? That isn't what we normally do but this is a bit special because the "subtractions" are internal. So they are irrelevant to API review, only to the implementer. Edit: actually I suggest simply showing the adds, then you can alphabetize which is what they'll likely want.
It would be good to either find a way to hit this (and ideally add a test) or remove it, so we can decide whether we need it or not. One other thing occurred to me -- do we know that the offsets are correct? Those would just be bugs, not influencing the API, but those bugs would be exposed as soon as we offer this API. @eerhardt @pgovind any other feedback or can we mark this api-ready-for-review? |
Some other low value comments
|
Apologies -- my email was confusing. It's scheduled for tomorrow August 11, 10 AM PDT or which should be 19:00 CEST (AFAIK your timezone). The meeting is scheduled for two hours and this issue will be discussed an hour into the meeting. Hope this helps :-) |
@terrajobst, @pgovind, Thanks, you're right, I misread: it's indeed today ;) (I actually clicked the meeting link and it said there was nobody, I drew the wrong conclusion :P). As it looks now, I'll be online by then. And yes, I'm based in Amsterdam, which is CEST. |
namespace System.Text.RegularExpressions
{
[Serializable]
public sealed class RegexParseException : ArgumentException
{
public RegexParseException(RegexParseError error, int offset);
private RegexParseException(SerializationInfo info, StreamingContext context)
{
// It means someone modified the payload.
throw new NotImplementedException();
}
public override void GetObjectData(SerializationInfo info, StreamingContext context)
{
// We'll serialize as an instance of ArgumentException
}
public RegexParseError Error { get; }
public int Offset { get; }
}
public enum RegexParseError
{
Unknown,
AlternationHasComment,
AlternationHasMalformedCondition,
AlternationHasMalformedReference,
AlternationHasNamedCapture,
AlternationHasTooManyConditions,
AlternationHasUndefinedReference,
CaptureGroupNameInvalid,
CaptureGroupOfZero,
ExclusionGroupNotLast,
InsufficientClosingParentheses,
InsufficientOpeningParentheses,
InsufficientOrInvalidHexDigits,
InvalidGroupingConstruct,
InvalidUnicodePropertyEscape,
MalformedNamedReference,
MalformedUnicodePropertyEscape,
MissingControlCharacter,
NestedQuantifiersNotParenthesized,
QuantifierAfterNothing,
QuantifierOrCaptureGroupOutOfRange,
ReversedCharacterRange,
ReversedQuantifierRange,
ShorthandClassInCharacterRange,
UndefinedNamedReference,
UndefinedNumberedReference,
UnescapedEndingBackslash,
UnrecognizedControlCharacter,
UnrecognizedEscape,
UnrecognizedUnicodeProperty,
UnterminatedBracket,
UnterminatedComment
}
} |
@danmosemsft, can we consider this for .NET 5? Seems easy enough :-) |
In Jeremy Barton's words at the meeting: "it would be good if people using reflection currently could fix their code to the public version by doing a |
Team members are all working on work required for 5.0. If a community member volunteers and it can get merged before we branch on Monday, sure. |
@danmosemsft, I can try to get something in by Friday, though I'm not sure about the point of improving the error messages, which is something that tends to need a lot of back and forth to iron out. The renaming part, and fixing the tests should be relatively straightforward. Of course, than it still depends on whether it can be reviewed in time to be merged in time. |
Sure. Totally up to you |
BTW @abelbraaksma the only tricky part of this I expect will be adding the test for binary serialization to/from the new type, to address the concern that this not break. We have tests for this The bulk of such testing is done through blobs in this file - they test to and from for .NET Framework and .NET Core: It's fiddly to update those blobs. But happily I see there's already a specific test however for serializing a RegexParseException on .NET Core and deserializating as an ArgumentException on .NET Framework! |
Adding a test for |
That's what that test does, basically |
@danmosemsft, thanks for the pointers on those tests! I'm actually not sure I can finish today for the simple reason that I couldn't get the build succeed (only to find out that I needed to update VS, oops). Behind a relatively slow connection, the download of 5GB for the update can take a while... Oh, and I get 502's from While I think the changes i made are sound (the renaming part, at least), I'd prefer to at least successfully build it locally again before submitting a PR. Anyway, I expect to submit a working PR somewhere tonight (which may still be day for you :). We'll see how far we get. |
@abelbraaksma I don't know your schedule - we don't branch until Monday, if that helps. |
Hmm, strange, I would swear I could build stuff before. After updating everything (I ran into this briefly too: #40283), I get:
Since this points to the directory in the local
I may be needing the weekend after all ;). |
I have never seen that error and if I got it I would probably |
I've never seen that error either. If your build still fails, tell us your environment too? Or, even better, try it again by passing in I can still most likely review your PR over the weekend if you put it up, so you have some time :) |
Thanks for the support. The error is gone (I edited the comment with the cause: EDIT: Scratch that, what I posted here before about I'm in an x64 Native Tools Command Prompt (and tried a normal Administrator cmd prompt too, same error). |
Ok, this appears to be caused by a new Anyway, 1AM here, tomorrow things will be better. Thanks for the quick replies so far, it is much appreciated! |
Repairing your VS (the newest one?) might make sure the environment variables are correct -not sure though. |
Repair didn't help, nor did any other things I tried to make the So I decided to fire up a VM with a clean Win10 and a clean recent VS2019 and go from there. While this took some time, I have managed to get the initial build going. Hopefully the rest will go smooth now ;) |
Yeesh Msybe you can fix the env vars on your original machine by looking at the ones in your VM.. |
@danmosemsft the starting point is there, but I'll need a little help on how to proceed w.r.t. inline documentation (I assume that's supposed to be added here as well): #40902.
I did something more "brute force", I just copied the whole build dir over to my primary dev machine. Turns out this allowed me to build & run tests for (I also created an Azure VM, but for the last hour or so that has still been building, it is not very fast...) |
For posterity: this has now been implemented, details are in the PR: #40902. Thanks to all for reviewing this proposal and for the valuable feedback. Special thanks to @danmosemsft to help get this past the finish line, and assisting with the parts in this process I was yet unfamiliar with. It was in the nick of time to make it into .NET 5 🥳. |
We'd welcome other contributions if you're interested @abelbraaksma . There's plenty up for grabs. If you want another in regex specifically there are opportunities for perf improvement or features eg #25598 (that one I think is a bit involved..) |
Background and Motivation
A regular expression object made with
System.Text.Regex
is essentially an ad-hoc compiled sub-language that's widely used in the .NET community for searching and replacing strings. But unlike other programming languages, any syntax error is raised as anArgumentException
. Programmers that want to act on specific parsing errors need to manually parse the error string to get more information, which is error-prone, subject to change and sometimes non-deterministic.We already have an internal
RegexParseException
and two properties:Error
andOffset
, which respectively give an enum of the type of error and the location in the string where the error is located. When presently anArgumentException
is raised, it is in fact aRegexParseException
which inheritsArgumentException
.I've checked the existing code and I propose we make
RegexParseException
andRegexParseError
public, these are pretty self-describing at the moment, though theenum
cases may need better named choices (suggested below) . Apart from changing a few existing tests and adding documentation, there are no substantive changes necessary.Use cases
ArgumentException
which is used everywhere.Related requests and proposals
Proposed API
The current API already exists but isn't public. The definitions are as follows:
And the
enum
with suggested names for a more discoverable naming scheme. I followed "clarity over brevity" and have tried to start similar cases with the same moniker, so that an alphabetic listing gives a (somewhat) logical grouping in tooling.I'd suggest we add a case for unknown conditions, something like
UnknownParseError = 0
, which could be used if users create this exception by hand with an invalid enum value.Handy for implementers: Historical view of this prior to 22 July 2020 shows the full diff for the enum field by field. On request, it shows all as an addition diff now, and is ordered alphabetically.
* About
IllegalCondition
, this is thrown inside a conditional alternation like(?(foo)x|y)
, but appears to never be hit. There is no test case covering this error.Usage Examples
Here's an example where we use the additional info to give more detailed feedback to the user:
Alternative Designs
Alternatively, we may remove the type entirely and merely throw an
ArgumentException
. But it is likely that some people rely on the internal type, even though it isn't public, as through reflection the contextual information can be reached and is probably used in regex libraries. Besides, removing it will make any future improvements in dealing with parsing errors and proposing fixes in GUIs much harder to do.Risks
The only risk I can think of is that after exposing this exception, people would like even more details. But that's probably a good thing and only improves the existing API.
Note that:
ArgumentException
continues to work.[danmose: made some more edits]
The text was updated successfully, but these errors were encountered: