Proposal: Regular Expression Literals #5806

Closed
alrz opened this issue Oct 8, 2015 · 18 comments

Comments

@alrz
Contributor

alrz commented Oct 8, 2015

Since regular expressions are very commonly used in all kinds of applications, I think it is time to make them even easier to use. Currently, the static methods of the Regex class compile and cache patterns to avoid reparsing them. By default, the 15 most recently used regular expressions are cached, although the cache size can be adjusted by setting the Regex.CacheSize property. Regular expression literals, on the other hand, could be compiled at compile time, so an invalid pattern would be a compile-time error and there would be no cache limit.

if(~"^(0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])[- /.](19|20)\d\d$".IsMatch( ... )) {

}
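For comparison, here is a minimal C# sketch of how the same date check is written today with the static Regex helpers (top-level statements, so it assumes C# 9 / .NET 5 or later). It shows the Regex.CacheSize property mentioned above and that a malformed pattern only surfaces as an ArgumentException when it is parsed at run time. The "(unclosed" pattern and the sample inputs are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Static helpers such as Regex.IsMatch parse a pattern on first use and keep it
// in an internal cache whose capacity is Regex.CacheSize (15 by default).
Console.WriteLine($"Cache size: {Regex.CacheSize}");

// The date pattern from the proposal, written as a verbatim string.
const string DatePattern = @"^(0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])[- /.](19|20)\d\d$";

// Static call: the parsed pattern is looked up in (or added to) the cache.
bool ok = Regex.IsMatch("10/08/2015", DatePattern);
Console.WriteLine(ok); // True

// Today an invalid pattern is only detected at run time, when it is parsed.
try
{
    Regex.IsMatch("abc", "(unclosed");
}
catch (ArgumentException e)
{
    Console.WriteLine($"Runtime error: {e.GetType().Name}");
}
```

A literal as proposed would move both the parse and the error to compile time.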

Regular expression literals would be implicitly verbatim, so there would be no need for an @ prefix to avoid extra escape characters.
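To make the escaping point concrete, a small C# sketch (the patterns are made-up examples): in a regular string literal every backslash in the pattern must be doubled, a verbatim @ string passes backslashes through unchanged, and even a verbatim string still needs "" for each quote character in the pattern.

```csharp
using System;
using System.Text.RegularExpressions;

// Regular string literal: each regex backslash has to be escaped as \\.
string escaped = "^\\d{3}-\\d{4}$";
// Verbatim string literal: backslashes are taken literally.
string verbatim = @"^\d{3}-\d{4}$";
Console.WriteLine(escaped == verbatim); // True

// Even in a verbatim string, a quote inside the pattern must be doubled.
// This is the pattern "[^"]*" (a double-quoted run of non-quote characters).
string quoted = @"""[^""]*""";
Console.WriteLine(Regex.IsMatch("say \"hi\"", quoted)); // True
```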

@orthoxerox
Contributor

I had a similar proposal in #5403, but yours explicitly suggests compile-time evaluation. If we ever get compile-time evaluation (and metaprogramming), we'll be able to have all kinds of wonderful stuff like actual readable grammar DSLs that obsolete all but the most trivial regexes.

@dsaf

dsaf commented Oct 9, 2015

> ...very commonly used...

My team only uses them maybe once every five mid-size projects.

@alrz
Contributor Author

alrz commented Oct 9, 2015

@dsaf No argument with that.

@orthoxerox Yeah, we've had #263 and #4971 too. As discussed before, given existing extension methods, custom literals become useless; besides, you can't define a compile-time custom literal. Regular expression literals, on the other hand, are specifically useful in string patterns (#5811).

@vladd

vladd commented Oct 10, 2015

I personally doubt that regular expressions are valuable enough to warrant a dedicated syntax. Regular expressions are far too often unreadable, undebuggable, slow, and inefficient, so endorsing them does not seem like a good thing.
Of course, there are valid cases for heavy use of them, where a custom syntax would be very valuable. I see another way out of the problem: allow introducing custom, pluggable (user-defined?) syntax constructs into the language, so that any team can opt in to additional syntactic sugar for whatever they like. (This sounds like support for DSLs inside C#, but I feel this must be the right way.)

@alrz
Contributor Author

alrz commented Oct 10, 2015

@vladd

unreadable

The IgnorePatternWhitespace option partly addresses the readability issue.
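As an illustration, a minimal C# sketch of the proposal's date pattern rewritten with RegexOptions.IgnorePatternWhitespace: unescaped whitespace outside character classes is ignored and # starts a comment that runs to the end of the line, so each component can get its own line. Whitespace inside a character class stays literal, which is why the [- /.] separator still accepts a space. The layout and comments are my own choice, not part of the original proposal.

```csharp
using System;
using System.Text.RegularExpressions;

var date = new Regex(@"
    ^
    (0[1-9]|1[012])          # month 01-12
    [- /.]                   # separator (the space here is literal)
    (0[1-9]|[12][0-9]|3[01]) # day 01-31
    [- /.]                   # separator
    (19|20)\d\d              # year 1900-2099
    $",
    RegexOptions.IgnorePatternWhitespace);

Console.WriteLine(date.IsMatch("10/08/2015")); // True
Console.WriteLine(date.IsMatch("10 08 2015")); // True
Console.WriteLine(date.IsMatch("13/08/2015")); // False
```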

undebuggable

It's good practice to break the operation into smaller, more understandable regular expressions, so it would be nice to have a concise syntax for that.

slow and inefficient

I didn't say you should write an entire compiler with regular expressions! For their own use cases, they are concise and fast, both at development time and at runtime.

I would like to hear your comment on string patterns (#5811) too.

@vladd

vladd commented Oct 10, 2015

@alrz I completely agree with your points. But I still think that regexes are too often [ab]used for tasks they are not a good fit for, and creating special syntax just for them would implicitly suggest that they are the "tool of choice" for C# developers. The points you raised mean that successful use of regular expressions requires a certain level of development culture, so my feeling is that this would move the language a step away from the "developers' pit of success".

(Wrote a comment about string patterns in the appropriate issue.)

@alrz
Contributor Author

alrz commented Oct 10, 2015

@vladd If your point is that it shouldn't be special for _just_ regular expressions, I'd say that custom literals have been proposed before and passed over, because they don't provide any advantage over extension methods.

Moreover, I think VB's Like operator, for example, which invented its own pattern syntax, is much worse than this.

@vladd

vladd commented Oct 10, 2015

@alrz Well, as your own proposal clearly shows, custom literals could achieve more than an extension method possibly could. So I don't agree that custom literals have no advantage over extension methods.

And that's why I am a proponent of custom literals. But in order to make them efficient, one needs some decent metaprogramming facility in the language, so that one could instruct the compiler to do something more than just a runtime compilation of, e.g., a regex.

(And of course I'm not a fan of VB's Like operator.)

@alrz
Contributor Author

alrz commented Oct 10, 2015

@vladd

in order to make them efficient, one needs some decent metaprogramming facility in the language, so that one could instruct the compiler to do something more than just a runtime compilation

Yes, exactly; check out the comments in #4971, #263, and #5403. That needs a more sophisticated proposal/feature set. However, I narrowed it down to regular expressions to make string patterns more flexible.

@dsaf

dsaf commented Oct 10, 2015

@vladd

I personally doubt that regular expressions are valuable enough to warrant a dedicated syntax.

Agreed. C# doesn't need built-in support for any other language, be it XML, JSON, SQL, or Regex. I doubt Regex performance is that important, and IDE support can be achieved by other means: http://blog.jetbrains.com/dotnet/2014/10/27/regular-expression-support-in-resharper-9/

I see another way out of the problem: allow introducing custom, pluggable (user-defined?) syntax constructs into the language, so that any team can opt in to additional syntactic sugar for whatever they like. (This sounds like support for DSLs inside C#, but I feel this must be the right way.)

This is important. Look at this guy implementing React in .NET: https://github.com/demigor/nreact
So far, Microsoft has only cared enough to support React in TypeScript: microsoft/TypeScript#3203

@alrz
Contributor Author

alrz commented Oct 10, 2015

@dsaf Actually, the main motivation for this is to facilitate string patterns; I just wanted to propose it as a separate feature so that regex literals can be used outside of patterns too.

@GSPP

GSPP commented Nov 15, 2015

How would options be specified? Some options commonly vary, for example pattern whitespace, case sensitivity, and culture. No single default choice of those options would work for the majority of cases.

At least we wouldn't need the Compiled option anymore, which is often forgotten but performance-critical.
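For reference, a short C# sketch of how these options are passed today: each one is an explicit RegexOptions flag on the constructor, and RegexOptions.Compiled (which has the engine generate IL for the pattern instead of interpreting it) is just one more flag that is easy to leave out. The \bregex\b pattern and the inputs are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Case sensitivity, culture handling and compilation are all chosen per instance.
// Compiled trades a higher construction cost for faster matching.
var word = new Regex(
    @"\bregex\b",
    RegexOptions.IgnoreCase | RegexOptions.CultureInvariant | RegexOptions.Compiled);

Console.WriteLine(word.IsMatch("Proposal: Regular Expression Literals")); // False
Console.WriteLine(word.IsMatch("Use a Regex literal"));                  // True
```

Keeping such an instance in a static readonly field is the usual way to avoid paying the construction cost more than once.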

I would not implement this proposal because the team has a dozen more important issues at hand. Regexes are not that common. I'm not sure any project has more than one per 200 LOC; that would already be a very high density.

@alrz
Contributor Author

alrz commented Nov 15, 2015

@GSPP For options, something like JavaScript's syntax can be used.

Regexes are not that blah

As you can see, this isn't currently under consideration.

I would not implement this proposal

No one wanted you to implement this proposal.

the team has a dozen more important issues at hand.

And I didn't know that you are the PM.

@GSPP

GSPP commented Nov 15, 2015

@alrz The first sentence of your post added something to the discussion. The rest reads as if it were meant to offend.

@alrz
Contributor Author

alrz commented Nov 15, 2015

@GSPP The first paragraph of your comment added something to the discussion, and I replied to it. The last paragraph, well, I replied to that too. Nothing more.

@HaloFour

@GSPP Many of those options can be specified inline within the pattern, e.g. "(?i)abc" is case-insensitive. Some languages like JavaScript use a suffix at the end of the literal to specify special or pattern-wide options. Since this proposal uses ~ to prefix the pattern, maybe a prefix would be better, but I'd adapt the syntax accordingly.

// case-insensitive, multiline, explicit-capture and invariant-culture
var regex = ~imec"^(foo)+$";
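Since ~imec"..." is hypothetical syntax, here is a C# sketch of what it would roughly correspond to with today's API, mapping each letter to the meaning given in the comment above (i = IgnoreCase, m = Multiline, e = ExplicitCapture, c = CultureInvariant). The second variant uses .NET's inline (?imn) group, where n is the inline letter for explicit capture; CultureInvariant has no inline form, so it still has to be passed as a flag. The input string is made up.

```csharp
using System;
using System.Text.RegularExpressions;

// All four options passed as flags.
var withFlags = new Regex(
    "^(foo)+$",
    RegexOptions.IgnoreCase | RegexOptions.Multiline |
    RegexOptions.ExplicitCapture | RegexOptions.CultureInvariant);

// i, m and n can be set inline; CultureInvariant cannot.
var inlineOptions = new Regex("(?imn)^(foo)+$", RegexOptions.CultureInvariant);

const string input = "bar\nFOOfoo\nbaz";

// True: Multiline lets ^ and $ match at line breaks, so "FOOfoo" matches.
Console.WriteLine(withFlags.IsMatch(input));
Console.WriteLine(inlineOptions.IsMatch(input)); // True

// 1: with explicit capture the unnamed group is not captured, leaving only group 0.
Console.WriteLine(withFlags.Match(input).Groups.Count);
```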

@bbarry

bbarry commented Nov 16, 2015

.NET regular expressions can already be compiled down to a type as a step during compilation. See https://msdn.microsoft.com/en-us/library/9ek5zak6%28v=vs.110%29.aspx

The trouble with this design is that it requires the expressions to be compiled as a separate step completely disconnected from the code that uses them; the resulting assembly is then referenced by the library that needs the expression. It would be nice if this existing functionality were easier to use.

@daniel-white

IMO a literal that looks like an ECMAScript regex literal would be awesome. Even with the @ prefix, string literals add many symbols that aren't relevant to the pattern.

alrz closed this as completed Mar 28, 2017
9 participants