Normative: Add RegExp `v` flag with set notation and properties of strings #2418

mathiasbynens · 2021-05-28T11:28:01Z

Proposal repo: https://github.com/tc39/proposal-regexp-set-notation

Preview link for the spec changes with inline diffs: https://arai-a.github.io/ecma262-compare/?pr=2418

bakkot · 2021-11-30T00:07:46Z

This PR still has a bunch of "FYI" comments in it, which I assume you don't intend. Is it actually ready for review?

ljharb · 2021-12-01T22:41:02Z

@FrankYFTang we don't use separated html files; can you inline the table?

markusicu · 2021-12-02T00:06:55Z

> This PR still has a bunch of "FYI" comments in it, which I assume you don't intend. Is it actually ready for review?

We thought that it's ok for stage 3 to still have editorial notes and explainers. We intend to eventually turn many of them into permanent NOTEs or whatever is a reasonable format, based on feedback. Ok?

bakkot · 2021-12-02T00:12:14Z

Sure, that's ok. So do you just want the normative parts reviewed at the moment? Also, do you intend to address the TODOs before review? Some of them look like they'd be normative.

markusicu · 2021-12-02T00:16:13Z

> So do you just want the normative parts reviewed at the moment? Also, do you intend to address the TODOs before review? Some of them look like they'd be normative.

I believe that we covered nearly everything, but I plan to go through it myself once more with a fine-tooth comb.

I hope that this will not block stage 3 reviewers from reviewing and providing feedback.

ljharb · 2021-12-02T07:23:12Z

(Generally those things are fine in the proposal, but don't go in the spec PR)

FrankYFTang · 2021-12-02T10:25:18Z

> @FrankYFTang we don't use separated html files; can you inline the table?

See
https://github.com/tc39/ecma262/blob/main/table-binary-unicode-properties.html
https://github.com/tc39/ecma262/blob/main/table-nonbinary-unicode-properties.html
https://github.com/tc39/ecma262/blob/main/table-unicode-general-category-values.html
https://github.com/tc39/ecma262/blob/main/table-unicode-script-values.html

Notice that these tables are separate HTML files because @mathiasbynens updates them for every new version of the Unicode Standard. I am not sure whether he has generated them with a tool or edited them by hand in the past; I just followed that convention.

See
#1041
#1218

bakkot · 2021-12-07T03:22:32Z

What is \q? The proposal readme doesn't say, I don't recall it being presented, and neither the google doc nor this PR give it any semantics.

markusicu · 2021-12-07T04:05:36Z

> What is \q? The proposal readme doesn't say, I don't recall it being presented, and neither the google doc nor this PR give it any semantics.

We flip-flopped on the string literal syntax, and went back from (string|literal) to \q{string|literal} which is what the Unicode regex spec (UTS 18) recommends. See tc39/proposal-regexp-v-flag#33 (comment)

This is one of a few things that we will note specially in next week's meeting.

We did edit the readme to change example string literals to this syntax.

bakkot

Had some initial comments. I haven't done a thorough review yet, this was just a first pass.

bakkot · 2021-12-07T07:15:10Z

spec.html

+                If _UnicodeSets_ is *false*, then a CharSetElement is a character in the sense of the Pattern Semantics above.
+              </li>
+              <li>
+                If _UnicodeSets_ is *true*, then a CharSetElement is either a character in the sense of the Pattern Semantics above, or it is a sequence of characters, that is, a string. This includes the empty String and strings with more than 1 character. A string of length 1 is the same as a single character.


If this type can hold sequences of multiple code points, it should be renamed to something other than CharSet.

markusicu · 2022-01-27

Depending on the flags, it's a set of code units, or code points, or code points plus sequences of code points. The concept of "character" under the new flag encompasses single code points but is also sufficiently general for the logical concept of "character" which frequently includes things that are encoded in Unicode with sequences of multiple code points. (Of course not vice versa.)

Also, renaming it would affect parts of the spec that need not be touched for the normative changes here.

There may or may not be a better name for this. If we find one, can we make that change later, on its own (separate from these normative changes)?

bakkot · 2022-01-27

This name is confusing enough that I would be quite uncomfortable landing it unless we've looked hard for better names and failed to find one. I'll raise this at the editor call next week.

bakkot · 2022-02-04

Editors are agreed this will need a new name before landing the PR. That doesn't need to block anything until that point, though; it can still get stage 3.

mathiasbynens · 2021-12-07T08:14:01Z

(Note that we're not yet asking for Stage 3 advancement, although we are asking Stage 3 reviewers to start reviewing: tc39/agendas#1093)

bakkot · 2021-12-09T19:05:26Z

Incidentally this will need a rebase, mostly due to #2531. I can take care of that for you, if you'd like. Sorry about the conflict, although I do think #2531 will make this easier to follow.

mathiasbynens · 2021-12-09T22:12:39Z

> Incidentally this will need a rebase, mostly due to #2531. I can take care of that for you, if you'd like.

That'd be lovely, if you don't mind! Thanks.

bakkot · 2021-12-11T07:24:12Z

OK, rebased and adopted the conventions of #2531. Not certain I did it perfectly, but it seems to match up pretty well.

jmdyck

You'll need to add UnicodeSetsMode to Term, Assertion, QuantifiableAssertion, and ExtendedAtom.

You added UnicodeSetsMode to the LHS occurrences of those symbols, but not to their RHS occurrences.

Not sure about the prefix. ? might work everywhere, but it might be clearer to use ~ when a RHS is guarded by [~UnicodeMode].

On closer examination, QuantifiableAssertion and ExtendedAtom are only used under [~UnicodeMode], so assuming that the combination ~UnicodeMode, +UnicodeSetsMode never occurs, you don't need to add UnicodeSetsMode to those two nonterminals, you can instead just change ?UnicodeSetsMode to ~UnicodeSetsMode in their RHSs.

Here's what I mean: mathiasbynens#26
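As a sketch of the observable consequence: in the specification as it eventually landed, a pattern with the `v` flag is parsed with UnicodeMode enabled as well, so the Annex B productions guarded by `[~UnicodeMode]` (such as ExtendedAtom and QuantifiableAssertion) are unavailable under `v`, and `u` and `v` cannot be combined. The helper `compiles` below is hypothetical and exists only for this illustration. Note that on an engine without `v` support the `v` cases would also report `false`, but for a different reason (an invalid flag).

```javascript
// Hypothetical helper: true if `source` compiles under `flags`,
// false if the RegExp constructor throws a SyntaxError.
function compiles(source, flags) {
  try {
    new RegExp(source, flags);
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err;
  }
}

const annexB = {
  // `a{` relies on ExtendedAtom (a lone "{" treated as a pattern character).
  extendedAtomNoFlags: compiles("a{", ""),
  extendedAtomU: compiles("a{", "u"),
  extendedAtomV: compiles("a{", "v"),
  // `(?=a)*` relies on QuantifiableAssertion (a quantified lookahead).
  quantifiedAssertionNoFlags: compiles("(?=a)*", ""),
  quantifiedAssertionV: compiles("(?=a)*", "v"),
  // The two flags are mutually exclusive.
  bothFlags: compiles(".", "uv"),
};

console.log(annexB);
```

Only the two no-flag cases compile; every `u`, `v`, and `uv` case is rejected.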

mathiasbynens · 2023-06-05T07:56:55Z

I’ve rebased this and the esmeta check now passes (thanks to #3078).

Are there any other blockers preventing this PR from being accepted?

michaelficarra · 2023-06-06T00:11:43Z

@mathiasbynens No, just waiting on review from one more editor.

bakkot

Some relatively minor comments, of which only the CharSet one absolutely needs to be addressed before landing. Otherwise looks good to me.

bakkot · 2023-06-11T02:57:28Z

spec.html

+            A <dfn>CharSetElement</dfn> is one of the two following entities:
+            <ul>
+              <li>
+                If _rer_.[[UnicodeSets]] is *false*, then a CharSetElement is a character in the sense of the Pattern Semantics above.


I don't love this unbound alias, but I guess I don't have a better suggestion.

One possibility would be: "If 'v' does/doesn't appear in the RegExp's flags, ..." although then "the RegExp" doesn't have an antecedent.

Maybe "In the context of a RegExp with/without a 'v' flag, ..."

syg

lgtm % question

syg

Still lgtm, thanks for the CharSetElement changes.

mathiasbynens · 2023-06-15T21:06:15Z

Thanks everyone!

- In CompileAtom, take the wording-changes under `Atom :: CharacterClass` that were prompted by tc39#2418 (comment) and copy them to `AtomEscape :: CharacterClassEscape`.
- In various operations, change more occurrences of 'element' to 'CharSetElement'.
- In CompileAtom, change 2 occurrences of "which" to "that" because the usage is restrictive.

ljharb changed the title ~~[Normative] Add RegExp v flag with set notation and properties of strings~~ Normative: Add RegExp v flag with set notation and properties of strings May 31, 2021

ljharb force-pushed the master branch 3 times, most recently from 3d0c24c to 7a79833 Compare June 29, 2021 02:21

mathiasbynens force-pushed the regexp-v-flag branch from 86df176 to 49a291a Compare October 18, 2021 06:38

mathiasbynens mentioned this pull request Nov 14, 2021

Tool only showing diff for the first commit in a PR arai-a/ecma262-compare#74

Closed

ljharb reviewed Nov 15, 2021

markusicu mentioned this pull request Nov 17, 2021

Advance to Stage 3 tc39/proposal-regexp-v-flag#24

Closed

9 tasks

mathiasbynens marked this pull request as ready for review November 17, 2021 07:10

bakkot reviewed Dec 7, 2021

bakkot reviewed Dec 11, 2021

bakkot force-pushed the regexp-v-flag branch from 2c542b3 to c8482ef Compare December 11, 2021 07:23

bakkot force-pushed the regexp-v-flag branch from c8482ef to b2499cf Compare December 11, 2021 07:41

This was referenced May 27, 2023

Fixed bugs in Spec parser for terminals (#145) es-meta/esmeta#146

Merged

Meta: Upgrade ESMeta to v0.3.1 #3078

Merged

mathiasbynens requested a review from jmdyck May 30, 2023 10:35

jmdyck previously requested changes May 30, 2023

jmdyck reviewed May 30, 2023

mathiasbynens requested a review from jmdyck May 31, 2023 08:01

ljharb requested a review from a team June 6, 2023 00:28

bakkot reviewed Jun 11, 2023

atjn mentioned this pull request Jun 11, 2023

Track support for regex unicodeSets flag mdn/browser-compat-data#20091

Merged

bakkot approved these changes Jun 12, 2023

syg approved these changes Jun 13, 2023

atjn mentioned this pull request Jun 14, 2023

ECMAScript regexp v flag firasdib/Regex101#2074

Closed

michaelficarra added and removed the "editor call" label (to be discussed in the next editor call) Jun 14, 2023

zcorpan mentioned this pull request Jun 15, 2023

RegExp v flag not documented mdn/content#27346

Closed

syg approved these changes Jun 15, 2023

michaelficarra added the "ready to merge" label (Editors believe this PR needs no further reviews, and is ready to land.) Jun 15, 2023

Normative: Add RegExp v flag

26b2369

This enables the use of set notation, string literal syntax, and Unicode properties of strings in regular expressions. Proposal repo: https://github.com/tc39/proposal-regexp-set-notation Co-authored-by: Markus Scherer <[email protected]>

ljharb force-pushed the regexp-v-flag branch from 4cab4aa to 26b2369 Compare June 15, 2023 19:49

ljharb merged commit 26b2369 into tc39:main Jun 15, 2023

mathiasbynens deleted the regexp-v-flag branch June 15, 2023 21:06

jmdyck added a commit to jmdyck/ecma262 that referenced this pull request Jun 16, 2023

In CompileAtom, copy wording-changes

7cc4acf

... prompted by tc39#2418 (comment) from `Atom :: CharacterClass` semantics to `AtomEscape :: CharacterClassEscape` semantics.

bakkot mentioned this pull request Jul 3, 2023

fix "there exists"/"such that" not followed by a var name tc39/ecmarkup#538

Merged
