tools: add ASCII only lint rule in lib/ #11371

hkal · 2017-02-14T06:48:48Z

Detects if files in lib/ contain non-ASCII characters and
raises a linting error. Also removes non-ASCII characters from
lib/timers.js

Fixes: #11209

Checklist

make -j4 test (UNIX), or vcbuild test (Windows) passes
tests and/or benchmarks are included
commit message follows commit guidelines

Affected core subsystem(s)

tools, lib

Trott · 2017-02-14T06:56:14Z

CI: https://ci.nodejs.org/job/node-test-pull-request/6400/

mscdex · 2017-02-14T07:53:27Z

lib/timers.js

-// ║    ╚════ >  Actual JavaScript timeouts
-// ║
-// ╚════ > Linked List
+// |---- > Object Map


Minor nit: maybe slashes would look a tad better for the corners instead of a pipe?

thefourtheye · 2017-02-14T07:58:07Z

This contradicts #11129

aqrln · 2017-02-14T10:23:07Z

@thefourtheye actually it doesn't; on the contrary, I'd say it complements it.

addaleax · 2017-02-14T10:29:58Z

lib/timers.js

+// |    |
+// |    |---- >  Actual JavaScript timeouts
+// |
+// |---- > Linked List


I think it would be okay to make an exception for this file and just add an eslint-disable line for the rule.

(It would be good to have this file as a test that we can use UTF-8 in sources, even if we prefer not to.)

@addaleax I don't really think this is a way to test it. UTF-8 in sources wasn't supported from #5458 until #11129, yet everything worked well, because it doesn't actually matter whether comments are encoded correctly when they are just comments.

gibfahn · 2017-02-14T15:37:52Z

@thefourtheye cross-posting @bnoordhuis's comment from #11209 (comment)

Would be good to enforce it because Unicode files take up twice as much space in the binary as plain ASCII files.
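A rough model of the size argument, assuming (as the js2c commit message further down describes) that an all-ASCII source is embedded at one byte per character, while any non-ASCII character forces the whole file into UTF-16 at two bytes per code unit. `embeddedSize` is a hypothetical helper for illustration, not part of Node's tooling.

```javascript
'use strict';

// Assumed size model (not js2c.py itself): an all-ASCII source is stored at
// one byte per character; a single non-ASCII character makes the whole file
// UTF-16, i.e. two bytes per code unit.
const nonAscii = /[^\x00-\x7F]/; // eslint-disable-line no-control-regex

function embeddedSize(source) {
  // String#length counts UTF-16 code units.
  return nonAscii.test(source) ? source.length * 2 : source.length;
}

const ascii = '// |---- > Linked List\n';
const unicode = '// ╚════ > Linked List\n';

console.log(embeddedSize(ascii)); // 23
console.log(embeddedSize(unicode)); // 46
```

Under this model, one box-drawing character doubles the footprint of the entire file, not just of the line it appears on.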

jasnell · 2017-02-14T15:41:43Z

I'm really not entirely sure that we should do this.

addaleax · 2017-02-14T15:46:51Z

Fwiw once I have the time (maybe later this week) I’d like to look into making the tooling strip comments during compilation, so that we can at least keep non-ASCII characters inside of comments.

Fishrock123 · 2017-02-14T15:54:32Z

Erm, I added those comments, what is the point of this? The source bundling tool supports UTF-8 as previously linked above.

aqrln · 2017-02-14T16:15:10Z

@addaleax that would be great. Do you mind if I take it? :)

addaleax · 2017-02-14T16:16:17Z

@aqrln You mean, updating the tooling to do that? Sure, go for it! You can ping me if there are any questions :)

bnoordhuis · 2017-02-14T17:32:10Z

@addaleax @aqrln It shouldn't strip line ends though, or line numbers in stack traces won't match with the on-disk files.

aqrln · 2017-02-14T17:47:45Z

@bnoordhuis sure thing :) But thanks for pointing that out anyway.

hkal · 2017-02-15T19:28:09Z

Alright, so it looks like the consensus is that we don't want the linter checking for non-ASCII characters, and other solutions will be explored?

addaleax · 2017-02-15T19:30:34Z

@hkal I think it still makes sense to try to enforce this outside of comments… I am not an eslint expert but it looks like adjusting your code should be easy?

Maybe just wait until we’ve resolved the above discussion…

hkal · 2017-02-15T19:52:23Z

@addaleax the code can easily be changed to not include comments. I'll hold off making any changes until a decision has been reached. Thanks!

gibfahn · 2017-02-16T03:53:35Z

I think it still makes sense to try to enforce this outside of comments… I am not an eslint expert but it looks like adjusting your code should be easy?

Maybe just wait until we’ve resolved the above discussion…

@Fishrock123 @thefourtheye are you opposed to a lint rule that enforces ASCII outside of comments?

Fishrock123 · 2017-02-16T15:32:55Z

Not really, I think?

In order to allow using Unicode characters inside comments of built-in JavaScript libraries without forcing them to be stored as UTF-16 data in Node's binary, update the tooling to strip comments during the build process. All line breaks are preserved so that line numbers in stack traces aren't broken.

Refs: nodejs#11129
Refs: nodejs#11371 (comment)
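A minimal sketch of the idea, written in JavaScript for illustration (the actual tooling, tools/js2c.py, is Python): every comment is removed, but the newlines inside it are kept, so line N of the stripped output is still line N on disk. `stripComments` is a hypothetical name; the sketch copies quoted strings verbatim but deliberately ignores regular-expression literals and template `${}` nesting, so it is not a complete tokenizer.

```javascript
'use strict';

// Hypothetical helper: drop // and /* */ comments but keep their newlines,
// so stack-trace line numbers still match the on-disk file.
// Limitation: regex literals and template ${} nesting are not handled.
function stripComments(src) {
  let out = '';
  let i = 0;
  while (i < src.length) {
    const ch = src[i];
    const next = src[i + 1];
    if (ch === '/' && next === '/') {
      // Line comment: skip up to (not including) the newline.
      while (i < src.length && src[i] !== '\n') i++;
    } else if (ch === '/' && next === '*') {
      // Block comment: keep only the newlines it contains.
      const end = src.indexOf('*/', i + 2);
      const stop = end === -1 ? src.length : end + 2;
      out += src.slice(i, stop).replace(/[^\n]/g, '');
      i = stop;
    } else if (ch === '\'' || ch === '"' || ch === '`') {
      // String literal: copy verbatim, skipping over backslash escapes.
      let j = i + 1;
      while (j < src.length && src[j] !== ch) {
        j += src[j] === '\\' ? 2 : 1;
      }
      out += src.slice(i, j + 1);
      i = j + 1;
    } else {
      out += ch;
      i++;
    }
  }
  return out;
}

const input = 'const a = 1; // ║ box-drawing comment\n/* ╚══\n */ a;\n';
console.log(JSON.stringify(stripComments(input))); // "const a = 1; \n\n a;\n"
```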

hkal · 2017-03-08T22:59:21Z

@gibfahn any movement on this?

gibfahn · 2017-03-09T18:23:04Z

@hkal I personally prefer the Unicode comment in lib/timers.js, so I'd rather we added an eslint-disable exception for that block and left it as it is. In the future if that is moved out into a separate guide it could be removed.

Otherwise, unless @jasnell or @thefourtheye or any other collaborators disagree, we should be able to move forward on this.

jasnell · 2017-03-10T05:07:22Z

As long as we add the eslint-disable for the timers comment block, I'm ok with this going forward

TimothyGu · 2017-03-20T16:35:17Z

@hkal, can you add an eslint-disable for lib/timers.js as @aqrln and @jasnell suggested? If you do so we'll be able to merge this. Thanks a lot!
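For illustration, ESLint's block directives can scope such an exception to just the diagram. This assumes the rule is registered under the name `only-ascii-characters` (taken from the file name tools/eslint-rules/only-ascii-characters.js); the comment lines are the excerpt of lib/timers.js quoted earlier in this thread.

```javascript
/* eslint-disable only-ascii-characters */
// ║    ╚════ >  Actual JavaScript timeouts
// ║
// ╚════ > Linked List
/* eslint-enable only-ascii-characters */
```

Everything after the `eslint-enable` comment is checked again, so a stray non-ASCII character elsewhere in the file would still be reported.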

thefourtheye · 2017-03-20T18:55:41Z

@gibfahn wrote

@Fishrock123 @thefourtheye are you opposed to a lint rule that enforces ASCII outside of comments?

No, I am not opposed to it.

hkal · 2017-03-22T14:37:49Z

@gibfahn Alright, I think we're good here. Please review.

jasnell · 2017-03-22T14:55:48Z

lib/timers.js

@@ -19,6 +19,7 @@
 // OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
 // USE OR OTHER DEALINGS IN THE SOFTWARE.

+


Unrelated whitespace change :-)

aqrln · 2017-03-22T15:48:54Z

tools/eslint-rules/only-ascii-characters.js

+            const { loc } = token;
+
+            // Will only report the first non-ascii character per line
+            const character = matches[0];


Either the comment is misleading, or this doesn't work the way you planned: it will report the first non-ASCII character per token, not per line.

This line makes the entries of the tokens array (couldn't think of a better name) look like this:

{ type: 'Punctuator', value: ';', start: 22327, end: 22328, loc: SourceLocation { start: Position { line: 767, column: 1 }, end: Position { line: 767, column: 2 } }, range: [ 22327, 22328 ] }

{ type: 'Line', value: ' Copyright Joyent, Inc. and other Node contributors.', start: 0, end: 54, range: [ 0, 54 ], loc: { start: Position { line: 1, column: 0 }, end: Position { line: 1, column: 54 } } }

In the case of type Line we only report the first occurrence of a non-ASCII character. In my latest iteration I took the comment out since it didn't really help clarify.
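To make the difference concrete with the sample below: per token, line 1 gets two reports (one for each string) and the block comment on lines 2-3 gets only one (on line 2); per line, lines 1, 2 and 3 each get exactly one. A hypothetical helper that works per line on the raw text rather than on tokens:

```javascript
'use strict';

// Hypothetical helper: the first non-ASCII character on each line, with
// 1-based line numbers and 0-based columns (ESLint's convention).
const nonAscii = /[^\x00-\x7F]/; // eslint-disable-line no-control-regex

function firstNonAsciiPerLine(text) {
  const reports = [];
  text.split(/\r\n|\r|\n/).forEach((lineText, index) => {
    const match = nonAscii.exec(lineText);
    if (!match) return;
    reports.push({ line: index + 1, column: match.index, character: match[0] });
  });
  return reports;
}

// Line 1 holds two string tokens that each contain a non-ASCII character;
// lines 2-3 are a single block comment token.
const source = "const a = '\u00e9', b = '\u00fc';\n/* \u2551\n \u255a\u2550 */\n";
console.log(firstNonAsciiPerLine(source));
```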

aqrln · 2017-03-22T16:14:57Z

tools/eslint-rules/only-ascii-characters.js

+          }
+        });
+
+        errors.forEach((error) => {


Why not do this in one pass without the extra errors array? You can make this a named function and call it instead of errors.push().

aqrln · 2017-03-22T16:17:14Z

tools/eslint-rules/only-ascii-characters.js

+          const { value } = token;
+          const matches = value.match(nonAsciiPattern);
+
+          if (matches) {


IMO, it would be better to flip the condition and return early (if (!matches) return;) reducing the indentation level for the rest of the function.
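Combining this suggestion with the one above, a single pass could look like the following sketch: a named `report` callback is invoked directly instead of pushing into an `errors` array, and tokens without a match bail out early. The `{ type, value }` token shape mirrors the objects quoted earlier; `checkTokens` and `report` are illustrative names, and computing the report location is left to the callback.

```javascript
'use strict';

const nonAsciiPattern = /[^\x00-\x7F]/; // eslint-disable-line no-control-regex

// One pass over the tokens: report as soon as a match is found instead of
// collecting matches in an intermediate errors array.
function checkTokens(tokens, report) {
  tokens.forEach((token) => {
    const match = nonAsciiPattern.exec(token.value);
    if (!match) return; // early return keeps the reporting path unindented
    report(token, match[0]);
  });
}

checkTokens(
  [
    { type: 'Punctuator', value: ';' },
    { type: 'Line', value: ' ╚════ > Linked List' },
  ],
  (token, character) => console.log(`${token.type}: ${character}`)
); // prints "Line: ╚"
```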

Detects if files in lib/ contain non-ASCII characters and raises a linting error. Also removes non-ASCII characters from lib/console.js comments.

Fixes: nodejs#11209

hkal · 2017-03-24T04:52:14Z

I think I've addressed all the feedback.

@aqrln @jasnell @gibfahn

gibfahn · 2017-03-24T11:06:21Z

As this is an eslint rule addition, cc/ @not-an-aardvark, @silverwind, @Trott, @mscdex

not-an-aardvark · 2017-03-24T15:14:57Z

tools/eslint-rules/only-ascii-characters.js

+        const commentTokens = source.getAllComments();
+        const tokens = sourceTokens.concat(commentTokens);
+
+        tokens.forEach((token) => {


This will fail to match non-ASCII whitespace that could appear between tokens, so it doesn't quite disallow all non-ASCII characters in files. This could be fixed by matching the regex against source.text rather than against each token.

That said, I think the no-irregular-whitespace rule will cover non-ascii whitespace.

If the latter is true, it would be good to have a comment explaining that.

The comment alone would not be sufficient; it is important that the linter ensures there are no irregular Unicode whitespace characters, since they cannot be seen during code review.

@aqrln by:

If the latter is true

I mean that if:

I think the no-irregular-whitespace rule will cover non-ascii whitespace.

@not-an-aardvark's theory is correct, and the non-ASCII whitespace characters are already covered in a separate rule, then we could just use that for whitespace, and add a comment in here to explain that we don't need to worry about whitespace as it's covered in another rule.

@gibfahn ah, I see, sorry. I didn't pay enough attention to that "if the latter" part so I didn't understand you right.
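A sketch of the whole-text approach described above, written against ESLint's classic rule API (`create(context)`, `context.getSourceCode()`, and `context.report()` with `loc`, `message` and `data`). Scanning `sourceCode.lines` instead of tokens also catches non-ASCII whitespace between tokens, and it yields locations that exist even inside multi-line block comments. The message wording and `meta` contents are assumptions.

```javascript
'use strict';

// Sketch of an ESLint rule (classic create(context) format) that flags the
// first non-ASCII character on each line, whether it sits inside a token,
// inside a comment, or in the whitespace between tokens.
const nonAsciiPattern = /[^\x00-\x7F]/; // eslint-disable-line no-control-regex

const rule = {
  meta: {
    docs: { description: 'disallow non-ASCII characters' },
    schema: [],
  },
  create(context) {
    return {
      Program() {
        // SourceCode#lines is the file text split on line breaks.
        const lines = context.getSourceCode().lines;
        lines.forEach((lineText, index) => {
          const match = nonAsciiPattern.exec(lineText);
          if (!match) return;
          const code = lineText.codePointAt(match.index)
            .toString(16).toUpperCase().padStart(4, '0');
          context.report({
            // ESLint lines are 1-based, columns 0-based.
            loc: { line: index + 1, column: match.index },
            message: 'Non-ASCII character U+{{code}} found.',
            data: { code },
          });
        });
      },
    };
  },
};

module.exports = rule;
```

Note that a no-break space (U+00A0) between tokens would be flagged by this sketch and also by `no-irregular-whitespace`, which lists U+00A0 among the characters it reports.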

not-an-aardvark · 2017-03-24T18:41:42Z

tools/eslint-rules/only-ascii-characters.js

+// Rule Definition
+//------------------------------------------------------------------------------
+
+const nonAsciiPattern = new RegExp('([^\x00-\x7F])', 'g');


It might be clearer to use a regex literal here instead of the RegExp constructor. Right now, \x00 and \x7F are interpreted as part of the string, so the resulting regex pattern actually contains a null character. This still works fine, but it could be confusing for debugging (e.g. if the regex is printed, it will be difficult to tell that it contains a null character).
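A small illustration of the difference: in a string literal, `\x00` and `\x7F` are string escapes, so the pattern handed to `RegExp` contains a raw NUL and a raw DEL character, while a regex literal keeps the escapes as written. All three variants below match the same characters; only their readability differs.

```javascript
'use strict';

// String escapes are processed first: NUL and DEL end up in the pattern.
const patternString = '[^\x00-\x7F]';
console.log(patternString.length); // 6: '[', '^', NUL, '-', DEL, ']'
const fromRawString = new RegExp(patternString); // eslint-disable-line no-control-regex

// A regex literal keeps the escapes visible, e.g. when printed for debugging.
const literal = /[^\x00-\x7F]/; // eslint-disable-line no-control-regex
console.log(literal.source); // [^\x00-\x7F]

// If the constructor is really needed, escape the backslashes instead.
const constructed = new RegExp('[^\\x00-\\x7F]'); // eslint-disable-line no-control-regex

console.log(['a', '\u00e9'].map((c) => literal.test(c))); // [ false, true ]
```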

not-an-aardvark · 2017-03-24T18:45:56Z

tools/eslint-rules/only-ascii-characters.js

+
+          reportError({
+            line: loc.start.line,
+            column,


This could result in an invalid report location if the offending character is in a block comment. For example:

/* foo ■ */

The rule reports an error for this comment at line 1, column 7, but that location doesn't actually exist.
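One way to guarantee a location that exists is to compute it from the character's absolute offset in the file rather than from the comment token's start column. A hypothetical helper (not an ESLint API) that converts an offset into ESLint-style coordinates:

```javascript
'use strict';

// Hypothetical helper: turn an absolute offset into { line, column },
// with 1-based lines and 0-based columns (ESLint's convention).
// Only '\n' is treated as a line break in this sketch.
function locFromIndex(text, index) {
  const before = text.slice(0, index);
  const line = before.split('\n').length;
  const column = index - (before.lastIndexOf('\n') + 1);
  return { line, column };
}

// A block comment that starts on line 1 but holds the character on line 2.
const text = 'x; /* foo\n   \u25a0 */';
const offset = text.search(/[^\x00-\x7F]/); // eslint-disable-line no-control-regex
console.log(locFromIndex(text, offset)); // { line: 2, column: 3 }
```

Adding the character's position within the comment to the comment's start column instead would point past the end of line 1 here.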

Trott · 2017-03-24T20:10:07Z

tools/eslint-rules/only-ascii-characters.js

+ * @author Kalon Hinds
+ */
+
+/* eslint no-control-regex:0 */


I'd prefer to see eslint-disable-line or eslint-disable-next-line to target the places where the control characters are needed.
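A sketch of the targeted form: each directive silences `no-control-regex` for exactly one line, instead of a file-wide `/* eslint no-control-regex:0 */`. The `asciiOnly` pattern is an illustrative addition.

```javascript
'use strict';

// Disable the rule only on the line that needs the control-character range.
const nonAsciiPattern = /[^\x00-\x7F]/; // eslint-disable-line no-control-regex

// eslint-disable-next-line no-control-regex
const asciiOnly = /^[\x00-\x7F]*$/;

console.log(nonAsciiPattern.test('\u2551'), asciiOnly.test('plain text')); // true true
```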

Fishrock123 · 2017-04-03T14:28:37Z

still not really a fan

BridgeAR · 2017-08-26T10:19:08Z

@hkal would you be so kind as to have a look at the other comments and rebase this?

BridgeAR · 2017-09-08T02:07:46Z

Closing this due to a long inactivity period. @hkal thanks for your contribution anyway, and please feel free to reopen (or just leave a comment to reopen) if you would like to follow up on this!

nodejs-github-bot added the timers and tools labels Feb 14, 2017

mscdex reviewed Feb 14, 2017

addaleax reviewed Feb 14, 2017

Fishrock123 self-requested a review February 14, 2017 15:54

aqrln mentioned this pull request Feb 16, 2017

tools: make js2c.py strip comments from sources #11417 (closed)

aqrln mentioned this pull request Feb 17, 2017

test: add Unicode characters regression test #11423 (closed)

hkal force-pushed the only-ascii-characters branch 3 times, most recently from 3a7ff16 to aad90cf March 22, 2017 14:36

jasnell approved these changes Mar 22, 2017

aqrln reviewed Mar 22, 2017

tools: add ASCII only lint rule for lib/

e7b2a19

hkal force-pushed the only-ascii-characters branch from aad90cf to e7b2a19 March 24, 2017 04:39

jasnell approved these changes Mar 24, 2017

not-an-aardvark reviewed Mar 24, 2017

Trott reviewed Mar 24, 2017

refack force-pushed the master branch from 16073c0 to fbe946b April 14, 2017 04:11

jasnell added the stalled Issues and PRs that are stalled. label Aug 29, 2017

BridgeAR closed this Sep 8, 2017

This was referenced Jan 4, 2018

tools: eslint rule to disallow Unicode quotes #17934 (closed)

tools: non-Ascii linter for /lib only #18043 (closed)
