Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spelling correction #15507

Merged
merged 13 commits into from
May 9, 2017
Merged

Spelling correction #15507

merged 13 commits into from
May 9, 2017

Conversation

sandersn
Copy link
Member

@sandersn sandersn commented May 1, 2017

Suggest spellings based on Levenshtein distance. This includes a quick fix.

Some tweaks that are possible:

  1. Don't suggest spellings from the global scope. The main value seems to be for the tests trying to do new number(), which probably doesn't happen much in the real world.
  2. Only report 100 suggestions and stop afterwards. If your program has more than 100 unknown symbols, it's probably missing some libraries, not full of typos.
  3. Lower the threshold slightly to 0.6 * source.length from 0.7 * source.length. This would prevent 2-character suggestions entirely (currently they can add one letter) and restrict 3-character suggestions to adding one letter (currently they can also substitute one letter).

Performance seems to be about the same based on timing jake runtests on a 2 Ghz CPU. If I add the 100-error cutoff or stop suggestions from global scope, then I think there won't be any concerns about performance anyway.

Fixes #15333
Fixes #15565

Copy link
Contributor

@mhegazy mhegazy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not see any upper bounds on how often we do this.

~
!!! error TS1005: ',' expected.
~~~~~~
!!! error TS2304: Cannot find name 'string'.
!!! error TS2552: Cannot find name 'string'. Did you mean 'String'?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i though we are not doing the global scope.. where are these coming from?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggested some tweaks but haven't implemented them yet.

If your compilation has more than 100 unknown symbols, the problem is
missing declarations, not typos.
@sandersn
Copy link
Member Author

sandersn commented May 1, 2017

I added a 100-error cutoff for the spelling suggestions.

@sandersn
Copy link
Member Author

sandersn commented May 2, 2017

@RyanCavanaugh @DanielRosenwasser feel like looking at a whole bunch of spelling corrections?

@DanielRosenwasser
Copy link
Member

Some of these are so off (e.g. Couldn't find any - did you mean NaN?), it might be adding an extra heuristic here. One I had in mind is that the two have to start with the same lower-cased letter.

@@ -12,7 +12,7 @@ tests/cases/conformance/externalModules/exportNonInitializedVariablesES6.ts(3,6)
~~~
!!! error TS1214: Identifier expected. 'let' is a reserved word in strict mode. Modules are automatically in strict mode.
~~~
!!! error TS2304: Cannot find name 'let'.
!!! error TS2552: Cannot find name 'let'. Did you mean 'Set'?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't issue suggestions in this case

@@ -1,7 +1,7 @@
tests/cases/compiler/newExpressionWithCast.ts(3,12): error TS7009: 'new' expression, whose target lacks a construct signature, implicitly has an 'any' type.
tests/cases/compiler/newExpressionWithCast.ts(7,13): error TS2365: Operator '>' cannot be applied to types 'boolean' and 'void'.
tests/cases/compiler/newExpressionWithCast.ts(7,17): error TS1109: Expression expected.
tests/cases/compiler/newExpressionWithCast.ts(7,18): error TS2304: Cannot find name 'any'.
tests/cases/compiler/newExpressionWithCast.ts(7,18): error TS2552: Cannot find name 'any'. Did you mean 'NaN'?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already (or should?) have a check for when an identifier exists in the type namespace but not the value namespace - we probably shouldn't issue suggestions there

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This happens because checkAndReportErrorForUsingTypeAsValue doesn't have a special case to include primitives. It should though! I'll open a bug for it.

Copy link
Member Author

@sandersn sandersn May 3, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opened #15565

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided to fix it here because it'll make the black list smaller; if you legitimately misspell Boolen then I think we should give a suggestion.

Not for NaN though. Nobody needs more NaN in their code.

@@ -1,4 +1,4 @@
tests/cases/conformance/parser/ecmascript5/parser10.1.1-8gs.ts(16,7): error TS2304: Cannot find name 'NotEarlyError'.
tests/cases/conformance/parser/ecmascript5/parser10.1.1-8gs.ts(16,7): error TS2552: Cannot find name 'NotEarlyError'. Did you mean 'SyntaxError'?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Way too far

@@ -1,6 +1,6 @@
tests/cases/conformance/parser/ecmascript5/parser15.4.4.14-9-2.ts(16,15): error TS2453: The type argument for type parameter 'T' cannot be inferred from the usage. Consider specifying the type arguments explicitly.
Type argument candidate 'number' is not a valid type argument because it is not a supertype of candidate 'false'.
tests/cases/conformance/parser/ecmascript5/parser15.4.4.14-9-2.ts(25,1): error TS2304: Cannot find name 'runTestCase'.
tests/cases/conformance/parser/ecmascript5/parser15.4.4.14-9-2.ts(25,1): error TS2552: Cannot find name 'runTestCase'. Did you mean 'testcase'?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Too far

tests/cases/conformance/types/any/narrowFromAnyWithInstanceof.ts(17,7): error TS2339: Property 'mesage' does not exist on type 'Error'.
tests/cases/conformance/types/any/narrowFromAnyWithInstanceof.ts(22,7): error TS2339: Property 'getHuors' does not exist on type 'Date'.
tests/cases/conformance/types/any/narrowFromAnyWithInstanceof.ts(17,7): error TS2551: Property 'mesage' does not exist on type 'Error'. Did you mean 'message'?
tests/cases/conformance/types/any/narrowFromAnyWithInstanceof.ts(22,7): error TS2551: Property 'getHuors' does not exist on type 'Date'. Did you mean 'getHours'?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A++++

tests/cases/conformance/parser/ecmascript5/parserRealSource8.ts(314,55): error TS2304: Cannot find name 'DualStringHashTable'.
tests/cases/conformance/parser/ecmascript5/parserRealSource8.ts(314,96): error TS2304: Cannot find name 'StringHashTable'.
tests/cases/conformance/parser/ecmascript5/parserRealSource8.ts(315,42): error TS2304: Cannot find name 'StringHashTable'.
tests/cases/conformance/parser/ecmascript5/parserRealSource8.ts(316,44): error TS2304: Cannot find name 'ScopedMembers'.
tests/cases/conformance/parser/ecmascript5/parserRealSource8.ts(316,44): error TS2552: Cannot find name 'ScopedMembers'. Did you mean 'funcMembers'?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Too far

@@ -397,7 +397,7 @@ tests/cases/conformance/parser/ecmascript5/parserRealSource8.ts(454,35): error T
!!! error TS2304: Cannot find name 'Type'.
var withSymbol = new WithSymbol(withStmt.minChar, context.typeFlow.checker.locationInfo.unitIndex, withType);
~~~~~~~~~~
!!! error TS2304: Cannot find name 'WithSymbol'.
!!! error TS2552: Cannot find name 'WithSymbol'. Did you mean 'withSymbol'?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A+++

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd only give this one an A because running the code fix would produce var withSymbol = new withSymbol(....). And this differ-only-by-case pattern is common, so maybe only a B+.

@@ -1,9 +1,9 @@
tests/cases/conformance/parser/ecmascript5/Statements/BreakStatements/parser_breakInIterationOrSwitchStatement4.ts(1,15): error TS2304: Cannot find name 'something'.
tests/cases/conformance/parser/ecmascript5/Statements/BreakStatements/parser_breakInIterationOrSwitchStatement4.ts(1,15): error TS2552: Cannot find name 'something'. Did you mean 'String'?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😿

@@ -16,64 +16,64 @@ tests/cases/conformance/parser/ecmascript5/RealWorld/parserharness.ts(691,50): e
tests/cases/conformance/parser/ecmascript5/RealWorld/parserharness.ts(716,47): error TS2503: Cannot find namespace 'TypeScript'.
tests/cases/conformance/parser/ecmascript5/RealWorld/parserharness.ts(721,62): error TS2304: Cannot find name 'ITextWriter'.
tests/cases/conformance/parser/ecmascript5/RealWorld/parserharness.ts(724,29): error TS2304: Cannot find name 'ITextWriter'.
tests/cases/conformance/parser/ecmascript5/RealWorld/parserharness.ts(754,53): error TS2304: Cannot find name 'TypeScript'.
tests/cases/conformance/parser/ecmascript5/RealWorld/parserharness.ts(754,53): error TS2552: Cannot find name 'TypeScript'. Did you mean 'TypeScriptLS'?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should issue 5-10 of these max... there's no way we find 100 spelling errors in a program

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -17,7 +17,7 @@ tests/cases/compiler/propertyOrdering.ts(20,14): error TS2339: Property '_store'

public bar() { return this.store; } // should be an error
~~~~~
!!! error TS2339: Property 'store' does not exist on type 'Foo'.
!!! error TS2551: Property 'store' does not exist on type 'Foo'. Did you mean '_store'?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A+++

@@ -13,7 +13,7 @@ tests/cases/conformance/classes/indexMemberDeclarations/privateIndexer2.ts(4,32)
~
!!! error TS1005: ']' expected.
~~~~~~
!!! error TS2304: Cannot find name 'string'.
!!! error TS2552: Cannot find name 'string'. Did you mean 'String'?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feels like we need to ban some global identifiers but not others. The boxed primitives, null, and NaN in particular... probably the rest can stay?

@@ -1,6 +1,6 @@
tests/cases/compiler/propertySignatures.ts(2,13): error TS2300: Duplicate identifier 'a'.
tests/cases/compiler/propertySignatures.ts(2,23): error TS2300: Duplicate identifier 'a'.
tests/cases/compiler/propertySignatures.ts(14,12): error TS2304: Cannot find name 'foo'.
tests/cases/compiler/propertySignatures.ts(14,12): error TS2552: Cannot find name 'foo'. Did you mean 'foo1'?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A+++

? c - CharacterCodes.A + 10
~~~~~~~~~~~~~~
!!! error TS2304: Cannot find name 'CharacterCodes'.
!!! error TS2552: Cannot find name 'CharacterCodes'. Did you mean 'CharacterInfo'?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

C+

@RyanCavanaugh
Copy link
Member

Left some subjective comments

My overall impression is a case-insensitive edit distance of 2 (if the string is long enough - use a lower threshold for shorter strings) is probably where we should stop. The value-add here comes when you "can't see" the spelling error (fileName / filename, getColisions / getCollisions, setLength / setLenth, etc) and changes farther than 2 are much more likely to be "seeable".

Previously, you would get the generic message when writing incorrect
code like `let y = number`. "Cannot find name 'number'". Now the message
says "'number' is a type but is used a value here."

Fixes #15565
The maximum distance cutoff was being checked after the close-enough
early exit. Now it's checked before.

Note that `null` doesn't show up in the globals list, so it's not part
of the blacklist either.
It should be all lowercase since candidates have been lowercased by the
point the blacklist is checked.
@sandersn
Copy link
Member Author

sandersn commented May 3, 2017

All right, the suggestions are now much higher precision. Want to take another look?

I decided to fix #15565, too, instead of blacklisting String et al outright. Imagine all the people out there haplessly writing Sting and Nubmer.

@sandersn
Copy link
Member Author

sandersn commented May 8, 2017

Now that the spelling suggestions are in order, can I get a review of the implementation?

@sandersn sandersn merged commit 5143a3c into master May 9, 2017
@@ -4641,4 +4641,27 @@ namespace ts {
export function unescapeIdentifier(identifier: string): string {
return identifier.length >= 3 && identifier.charCodeAt(0) === CharacterCodes._ && identifier.charCodeAt(1) === CharacterCodes._ && identifier.charCodeAt(2) === CharacterCodes._ ? identifier.substr(1) : identifier;
}

export function levenshtein(s1: string, s2: string): number {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you intend for levenshtein to be in the public TypeScript API? This file has two top-level ts namespaces, the first marked with /* @internal */ and the second without the marker. This code is in the second namespace currently.

@sandersn sandersn deleted the spelling-correction branch May 11, 2017 20:13
@sandersn
Copy link
Member Author

No, I didn't mean for it to be part of the API. I forgot about the public/private halves of the file. I'll move it to the private side.

@microsoft microsoft locked and limited conversation to collaborators Jun 14, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
6 participants