Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DELET should be DELETE in "WHEN a user DELETEs /[...]" #745

Open
awoimbee opened this issue Jun 13, 2023 · 14 comments
Open

DELET should be DELETE in "WHEN a user DELETEs /[...]" #745

awoimbee opened this issue Jun 13, 2023 · 14 comments
Labels
enhancement Improve the expected

Comments

@awoimbee
Copy link

Hi,
I have comments on tests that look like:

GIVEN the sky is blue
WHEN a user DELETEs /something
THEN something happens

typos seems to accept POSTs, HEADs, GETs, but not DELETEs (nor OPTIONs, but doesn't make much sense so it's OK):

error: `DELET` should be `DELETE`
  --> ./[...]
    |
405 |     WHEN a user DELETEs /[...]
    |                 ^^^^^
    |
error: `OPTIO` should be `OPTION`
  --> ./[...]
    |
406 |     WHEN a user OPTIONs /[...]
    |                 ^^^^^
    |

When I run typos -w, I end up with: DELETEEs and OPTIONNs.

Maybe typo thinks I'm writing a new camelCase word instead of pluralizing ?

@epage
Copy link
Collaborator

epage commented Jun 13, 2023

typos seems to accept POSTs, HEADs, GETs, but not DELETEs (nor OPTIONs, but doesn't make much sense so it's OK):

...

Maybe typo thinks I'm writing a new camelCase word instead of pluralizing ?

Yes it is thinking you are using camelCase.

If you run typo --words, you'll see that these are POS Ts, HEA Ds, GE Ts, DELET Es, and OPTIO Ns. The only reason they are "accepted" is POS, HEA, and GE` aren't in our list of typos to correct.

@awoimbee
Copy link
Author

Thanks !
Then the question becomes:

  • Do we want to add a rule to accept any uppercase words with a lowercase 's' ?
  • Or do we add special rules for some words (GET, POST, ...) ?
  • Or do we prefer to keep these as exceptions that users can configure via extend-ignore-identifiers-re ?

@epage epage added the enhancement Improve the expected label Jun 14, 2023
@epage
Copy link
Collaborator

epage commented Jun 14, 2023

Or do we prefer to keep these as exceptions that users can configure via extend-ignore-identifiers-re ?

You can also ignore them by adding them to extend-identifiers and having them correct to themselves.

I'll leave this open for a bit to see how much general interest there is for this but I would lean towards extend-identifiers for now

@szepeviktor
Copy link
Contributor

I meet with this on a daily basis.
DELETE-s is my suggestion.

@awoimbee
Copy link
Author

awoimbee commented Jan 10, 2024

I you have the same issue please add a thumbs up to let the maintainer know how many people are affected.
In the meantime you should create a .typos.toml file with:

[default]
extend-ignore-identifiers-re = [
    "DELETEs" # https://github.com/crate-ci/typos/issues/745
]

@mkatychev
Copy link

mkatychev commented Jan 31, 2024

Reposting from duplicate #916:

1  D COULDa
2  D COULDd
3  D COULDs
4  D SHOULDs
5  I LOCIs
6  I OCTOPId
7  L AT_WILLs
8  L WILLs
9  T BROUGHTd
10 T BROUGHTs
11 T MUSTs
12 T WITs
13 T SHOULDNTs

The README.md above produces the output below:

README.md:1:6: `COUL` -> `COULD`
README.md:2:6: `COUL` -> `COULD`
README.md:3:6: `COUL` -> `COULD`
README.md:4:6: `SHOUL` -> `SHOULD`, `SHAWL`, `SHOAL`
README.md:7:9: `WIL` -> `WILL`, `WELL`
README.md:8:6: `WIL` -> `WILL`, `WELL`

Cases 5,6 and 9-12 seem to be handled correctly.

It seems that a lowercase letter that follows an all uppercase word not ending in T or I is split incorrectly.

SHOULDNTs behaves as expected and SHOULDs does not, I wonder if that's a clue 🤔 .

EDIT:
So far the regex below seems to swallow/mute the discrepancy:

extend-ignore-re = ["[A-Z]+[^TFI][a-z]"]

@epage
Copy link
Collaborator

epage commented Jan 31, 2024

It seems that a lowercase letter that follows an all uppercase word not ending in T or I is split incorrectly.

SHOULDNTs behaves as expected and SHOULDs does not, I wonder if that's a clue 🤔 .

Typos word splitter is strict. If it sees a change in case, it will treat it as a word change. You can see this in action by running with --words. Using the word list you gave, I get

D
COUL
Da
D
COUL
Dd
D
COUL
Ds
D
SHOUL
Ds
I
LOC
Is
I
OCTOP
Id
L
AT
WIL
Ls
L
WIL
Ls
T
BROUGH
Td
T
BROUGH
Ts
T
MUS
Ts
T
WI
Ts
T
SHOULDN
Ts

Whether it reports a typo because of that is dependent on what is in our list of typos. We don't have an approved list of words that have to be matched but a deny list of misspellings and how they can be corrected.

@mkatychev
Copy link

mkatychev commented Jan 31, 2024

That's handy insight, I think the issue with the output above is that a boundary between words is being treated as a word itself.
Any match on this regex [A-Z]+[a-z] is going to produce a "false" PascalCase word using the last upper and the following lower, does that feel overly optimistic?

@epage
Copy link
Collaborator

epage commented Jan 31, 2024

As a human, its easy to look at these cases and recognize where the word split is. When encoding this, its much more difficult and either direction you go, you'll get things wrong.

@mkatychev
Copy link

mkatychev commented Jan 31, 2024

Fully agree, by no means am I suggesting a solution, just engaging in ascription. Describing the problem is helpful (for myself and possibly others) in understanding it.

Two character words seem to be a particularly tricky thing since it touches on other issues (like hexadecminal representation).

@dboehmer
Copy link

dboehmer commented Feb 7, 2024

I’m relieved I’m not the first person to have this issue but I’m surprised nobody mentioned SQL so far. Many people write SQL keywords all uppercase and I use them as verbs, too. typos doesn’t understand this:

$ echo "A user CREATEs, ALTERs, SELECTs from, JOINs, INSERTs into, UPDATEs, DELETEs from or DROPs a table." | ./typos -
error: `INSER` should be `INSERT`
  --> -:1:46
  |
1 | A user CREATEs, ALTERs, SELECTs from, JOINs, INSERTs into, UPDATEs, DELETEs from or DROPs a table.
  |                                              ^^^^^
  |
error: `UPDAT` should be `UPDATE`
  --> -:1:60
  |
1 | A user CREATEs, ALTERs, SELECTs from, JOINs, INSERTs into, UPDATEs, DELETEs from or DROPs a table.
  |                                                            ^^^^^
  |
error: `DELET` should be `DELETE`
  --> -:1:69
  |
1 | A user CREATEs, ALTERs, SELECTs from, JOINs, INSERTs into, UPDATEs, DELETEs from or DROPs a table.
  |                                                                     ^^^^^
  |
$ ./typos --version
typos-cli 1.18.1

The tool also wanted to correct UNIQUEs but you may argue that this is a weird slang for UNIQUE constraints. To allow all major SQL verbs which are not allowed by default:

[default]
extend-ignore-identifiers-re = [
    "\\bDELETEs\\b",
    "\\bINSERTs\\b",
    "\\bUPDATEs\\b",
]

Update: added word boundary markers. Thanks @szepeviktor!

@szepeviktor
Copy link
Contributor

szepeviktor commented Feb 7, 2024

Adding word boundaries may help.

    "\\bDELETEs\\b",

benegee added a commit to trixi-framework/libtrixi that referenced this issue Apr 15, 2024
sloede pushed a commit to trixi-framework/libtrixi that referenced this issue Apr 15, 2024
* add typos ignore file

crate-ci/typos#745 (comment)

* move config file

* delete file

* file path relative to /

* move to /

* move file

* moved?!
@odinho
Copy link

odinho commented Jun 7, 2024

Ah I finally found this before trying to make a new issue. I have this as an example:

error: `CANDIDAT` should be `CANDIDATE`
  --> src/somefile.ts:278:71
    |
278 | * Refuse to discuss anything other than the POSITION and the CANDIDATEs skills and experience.
    |                                                              ^^^^^^^^
    |

Is there a way to generically handle the case 'ALLCAPSs' and treat and check it with the valid-word-list? In other words; mark out (with a regex) what will be checked? I was looking at default.extend-ignore-words-re etc, but it seems what I'd want is something that transformed the word, then did the typos check on that one?

I could of course just start having a big manual list, but I know the pattern of these ALL_CAPS_IDENTIFIER + 's'.

@epage
Copy link
Collaborator

epage commented Jun 7, 2024

There is not a way to normalize custom casing and then check that.

If this is something you do frequently and there isn't other punctuation, like backticks in markdown, that you want to use, it would likely be best to use default.extend-ignore-words-re and ignore them and risk a typo than have it complain when there is nothing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Improve the expected
Projects
None yet
Development

No branches or pull requests

6 participants