Expand the default block-list #19

Open
JakeSummers opened this issue Jan 5, 2024 · 1 comment

Comments

@JakeSummers

Good Morning!

This is a pretty nifty package. I would be interested in starting to use it.

One current limitation of this tool is that the default block-list is pretty limited:

parser.add_argument('--blocklist', help='Comma separated list of words '
'to lint in any context, with possibly special '
'characters between, case insensitive; '
'DEFAULT to master,slave,whitelist,blacklist')

This tool would be significantly more useful if it came packaged with a more extensive block-list. Right now, I need to make the block-list and get it code-reviewed (which I anticipate will be difficult).

In the readme, alexjs is cited as inspiration:

This project is inspired by [Alex.js](https://alexjs.com).

I took a quick look, and alexjs appears to come with a very comprehensive block-list via the retext-equality npm package. The full block-list is here: https://github.com/retextjs/retext-equality/tree/main/data/en

They also provide acceptable alternatives (with sources :) ) so that you can create output like this:

example.md
   1:5-1:14  warning  `boogeyman` may be insensitive, use `boogeymonster` instead                boogeyman-boogeywoman  retext-equality
  1:42-1:48  warning  `master` / `slaves` may be insensitive, use `primary` / `replica` instead  master-slave           retext-equality
  1:69-1:75  warning  Don’t use `slaves`, it’s profane                                           slaves                 retext-profanities
  2:52-2:54  warning  `he` may be insensitive, use `they`, `it` instead                          he-she                 retext-equality
  2:61-2:68  warning  `cripple` may be insensitive, use `person with a limp` instead             gimp                   retext-equality

⚠ 5 warnings


It would be awesome if we could do the following:

  1. Copy the data from https://github.com/retextjs/retext-equality/tree/main/data/en into this repo
  2. Use that as the default block-list
  3. Add support for suggesting alternatives.
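
To make item 3 a bit more concrete, here is a minimal sketch of what "suggesting alternatives" could look like. It is not blocklint's actual implementation: the `ALTERNATIVES` mapping and the `lint_line` function are hypothetical names, the matching is a plain case-insensitive substring search rather than the tool's real regexes, and only the `master`/`slave` suggestions come from the alexjs output quoted above.

```python
import re

# Hypothetical mapping from a blocked term to suggested replacements.
# The two entries mirror the alexjs example output (primary / replica).
ALTERNATIVES = {
    "master": ["primary"],
    "slave": ["replica"],
}


def lint_line(line, line_number, alternatives=ALTERNATIVES):
    """Return one warning per blocked term found in `line`.

    Matching is a simple case-insensitive substring search; this is an
    assumption for illustration, not how blocklint matches words.
    """
    messages = []
    for term, suggestions in alternatives.items():
        for match in re.finditer(re.escape(term), line, flags=re.IGNORECASE):
            hint = ", ".join(f"`{s}`" for s in suggestions)
            messages.append(
                f"{line_number}:{match.start() + 1}: "
                f"`{match.group(0)}` may be insensitive, use {hint} instead"
            )
    return messages


if __name__ == "__main__":
    for message in lint_line("The master talks to slaves", 1):
        print(message)
```

Keeping the suggestions in a plain mapping would let users override or extend them the same way they override the block-list today.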
@JakeSummers changed the title from "Expand the default block-list so that users do not need to define the block-list themselves" to "Expand the default block-list" on Jan 5, 2024
@troycomi
Collaborator

troycomi commented Jan 5, 2024

It's a good point but I have a few reservations. I almost purposefully made this unopinionated so others could customize as needed. Adding an alternative may be within scope, though a larger change. Here are my concerns with the full alexjs list:

  • Many examples are field specific. E.g. islamists probably won't come up in most software development.
  • Many examples are multiple words/phrases and wouldn't directly translate.
  • Some are still valid in the context of development (primitive, bugreport).
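
The multi-word concern can be handled mechanically. As a rough sketch, assuming the alexjs terms have already been loaded into a plain list of strings (the loading step is left out, and `single_word_entries` is a hypothetical helper, not part of blocklint), the phrases can be dropped like this:

```python
def single_word_entries(terms):
    """Keep only entries that consist of a single word.

    Entries are stripped and lower-cased, duplicates are removed, and
    anything containing whitespace (a multi-word phrase) is discarded.
    """
    cleaned = (term.strip().lower() for term in terms)
    return sorted({term for term in cleaned if term and len(term.split()) == 1})


if __name__ == "__main__":
    # Illustrative input only; the last entry stands in for any phrase.
    sample = ["islamists", "primitive", "bugreport", "Boogeyman", "two word phrase"]
    words = single_word_entries(sample)
    # The result can be passed to the existing comma-separated --blocklist option.
    print(",".join(words))
```

This does nothing about the other two concerns: field-specific words and terms that are valid in development would still need manual pruning.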

Overall, I think including all the inconsiderate words would add bloat when the goal is specifically checking source code. Someone who uses slurs in their code probably won't care if this tool complains. Legacy usage of something like blacklist or master is mostly what I wanted to catch. For markdown, I'd also run alexjs to catch offensive phrases and language.

Here's what I'd propose.

  1. make a blocklint config file that includes all single-word entries in the alexjs database
  2. check if linting with that list drastically increases runtime (it may, my regexes are fairly complex)
  3. add a --strict switch which will use the strict config file
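
A rough sketch of steps 2 and 3, under several assumptions: `--strict` is only the switch proposed here and does not exist yet, `STRICT_BLOCKLIST` is a small hypothetical stand-in for the generated config file, and the timing uses a plain alternation regex rather than blocklint's real (more complex) patterns, so the numbers would only be indicative.

```python
import argparse
import re
import time

# Default list taken from the --blocklist help text quoted in this issue.
DEFAULT_BLOCKLIST = ["master", "slave", "whitelist", "blacklist"]
# Hypothetical stand-in for the single-word entries of the strict config.
STRICT_BLOCKLIST = DEFAULT_BLOCKLIST + ["boogeyman", "cripple"]


def build_parser():
    parser = argparse.ArgumentParser(prog="blocklint-sketch")
    parser.add_argument("--blocklist", help="Comma separated list of words")
    # Proposed switch; not an existing blocklint option.
    parser.add_argument("--strict", action="store_true",
                        help="Use the larger strict block-list")
    return parser


def resolve_blocklist(args):
    """An explicit --blocklist wins, then --strict, then the default."""
    if args.blocklist:
        return [w.strip() for w in args.blocklist.split(",") if w.strip()]
    if args.strict:
        return list(STRICT_BLOCKLIST)
    return list(DEFAULT_BLOCKLIST)


def time_lint(words, lines, repeats=5):
    """Best-of-`repeats` wall time to scan `lines` for any of `words`."""
    pattern = re.compile("|".join(re.escape(w) for w in words), re.IGNORECASE)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for line in lines:
            pattern.findall(line)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    args = build_parser().parse_args()
    corpus = ["git checkout master", "add host to whitelist"] * 10000
    print("default:", time_lint(DEFAULT_BLOCKLIST, corpus))
    print("chosen: ", time_lint(resolve_blocklist(args), corpus))
```

Running the same comparison against the real regexes and the full single-word list would show whether the strict mode is cheap enough to ship.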

So users have the option to specify the strict switch or copy the file from GitHub and modify it as they see fit. I'd be open to a PR for adding a reason, but that would require a lot of rewrites.
