chore: update README and docs
double-beep authored Aug 28, 2024
1 parent 91e37a3 commit 22d7508
Showing 8 changed files with 66 additions and 63 deletions.
48 changes: 24 additions & 24 deletions README.md
@@ -4,24 +4,23 @@

## Background

This bot has been developed in an attempt to help capture possible vandalism. This includes:

- Removing all code
- Replacing all content with nonsense
- Replacing all content with repeated words
- Adding solutions to their questions instead of posting an answer
- Removing large amounts of text from their post
- Using certain keywords or offensive language within the edit summary
This bot has been developed in an attempt to help capture possible vandalism by identifying edits that:

- remove all code
- replace content with nonsense or repeated words
- include solutions to questions
- remove large amounts of text from the post
- use certain keywords or offensive language within the edit summary

## Why do we need the bot?

The point of the bot is to help identify bad edits and/or potential vandalism made to posts in real time so that the changes can be quickly rolled back.

## Implementation

The bot queries the [Stack Exchange API][1] once every minute to get a list of the latest posts. There is logic to check that the post has been edited and that it has been edited by the author.
The bot queries the [Stack Exchange API][1] every minute to fetch a list of the most recently edited posts. There is logic to check that the post has been edited and that it has been edited by the author.

The `post_id` from each post is then taken and the [Stack Exchange API][2] is again queried for the list of revisions. To limit calls we utilise the functionality of pushing multiple ids into the API and then logic is in place to ensure we are using the latest revision.
The `post_id` from each post is then extracted and the [Stack Exchange API][2] is again queried for a list of revisions. To reduce API calls, multiple ids are sent in a single request, and logic is in place to ensure the latest revision is used.
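
The batching described above can be sketched roughly as follows. This is an illustration only, not the bot's actual code: the class and method names are hypothetical, the API version (`2.3`) and `site=stackoverflow` are assumptions, and the revision object is reduced to the API's `post_id` and `revision_number` fields.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; names are illustrative and not taken from the Belisarius source.
final class RevisionBatchSketch {

    /** Minimal stand-in for an API revision object (post_id, revision_number). */
    static final class Revision {
        final int postId;
        final int revisionNumber;

        Revision(int postId, int revisionNumber) {
            this.postId = postId;
            this.revisionNumber = revisionNumber;
        }
    }

    /** Builds one request path for many posts; the API accepts semicolon-delimited ids. */
    static String revisionsPath(List<Integer> postIds) {
        StringBuilder joined = new StringBuilder();
        for (int id : postIds) {
            if (joined.length() > 0) {
                joined.append(';');
            }
            joined.append(id);
        }
        // API version and site parameter are assumptions for this sketch.
        return "/2.3/posts/" + joined + "/revisions?site=stackoverflow";
    }

    /** Keeps only the revision with the highest revision number for each post. */
    static Map<Integer, Revision> latestPerPost(List<Revision> revisions) {
        Map<Integer, Revision> latest = new LinkedHashMap<>();
        for (Revision revision : revisions) {
            Revision seen = latest.get(revision.postId);
            if (seen == null || revision.revisionNumber > seen.revisionNumber) {
                latest.put(revision.postId, revision);
            }
        }
        return latest;
    }
}
```

Sending the ids in one request keeps the number of API calls per polling cycle small, which matters because the API quota is shared across all requests.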

Edits can be made up of a title change, body change of a question, tag changes or changes made to the body of an answer. Currently tags are not checked. Instead the title, question body and answer body depending on what has been edited are run through filters, as is the edit summary.

@@ -33,17 +32,17 @@ Edits can be made up of a title change, body change of a question, tag changes o

### The question/answer body is run through the following filters:

- `TextRemoved`; 80% or more of the body must have been removed and then it must have a [Jaro Winkler][3] score of less than 0.6
- `TextRemoved`; the bot checks if 80% or more of the body has been removed and whether the [Jaro Winkler][3] score of the diff is less than 0.6.
- `BlacklistedWords`; certain words are appended to posts. The bot reads a separate file for questions and answers. Both hold a list of keywords to watch for
- `CodeRemoved`; the bot watches for all code being removed
- `FewUniqueCharacters`; the body must either be 30 plus characters long and have less than 7 unique characters or be 100 plus characters long and have less than 16 unique characters
- `RepeatedWords`; this is when an edit is made were all the body is replaced with repeated words. The bot will output if 5 or less unique words are found
- `VeryLongWord`; the bot checks the post for a word longer than 50 characters long. Code is removed before the check is done
- `CodeRemoved`; the bot checks if the latest edit removed all code from the post.
- `FewUniqueCharacters`; the bot checks if the post contains few unique characters — this rule is similar to [SmokeDetector's "Few unique characters" one](https://metasmoke.erwaysoftware.com/reason/23).
- `RepeatedWords`; the bot checks whether there are 5 or fewer unique words in the post.
- `VeryLongWord`; the bot checks the post for a word longer than 50 characters. Code blocks are stripped before the check is performed.
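
A minimal sketch of three of the body heuristics listed above, under stated assumptions: the `FewUniqueCharacters` thresholds are the ones from the earlier README wording (30+ characters with fewer than 7 unique, or 100+ characters with fewer than 16 unique) and may not match the current code; words are split on whitespace and compared case-insensitively; and the caller is assumed to have stripped code blocks already. All names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the heuristics; not the bot's actual filter classes.
final class BodyFilterSketch {

    /** True when a reasonably long body is built from very few distinct characters. */
    static boolean fewUniqueCharacters(String body) {
        long unique = body.chars().distinct().count();
        int length = body.length();
        // Thresholds assumed from the earlier README wording.
        return (length >= 30 && unique < 7) || (length >= 100 && unique < 16);
    }

    /** True when the body contains 5 or fewer unique words (empty bodies are ignored). */
    static boolean repeatedWords(String body) {
        String trimmed = body.trim();
        if (trimmed.isEmpty()) {
            return false;
        }
        Set<String> unique = new HashSet<>(Arrays.asList(trimmed.toLowerCase().split("\\s+")));
        return unique.size() <= 5;
    }

    /** True when any whitespace-delimited word is longer than 50 characters. */
    static boolean veryLongWord(String bodyWithoutCode) {
        for (String word : bodyWithoutCode.trim().split("\\s+")) {
            if (word.length() > 50) {
                return true;
            }
        }
        return false;
    }
}
```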

### Edit summaries are run through the following filters:

- `BlacklistedWords`; certain words are used within the edit summaries. The bot holds a separate file for question edit summaries and answer edit summaries. Both hold a list of keywords to watch for
- `OffensiveWord`; the bot checks for offensive language used within the edit summary. This is done via a separate regex file
- `BlacklistedWords`; certain words are used within the edit summaries. The bot holds a separate file for question edit summaries and answer edit summaries. Both hold a list of keywords to watch for.
- `OffensiveWord`; the bot checks for offensive language used within the edit summary. This is done via a separate regex file.

## Accounts

@@ -76,17 +75,19 @@ A sample image of a report is:

mvn clean install

- Fill in `properties/login.properties`.
- Start the bot:
- Run

cp properties/login.example.properties properties/login.properties

and fill in `properties/login.properties`.

- Start the bot by running:

java -cp target/belisarius-1.7.1.jar:./lib/* bugs.stackoverflow.belisarius.Application

-----

If you want to change the location of the log file, edit `src/main/resources/log4j.xml` and change the path in line 16.
Please note that the project should be rebuilt (`mvn install`), for the changes to be applied.

The source code is available on [GitHub][8] and suggestions are welcome.
If you want to change the location of the log file, edit `src/main/resources/log4j.xml`. The project must be rebuilt (`mvn install`) for the changes to be applied.

[1]: https://api.stackexchange.com/docs/posts
[2]: https://api.stackexchange.com/docs/revisions-by-ids
@@ -95,4 +96,3 @@ The source code is available on [GitHub][8] and suggestions are welcome.
[5]: http://chat.stackoverflow.com/rooms/111347/sobotics
[6]: http://belisarius.sobotics.org/commands
[7]: https://user-images.githubusercontent.com/38133098/94342659-2af8d680-001b-11eb-9842-e6d0f5f4a70b.png
[8]: https://github.com/SOBotics/Belisarius
18 changes: 10 additions & 8 deletions docs/commands.md
@@ -1,11 +1,13 @@
# Commands

The list of commands is as follows
The list of commands is as follows:

alive - Test to check if the bot is alive or not.
check - Checks a post, through the API, for potential vandalism (must be either a moderator or a room owner).
help - Returns the description of the bot.
quota - Returns the current quota
reboot - Stops and starts the bot (must be either a moderator or a room owner).
stop - Stops the bot (must be a either a moderator or a room owner).
commands - Returns the list of commands associated with this bot.
| Commands | Description |
|------------|-------------------------------------------------------------------------------------------------------|
| `alive`    | Test to check if the bot is alive or not.                                                             |
| `check` | Checks a post, through the API, for potential vandalism (must be either a moderator or a room owner). |
| `help` | Returns the description of the bot. |
| `quota`    | Returns the current API quota.                                                                        |
| `reboot` | Restarts the bot (must be either a moderator or a room owner). |
| `stop`     | Stops the bot (must be either a moderator or a room owner).                                           |
| `commands` | Returns the list of commands associated with this bot. |
5 changes: 2 additions & 3 deletions docs/comments.md
@@ -1,7 +1,6 @@
# Auto-comments

Sometimes, it's good to leave some comments in a vandalised post to help OP understand what they did wrong. Here are some you can use.
They are in the format used by the [Stack Exchange AutoReview Comments userscript](https://stackapps.com/q/2116/58907) so that you can import and use them easily.
In cases of potential vandalism, consider leaving constructive comments to help the post author understand their mistake. Below you can find a list of auto comments (in a format compatible with the [Stack Exchange AutoReview Comments userscript](https://stackapps.com/q/2116)) that you can import and use easily:

```
###[Q] Vandalism
@@ -14,4 +13,4 @@ Editing questions to improve them (e.g. adding additional information, etc.) is
Please don't make more work for others by vandalizing your posts. By posting on the Stack Exchange (SE) network, you've granted a non-revocable right, under the [CC BY-SA 4.0 license](//creativecommons.org/licenses/by-sa/4.0), for SE to distribute the content (i.e. regardless of your future choices). By SE policy, the non-vandalized version is distributed. Thus, any vandalism will be reverted. Please see: [How does deleting work?](//meta.stackexchange.com/q/5221). If permitted to delete, there's a "delete" button below the post, on the left, but it's only in browsers, not the mobile app.
```

*Important! The second and the third autocomments are taken from Makyen ([1](https://stackoverflow.com/posts/comments/113202985), [2](https://stackoverflow.com/posts/comments/113198538)).*
For the second and third auto-comment, credit goes to Makyen ([1](https://stackoverflow.com/posts/comments/113202985), [2](https://stackoverflow.com/posts/comments/113198538)).
2 changes: 1 addition & 1 deletion docs/feedback.md
@@ -9,7 +9,7 @@ There are two feedback:

### How can I send feedback?

There are two ways to do this:
There are two ways that you can send feedback:

1. Reply to the report with either `tp` or `fp`. There's a [userscript](https://github.com/SOBotics/Userscripts/blob/master/Belisarius/Belisarius_Controls.user.js) which may be helpful.
2. Go to the respective Higgs report (click the "Hippo" link) and select the type of feedback you wish to send.
22 changes: 10 additions & 12 deletions docs/filters.md
@@ -4,24 +4,22 @@ In order to find if a post has been wrongly edited, titles, bodies and edit summ

### Titles are run through the following filters:

- `BlacklistedWords`; the title matches a blacklisted regex.
- `BlacklistedWords`; certain words are appended to titles. The bot reads a file which holds a list of keywords to watch out for within titles

### The question/answer body is run through the following filters:

- `TextRemoved`; 80% or more of the body must have been removed with a [Jaro Winkler](https://en.wikipedia.org/wiki/Jaro%E2%80%93Winkler_distance) score of less than 0.6
- `BlacklistedWords`; the post matches a blacklisted regex.
- `CodeRemoved`; the code has been removed with the latest edit (for questions only).
- `FewUniqueCharacters`; the body is either 30+ characters long and has less than 7 unique characters or 100+ characters long and has less than 15 unique characters.
- `RepeatedWords`; the body has been replaced with 5 or less unique words as of the latest edit
- `VeryLongWord`; there a word bigger than 50 characters in the body.
- `TextRemoved`; the bot checks if 80% or more of the body has been removed and whether the [Jaro Winkler](https://en.wikipedia.org/wiki/Jaro%E2%80%93Winkler_distance) score of the diff is less than 0.6.
- `BlacklistedWords`; certain words are appended to posts. The bot reads a separate file for questions and answers. Both hold a list of keywords to watch for
- `CodeRemoved`; the bot checks if the latest edit removed all code from the post.
- `FewUniqueCharacters`; the bot checks if the post contains few unique characters — this rule is similar to [SmokeDetector's "Few unique characters" one](https://metasmoke.erwaysoftware.com/reason/23).
- `RepeatedWords`; the bot checks whether there are 5 or fewer unique words in the post.
- `VeryLongWord`; the bot checks the post for a word longer than 50 characters. Code blocks are stripped before the check is performed.
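
To make the `TextRemoved` rule concrete, here is a self-contained sketch. It rests on two assumptions that the text above does not confirm: the removed fraction is measured by body length, and the Jaro-Winkler score is computed between the previous and the current body. The similarity function is a plain textbook implementation written for this example; a library such as Apache Commons Text could be used instead. All names are hypothetical.

```java
// Hypothetical sketch; not the bot's actual TextRemoved filter.
final class TextRemovedSketch {

    /** Textbook Jaro similarity in the range [0, 1]. */
    static double jaro(String s1, String s2) {
        int len1 = s1.length();
        int len2 = s2.length();
        if (len1 == 0 && len2 == 0) return 1.0;
        if (len1 == 0 || len2 == 0) return 0.0;
        int window = Math.max(0, Math.max(len1, len2) / 2 - 1);
        boolean[] matched1 = new boolean[len1];
        boolean[] matched2 = new boolean[len2];
        int matches = 0;
        for (int i = 0; i < len1; i++) {
            int start = Math.max(0, i - window);
            int end = Math.min(len2 - 1, i + window);
            for (int j = start; j <= end; j++) {
                if (!matched2[j] && s1.charAt(i) == s2.charAt(j)) {
                    matched1[i] = true;
                    matched2[j] = true;
                    matches++;
                    break;
                }
            }
        }
        if (matches == 0) return 0.0;
        // Count matched characters that appear in a different order.
        int outOfOrder = 0;
        int k = 0;
        for (int i = 0; i < len1; i++) {
            if (!matched1[i]) continue;
            while (!matched2[k]) k++;
            if (s1.charAt(i) != s2.charAt(k)) outOfOrder++;
            k++;
        }
        double m = matches;
        return (m / len1 + m / len2 + (m - outOfOrder / 2.0) / m) / 3.0;
    }

    /** Jaro-Winkler: boosts the Jaro score for a shared prefix of up to 4 characters. */
    static double jaroWinkler(String s1, String s2) {
        double jaro = jaro(s1, s2);
        int limit = Math.min(4, Math.min(s1.length(), s2.length()));
        int prefix = 0;
        while (prefix < limit && s1.charAt(prefix) == s2.charAt(prefix)) prefix++;
        return jaro + prefix * 0.1 * (1.0 - jaro);
    }

    /** Flags an edit that removed at least 80% of the body and left dissimilar text. */
    static boolean textRemoved(String previousBody, String currentBody) {
        if (previousBody.isEmpty()) return false;
        double removed = 1.0 - (double) currentBody.length() / previousBody.length();
        return removed >= 0.8 && jaroWinkler(previousBody, currentBody) < 0.6;
    }
}
```

Combining the two conditions is what keeps false positives down: a large but legitimate trim that preserves the opening of the post tends to keep a higher similarity score than a wholesale replacement.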

### Edit summaries are run through the following filters:

- `BlacklistedWords`; the edit summary matches a blacklisted regex.
- `OffensiveWord`; the edit summary matches an offensive regex.

**Note**: In order to reduce false positives in `VeryLongWord`, `TextRemoved` and `BlacklistedWords` reasons, some HTML tags are stripped (`a`, `code`, `img`, `pre`, `blockquote`).
- `BlacklistedWords`; certain words are used within the edit summaries. The bot holds a separate file for question edit summaries and answer edit summaries. Both hold a list of keywords to watch for.
- `OffensiveWord`; the bot checks for offensive language used within the edit summary. This is done via a separate regex file.
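
As a rough illustration of how an edit summary could be checked against a list of regexes, consider the sketch below. It assumes the blacklist and offensive-word entries are regular expressions matched case-insensitively anywhere in the text; the patterns in the usage comment are made up for the example and are not taken from the bot's lists.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch; class and method names are illustrative only.
final class SummaryFilterSketch {

    /** Returns every regex from the list that matches somewhere in the text. */
    static List<String> matchingPatterns(String text, List<String> regexes) {
        List<String> hits = new ArrayList<>();
        for (String regex : regexes) {
            // Case-insensitive matching is an assumption of this sketch.
            if (Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(text).find()) {
                hits.add(regex);
            }
        }
        return hits;
    }

    // Example (made-up patterns):
    //   matchingPatterns("Deleting my question, thanks",
    //                    Arrays.asList("\\bdelet(e|ing)\\b", "\\bsolved\\b"))
    //   returns a list containing only the first pattern.
}
```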

### Where's the list of blacklisted and offensive words?

The bot fetches the blacklisted and offensive regexes from the database. You can find the blacklisted words CSV [here](https://github.com/SOBotics/Belisarius/blob/e5e7be6425209a2bb217275c901d0790d76a1c2f/ini/BlacklistedWords.csv) and the one with the offensive words [here](https://github.com/SOBotics/Belisarius/blob/e5e7be6425209a2bb217275c901d0790d76a1c2f/ini/OffensiveWords.csv).
The bot fetches the blacklisted and offensive regexes from the database. You can find the [blacklisted words CSV here](https://github.com/SOBotics/Belisarius/blob/master/ini/BlacklistedWords.csv) and the [offensive words CSV here](https://github.com/SOBotics/Belisarius/blob/master/ini/OffensiveWords.csv).
6 changes: 3 additions & 3 deletions docs/hippo.md
@@ -2,13 +2,13 @@

Hippo is a Higgs web dashboard for Belisarius. It is the place where all the posts the bot catches are sent. Most of its features are publicly available, however you need to sign in to send feedback.

Higgs is developed by [Rob](https://github.com/rjrudman) and hosted by Das_Geek. The GitHub repository is [here](https://github.com/SOBotics/Higgs).
Higgs is developed by [Rob](https://github.com/rjrudman) and hosted by Das_Geek. You can find [the GitHub repository here](https://github.com/SOBotics/Higgs).

Here's what a report looks like:

[![Higgs report](https://i.stack.imgur.com/ffjpv.png)](https://i.stack.imgur.com/ffjpv.png)
[![Higgs report](https://i.sstatic.net/ffjpv.png)](https://i.sstatic.net/ffjpv.png)

- The prepended string `Answer to:` is added by the code and it means that the post is an answer.
- `Answer to:` is prepended to the title if the reported post is an answer.
- Current body contains the text the latest revision had at the time of reporting.
- Similarly, last body contains the text the previous revision had at the time of reporting.
- Confidence is the score of each reason.
16 changes: 8 additions & 8 deletions docs/index.md
@@ -2,23 +2,23 @@

## Background

This bot has been developed in an attempt to help capture possible vandalism. This includes:
This bot has been developed in an attempt to help capture possible vandalism by identifying edits that:

- Removing all code
- Replacing all content with nonsense/repeated words
- Adding solutions to their questions instead of posting an answer
- Removing large amounts of text from their post
- Using certain keywords or offensive language within the edit summary
- remove all code
- replace content with nonsense or repeated words
- include solutions to questions
- remove large amounts of text from the post
- use certain keywords or offensive language within the edit summary

## Why do we need the bot?

The point of the bot is to help identify bad edits and/or potential vandalism made to posts in real time so that the changes can be quickly rolled back.

## Implementation

The bot queries the [Stack Exchange API][1] once every minute to get a list of the latest posts. There is logic to check that the post has been edited and that it has been edited by the author.
The bot queries the [Stack Exchange API][1] every minute to fetch a list of the most recently edited posts. There is logic to check that the post has been edited and that it has been edited by the author.

The `post_id` from each post is then taken and the [Stack Exchange API][2] is again queried for the list of revisions. To limit calls we utilise the functionality of pushing multiple ids into the API and then logic is in place to ensure we are using the latest revision.
The `post_id` from each post is then extracted and the [Stack Exchange API][2] is again queried for a list of revisions. To reduce API calls, multiple ids are sent in a single request, and logic is in place to ensure the latest revision is used.

Edits can be made up of a title change, body change of a question, tag changes or changes made to the body of an answer. Currently tags are not checked. Instead the title, question body and answer body depending on what has been edited are run through filters, as is the edit summary.

12 changes: 8 additions & 4 deletions docs/run.md
@@ -17,12 +17,16 @@

mvn clean install

- Fill in `properties/login.properties`.
- Start the bot:
- Run

cp properties/login.example.properties properties/login.properties

and fill in `properties/login.properties`.

- Start the bot by running:

java -cp target/belisarius-1.7.1.jar:./lib/* bugs.stackoverflow.belisarius.Application

-----

If you want to change the location of the log file, edit `src/main/resources/log4j.xml` and change the path in line 16.
Please note that the project should be rebuilt (`mvn install`), for the changes to be applied.
If you want to change the location of the log file, edit `src/main/resources/log4j.xml`. The project must be rebuilt (`mvn install`) for the changes to be applied.
