Rename to AcrosticSleuth

acrostics · Jul 29, 2024 · 7b04914
1 parent b89b837

Showing 2 changed files with 24 additions and 24 deletions.
diff --git a/README.md b/README.md
@@ -1,47 +1,47 @@
-# AcrosticScout
+# AcrosticSleuth
 
-AcrosticScout is a program for identifying and ranking acrostics. 
+AcrosticSleuth is a program for identifying and ranking acrostics. 
 At a high level, the tool works by comparing the probability of random occurrence with the probability that a sequence of characters forms a meaningful word or phrase in the target language.
-AcrosticScout is optimized to quickly process gigabytes of text. 
-With the help of AcrosticScout, we have been able to discover multiple previously unknown acrostics, including the English philosopher's Thomas Hobbes signature in *The Elements of Law* (THOMAS[OF]HOBBES).
+AcrosticSleuth is optimized to quickly process gigabytes of text. 
+With the help of AcrosticSleuth, we have been able to discover multiple previously unknown acrostics, including the English philosopher Thomas Hobbes's signature in *The Elements of Law* (THOMAS[OF]HOBBES).
 You can read more about the methodology in our upcoming paper ([preprint]()).
 
 ### Table of contents
-- [What languages does AcrosticScout support?](#what-languages-does-acrosticscout-support)
-- [How to install and use AcrosticScout?](#how-to-install-and-use-acrosticscout)
+- [What languages does AcrosticSleuth support?](#what-languages-does-acrosticsleuth-support)
+- [How to install and use AcrosticSleuth?](#how-to-install-and-use-acrosticsleuth)
 - [Hello World example](#hello-world-example)
-- [How was AcrosticScout evaluated?](#how-was-acrosticscout-evaluated)
+- [How was AcrosticSleuth evaluated?](#how-was-acrosticsleuth-evaluated)
 - [How to reproduce our results?](#how-to-reproduce-our-results)
 - [How to cite this?](#how-to-cite-this)
 
-## What languages does AcrosticScout support?
-AcrosticScout currently support **English, French, Russian, and Latin**. 
-The only language-specific component of AcrosticScout is the unigram language model produced by [sentencepiece](https://github.com/google/sentencepiece).
-Support for new languages can, therefore, be easily added -- please [make an issue](https://github.com/acrostics/acrostic-scout/issues/new) here on GitHub if you wish to use AcrosticScout with another language. 
+## What languages does AcrosticSleuth support?
+AcrosticSleuth currently supports **English, French, Russian, and Latin**. 
+The only language-specific component of AcrosticSleuth is the unigram language model produced by [sentencepiece](https://github.com/google/sentencepiece).
+Support for new languages can, therefore, be easily added -- please [make an issue](https://github.com/acrostics/acrostic-sleuth/issues/new) here on GitHub if you wish to use AcrosticSleuth with another language. 
 
-## How to install and use AcrosticScout?
+## How to install and use AcrosticSleuth?
 
-To run AcrosticScout, you need Java SDK installed on your machine.
-We have tested AcrosticScout on Mac OS and Linux.
+To run AcrosticSleuth, you need a Java Development Kit (JDK) installed on your machine.
+We have tested AcrosticSleuth on Mac, Mac-Arm, Ubuntu, and Windows [as part of our CI](.github/workflows/main.yml).
 
 First, compile the code from the base directory using:
 
 ```bash
 javac -cp src -encoding UTF-8 src/acrostics/*.java
 ```
 
-Then run AcrosticScout using the command below, replacing `INPUT` and `LANG` with the name of the directory that contains the dataset you wish AcrosticScout to analyze and the language of that dataset, respectively:
+Then run AcrosticSleuth using the command below, replacing `INPUT` and `LANG` with the name of the directory that contains the dataset you wish AcrosticSleuth to analyze and the language of that dataset, respectively:
 
 ```bash
 java -cp src acrostics.Main -input INPUT -language LANG
 ```
 
-AcrosticScout accepts multiple optional command line arguments (thank you, [picocli](https://github.com/remkop/picocli/tree/v4.7.6)) -- run the tool with the `--help` flag to get the up-to-date list of all available options.
+AcrosticSleuth accepts multiple optional command line arguments (thank you, [picocli](https://github.com/remkop/picocli/tree/v4.7.6)) -- run the tool with the `--help` flag to get the up-to-date list of all available options.
 
 ## Hello World example
 
-This repository includes an example dataset comprising a subset of pages with acrostics from the English subdomain of WikiSource database (see [How was AcrosticScout evaluated?](#how-was-acrosticscout-evaluated)). 
-You can test AcrosticScout on this small dataset using:
+This repository includes an example dataset comprising a subset of pages with acrostics from the English subdomain of the WikiSource database (see [How was AcrosticSleuth evaluated?](#how-was-acrosticsleuth-evaluated)). 
+You can test AcrosticSleuth on this small dataset using:
 
 ```bash
 java -cp src acrostics.Main -input data/example -language EN -mode LINE -charset utf-8 -outputSize 4000 --concise
@@ -52,7 +52,7 @@ Here is the meaning behind each of the options used:
 - `-language EN`: use the default English language model
 - `-mode LINE`: search for line acrostics (where an acrostic is formed by the initial letters of each line)
 - `-charset utf-8`: use the utf-8 encoding when opening the files
-- `-outputSize 4000`: return top 4000 instances (AcrosticScout clusters collocated instances, so the actual number of results it returns is much smaller -- 46)
+- `-outputSize 4000`: return the top 4000 instances (AcrosticSleuth clusters collocated instances, so the actual number of results it returns is much smaller -- 46)
 - `--concise`: only report key information (file,acrostic,rank).
 
 Specifically, you should be getting the following output (highest ranked acrostics appear at the bottom of the list):
@@ -108,10 +108,10 @@ data/example/The PearlVolume 18Acrostic.txt     cunt_is_sweet_when_young_and_ten
 data/example/The Confessions of William-Henry Ireland.txt       warwick_at_dudley_at_southampton_at_rivers_at_shakspeare        7.6181055E+27
 ```
 
-## How was AcrosticScout evaluated?
+## How was AcrosticSleuth evaluated?
 
 We have created the [Acrostic Identification Task Dataset](https://github.com/acrostics/acrostic-identification-task-dataset) by manually identifying all poems explicitly referred to or formatted as acrostics on English, Russian, and French subdomains of [WikiSource](https://en.wikisource.org/wiki/Main_Page), an online library of source texts in the public domain.
-AcrosticScout reaches recall of over 50% within the first 100 results it returns for English and Russian, and recall rises to up to 80% when considering more results.
+AcrosticSleuth reaches a recall of over 50% within the first 100 results it returns for English and Russian, and recall rises to as much as 80% when more results are considered.
 Read more in our [paper]():
 
 ![](RecallFigure.svg)
@@ -131,9 +131,9 @@ First, clone this directory with the `--recursive` flag, so that it also include
 Next, follow the directions for [downloading and setting up the Acrostic Identification Task Dataset](https://github.com/acrostics/acrostic-identification-task-dataset/blob/main/README.md), which is cloned as a submodule for this repository in the `data` directory.
 Make sure to run the [get_data.sh](https://github.com/acrostics/acrostic-identification-task-dataset/blob/main/get_data.sh) script as discussed in the README linked above.
 
-Finally, to run AcrosticScout on the dataset and measure its recall, run [data/evaluate_on_acrostics-identification-task-dataset.sh](data/evaluate_on_acrostics-identification-task-dataset.sh). 
+Finally, to run AcrosticSleuth on the dataset and measure its recall, run [data/evaluate_on_acrostics-identification-task-dataset.sh](data/evaluate_on_acrostics-identification-task-dataset.sh). 
 The script will save the output files in the `output` directory and produce `recall.png` figure that plots the recall graph you see above and in the paper. 
 
 ## How to cite this?
 
-Fedchin, A., Cooperman, I., Chaudhuri, P., Dexter, J.P. 2024 "AcrosticScout: Differentiating True Acrostics from Random Noise in Multilingual Corpora Using Probabilistic Ranking". Forthcoming
+Fedchin, A., Cooperman, I., Chaudhuri, P., Dexter, J.P. 2024 "AcrosticSleuth: Differentiating True Acrostics from Random Noise in Multilingual Corpora Using Probabilistic Ranking". Forthcoming
diff --git a/data/acrostic-identification-task-dataset b/data/acrostic-identification-task-dataset