Rule-based models are often used for data analysis as they combine interpretability with predictive power. We present RuleKit 2, a versatile tool for rule learning. Based on a sequential covering induction algorithm, it is suitable for classification, regression, and survival problems. A user-guided induction mode makes it easier to verify hypotheses about data dependencies that are expected or of interest. The flexible experimental environment allows straightforward investigation of different induction schemes. Unlike the first version, RuleKit 2 does not depend on RapidMiner. The analysis can be performed in batch mode or through a Python package. A documented Java API is also provided. Running RuleKit as a RapidMiner plugin or an R package is no longer supported in version 2.
RuleKit provides the latest versions of our algorithms (some of them were initially published as independent packages and integrated later):
- LR-Rules (Wróbel et al., 2017) - survival rules induction,
- GuideR (Sikora et al., 2019) - user-guided induction,
- RuleKit-CS (Gudyś et al., 2024) - contrast set mining.
In the following subsections we provide a brief introduction on how to install and use the RuleKit batch interface. The software requires the Java Development Kit, version 8 (tested with version 1.8.0), to work properly. On Windows, one can download the installer from the Oracle webpage. On Linux, a system package manager should be used instead. For instance, on Ubuntu 16.04 execute the following command:
sudo apt-get install default-jdk
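After installation, it can be useful to confirm which Java version is actually found on the PATH. The following Python sketch is only an illustration and is not part of RuleKit; the helper names parse_java_major and installed_java_major are hypothetical. It assumes the usual banner printed by java -version, in which Java 8 is reported under the legacy 1.8.x scheme.

```python
import re
import subprocess


def parse_java_major(version_output: str) -> int:
    """Extract the major Java version from the text printed by `java -version`."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError("no version string found in the given output")
    first = int(match.group(1))
    second = match.group(2)
    # Legacy numbering: "1.8.0_292" denotes Java 8.
    if first == 1 and second is not None:
        return int(second)
    # Modern numbering: "11.0.2" or "17" already start with the major version.
    return first


def installed_java_major() -> int:
    """Query the `java` executable on the PATH (assumed to be installed)."""
    # `java -version` writes its report to stderr rather than stdout.
    result = subprocess.run(
        ["java", "-version"], capture_output=True, text=True, check=True
    )
    return parse_java_major(result.stderr)
```

Calling installed_java_major() should return 8 for the JDK version named above.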
To use batch mode, download the rulekit-<version>-all.jar file from the releases folder. Alternatively, one can build the package from the sources by running the following commands in the adaa.analytics.rules directory of this repository. Windows:
gradlew -b build.gradle rjar
Linux:
./gradlew -b build.gradle rjar
The JAR file will be placed in the adaa.analytics.rules/build/libs subdirectory.
Once the package has been downloaded or built, the analysis can be performed. The example batch experiment concerns the problem of classifying whether a person making a purchase will be a future customer. The corresponding dataset is named deals and is split into train and test parts (download). To run the experiment, copy the RuleKit JAR file into the ./examples folder of the repository and execute:
java -jar rulekit-<version>-all.jar minimal-deals.xml
Ignore the SLF4J warning reported on the console; it does not affect the procedure. The results of the analysis will be located in the ./examples/results-minimal/deals/ folder. Note that the repository already contains reference results, which will be overwritten. See this Wiki section for detailed information on how to configure batch analyses in RuleKit.
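The manual steps above can also be scripted. The following Python sketch is a hedged illustration rather than a RuleKit feature: the helper names find_rulekit_jar, batch_command, and run_experiment are hypothetical, and the code assumes only what this section states (the rulekit-<version>-all.jar naming, the minimal-deals.xml experiment file, and the results-minimal/deals results folder).

```python
import subprocess
from pathlib import Path
from typing import List


def find_rulekit_jar(folder: Path) -> Path:
    """Return the rulekit-<version>-all.jar file found in `folder`."""
    jars = sorted(folder.glob("rulekit-*-all.jar"))
    if not jars:
        raise FileNotFoundError(f"no rulekit-<version>-all.jar in {folder}")
    # If several JARs are present, take the last one in name order
    # (plain string ordering, not a true version comparison).
    return jars[-1]


def batch_command(jar: Path, experiment: str) -> List[str]:
    """Build the command line shown above for a given experiment file."""
    return ["java", "-jar", jar.name, experiment]


def run_experiment(
    examples_dir: Path,
    experiment: str = "minimal-deals.xml",
    results_subdir: str = "results-minimal/deals",
) -> List[Path]:
    """Run the experiment inside `examples_dir` and list the result files."""
    jar = find_rulekit_jar(examples_dir)
    # Use the examples folder as the working directory, as in the manual step.
    subprocess.run(batch_command(jar, experiment), cwd=examples_dir, check=True)
    return sorted((examples_dir / results_subdir).iterdir())
```

Calling run_experiment(Path("examples")) from the repository root would mirror the manual procedure, including overwriting the reference results.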
The RuleKit Python package can be found here.
The detailed RuleKit documentation can be found on the Wiki pages, which cover the following topics:
- Batch interface
- RapidMiner plugin
- R package
- Quality and evaluation
- Output files
- User-guided induction
- Library API
- Empirical results
- Contrast set mining
JavaDoc for the project is available here.
The repository contains the datasets used in the GuideR study. We also provide the latest UCI revision of the Bone marrow transplant: children dataset. We recommend using this dataset as it contains many improvements compared to the previous release (e.g., textual encoding of attribute values).
RuleKit Development Team:
- Adam Gudyś
- Łukasz Wróbel
- Marek Sikora
Contributors:
- Wojciech Górka
- Joanna Henzel
- Paweł Matyszok
- Wojciech Sikora
The software is publicly available under the GNU AGPL-3.0 license. Any derivative work obtained under this license must be licensed under the AGPL if this derivative work is distributed to a third party. For commercial projects that require the ability to distribute RuleKit code as part of a program that cannot be distributed under the AGPL, it may be possible to obtain an appropriate license from the authors.