Integrating R machine learning algorithms in Stata using rcall 3.0: a tutorial for Stata users and developers
Cite: Haghish, E. F. (in preparation). Integrating R machine learning algorithms in Stata using rcall 3.0: a tutorial for Stata users and developers
If you are interested in reading the source code of this package, contributing to this project, or developing your own R-based Stata packages, I suggest reading the following articles. They provide the background needed to embed R code in Stata, write easy-to-read and elegant software documentation, and host your packages on GitHub.
- Haghish, E. F. (2019). Seamless interactive language interfacing between R and Stata. The Stata Journal. 2019;19(1):61-82.
- Haghish, E. F. (2020). Software documentation with markdoc 5.0. The Stata Journal. 2020;20(2):336-362.
- Haghish, E. F. (2020). Developing, maintaining, and hosting Stata statistical software on GitHub. The Stata Journal. 2020;20(4):931-951.
The title of the repository reads machine learning, but this repository is much more than just a machine learning repository. It is also a place for learning advanced, yet simple, Stata programming using rcall and markdoc. rcall is the main engine of the package and is used for embedding R machine learning packages in Stata programs. markdoc plays a central role in software documentation... [to be continued]
- Each program is written in a separate ADO file, named after the program.
- The Markdown documentation of each program is written within the script file.
- The make.do file is the package generator file. It's a program installed with the github package.
- The make.do file also generates Stata help files, Markdown help files (for the GitHub Wiki), and the PDF package vignette.
- All of these documents are generated with the MarkDoc literate programming package.
- You will find the package vignette template in the vignette.do file.
The machinelearning package is a Stata module that includes several R machine learning (ML) algorithms, implemented in Stata using the rcall package. The reason for developing this package is twofold:
- Bringing several machine learning R packages to Stata and making them available to the community
- Provide a simplistic tutorial and a real-world example showing how to
The github package is the only recommended way of installing machinelearning. Once github is installed, you can install the development version of the package as follows. The package is currently a work in progress and there is no stable release yet.
github install haghish/machinelearning
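If github itself is not yet installed, it has to be obtained first. The line below is a minimal sketch of that step. The `net install` source URL is an assumption based on the github package's own documentation and is not stated in this README, so check it against that project before relying on it.

```stata
* Assumption: the github package is served from the author's GitHub Pages site.
* Verify this URL against the github package documentation before use.
net install github, from("https://haghish.github.io/github/")
```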
missforst embeds the missForest R package in Stata. This is a very simple, yet very powerful, missing data imputation method that can provide an unbiased out-of-bag (OOB) error estimate for each variable. Load your dataset in Stata and call the command! The imputed data will be loaded automatically in Stata once the imputation is done.
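To illustrate the general idea of embedding missForest through rcall, here is a minimal sketch. It is not the package's actual implementation or command syntax. It assumes that rcall is installed in Stata, that the missForest package is installed in R, and that rcall's `st.data()` and `st.load()` helper functions behave as described in the rcall documentation (passing the data in memory to R and loading a data frame back into Stata). The example dataset and variable selection are chosen purely for illustration.

```stata
* A hedged sketch of the rcall workflow, not the package's own code.
* Assumes: rcall is installed in Stata and missForest is installed in R.
sysuse auto, clear

* Keep numeric variables only; rep78 contains missing values.
keep price mpg rep78 weight length

* Load the R package in the R session that rcall manages.
rcall: library(missForest)

* Pass the dataset currently in memory to R as a data frame.
rcall: df <- st.data()

* Run the random forest imputation in R.
rcall: imp <- missForest(df)

* Load the imputed data frame back into Stata.
* [["ximp"]] is used instead of $ximp because Stata treats $ as a global macro prefix.
rcall: st.load(imp[["ximp"]])
```

Wrapping these steps in an ADO program is what lets a Stata user run the imputation with a single command without writing any R code.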
continue reading on GitHub Wiki ...
Missing values are an inseparable part of large datasets. However, the larger the data, the less feasible commonly used imputation methods become, such as multiple imputation and even faster variations such as Random Forest imputation. This is when faster algorithms shine, especially kNN. This package includes a Stata program that implements the kNN algorithm using rcall.
continue reading on GitHub Wiki ...