Integrating R machine learning algorithms in Stata using rcall 3.0: a tutorial for Stata users and developers

Cite: Haghish, E. F. (in preparation). Integrating R machine learning algorithms in Stata using rcall 3.0: a tutorial for Stata users and developers

If you are interested in reading the source code of this package, contributing to this project, or developing your own R-based Stata packages, I suggest reading the following articles. They provide the background needed to embed R code in Stata, write easy-to-read and elegant software documentation, and host your packages on GitHub.

Haghish, E. F. (2019). Seamless interactive language interfacing between R and Stata. The Stata Journal, 19(1), 61-82.
Haghish, E. F. (2020). Software documentation with markdoc 5.0. The Stata Journal, 20(2), 336-362.
Haghish, E. F. (2020). Developing, maintaining, and hosting Stata statistical software on GitHub. The Stata Journal, 20(4), 931-951.

What is this repository about?

The title of the repository reads machine learning, but this repository is much more than just a machine learning repository. It's a place for learning advanced - but simple - Stata programming using rcall and markdoc. rcall is the main engine of the package, used for embedding R machine learning packages in Stata programs. markdoc plays a central role in software documentation... [to be continued]

How to get started?

Each program is written in a separate ADO file, named after the program.
The Markdown documentation of each program is written within the script file itself.
The make.do file is the package generator file. It is a program installed with the github package.
The make.do file also generates the Stata help files, the Markdown help files (for the GitHub Wiki), and the PDF package vignette.
All of these documents are generated with the MarkDoc literate programming package.
You will find the package vignette template in the vignette.do file.

Description

The machinelearning package is a Stata module that includes several R machine learning (ML) algorithms, implemented in Stata using the rcall package. The reason for developing this package is twofold:

Bringing several R machine learning packages to Stata and making them available to the community
Providing a simple tutorial and a real-world example showing how to
- embed intricate R code into Stata Ado programs
- document Stata Ado programs in Markdown using the MarkDoc literate programming package
- build a Stata package using MarkDoc
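
As a rough illustration of the first point, the sketch below wraps a single rcall line in an Ado program. It is not taken from this package: the program name `rdemo` is hypothetical, and it assumes that R and rcall are installed and that the `st.data()` and `st.load()` helper functions behave as described in the rcall documentation (passing the data in memory to R, and loading an R data frame back into Stata).

```stata
* Hypothetical wrapper program, for illustration only (not part of this package)
program define rdemo
    version 14

    * Run R in a fresh (vanilla) session:
    *   st.data()  passes the dataset currently in memory to R
    *   na.omit()  stands in for any R routine that returns a data frame
    *   st.load()  sends the resulting data frame back to Stata
    rcall vanilla: df <- st.data(); df <- na.omit(df); st.load(df)
end
```

After loading a dataset, for example with `sysuse auto, clear`, typing `rdemo` should replace the data in memory with the rows that have no missing values. A real program would replace `na.omit()` with the R algorithm being embedded.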

Installation

The github package is the only recommended way to install machinelearning. Once github is installed, you can install the development version of the package as follows. The package is currently a work in progress and there is no stable release yet.

github install haghish/machinelearning
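
If the github package itself is not yet installed, the full sequence might look like the sketch below. The `net install` line follows the github package's own installation instructions rather than this README, so treat the URL as an assumption and check it against that project before use.

```stata
* One-time step: install the github package (URL assumed from the github package's instructions)
net install github, from("https://haghish.github.io/github/")

* Install the development version of machinelearning
github install haghish/machinelearning
```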

Programs

`missforest`: Missing data imputation with Random Forest

missforest embeds the missForest R package in Stata. This is a very simple - yet very powerful - missing data imputation method that can provide an unbiased Out Of Bag (OOB) error estimate for each variable. Load your dataset in Stata and call the command! The imputed data will be loaded into Stata automatically once the imputation is done. Continue reading on the GitHub Wiki ...

`kNN`: Missing data imputation with kNN algorithm

Missing values are an inseparable part of large datasets. However, the larger the data, the less feasible common imputation methods become; this applies to multiple imputation and even to faster alternatives such as Random Forest imputation. This is when faster algorithms shine, especially kNN. This package includes a Stata program that implements the kNN algorithm using rcall. Continue reading on the GitHub Wiki ...

