a Stata module for machine learning (ML) algorithms, implemented within R using rcall package

haghish/machinelearning

Integrating R machine learning algorithms in Stata using rcall 3.0: a tutorial for Stata users and developers

Cite: Haghish, E. F. (in preparation). Integrating R machine learning algorithms in Stata using rcall 3.0: a tutorial for Stata users and developers

If you are interested in reading the source code of this package, contributing to this project, or developing your own R-based Stata packages, I suggest reading the following articles. These articles provide the background needed to embed R code in Stata, write easy-to-read and elegant software documentation, and host your packages on GitHub.

What is this repository about?

The title of the repository reads machine learning, but this repository is much more than just a machine learning repository. It is a place for learning advanced - but simple - Stata programming using rcall and markdoc. rcall is the main engine of the package, used for embedding R machine learning packages in Stata programs. markdoc plays a central role in software documentation... [to be continued]

How to get started?

  1. Each program is written in a separate ADO file, named after the program.
  2. The Markdown documentation of each program is written within the script file.
  3. The make.do file is the package generator file. It is a program installed with the github package.
  4. The make.do file also generates the Stata help files, the Markdown help files (for the GitHub Wiki), and the PDF package vignette.
  5. All of these documents are generated with the MarkDoc literate programming package.
  6. You will find the package vignette template in the vignette.do file.
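As a rough illustration of points 1 and 2, the sketch below shows what an ADO file with embedded Markdown documentation might look like. The program name `myprog` and its contents are hypothetical and are not part of this package. The sketch assumes MarkDoc's convention of placing documentation in comment blocks that open with `/***` and close with `***/`.

```stata
/***
myprog
======

A hypothetical program that displays the names of the variables it receives.

Syntax
------

> __myprog__ _varlist_
***/

* Hypothetical program, used only to illustrate the layout of an ADO file
program define myprog
    version 14
    syntax varlist
    display "`varlist'"
end
```

Under the same assumptions, a line such as `markdoc "myprog.ado", export(sthlp) replace` in make.do would be one way to turn the documentation block into a Stata help file. Check the MarkDoc documentation for the options available in your installed version.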

Description

The machinelearning package is a Stata module that includes several R machine learning (ML) algorithms, implemented in Stata using the rcall package. The reason for developing this package is twofold:

  • Bring several R machine learning packages to Stata and make them available to the community
  • Provide a simple tutorial and a real-world example showing how to
    • embed intricate R code in Stata Ado programs
    • document Stata Ado programs in Markdown using the MarkDoc literate programming package
    • build a Stata package using MarkDoc
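To give a feel for what embedding R code in an Ado program can look like, here is a minimal sketch. It is not taken from this package: the program name `rsummary` is hypothetical, and the sketch assumes that R and rcall are installed and that rcall's `st.data()` helper reads the data currently in memory into an R data frame, as described in the rcall documentation.

```stata
* Hypothetical example, not part of the machinelearning package.
* Assumes R and the rcall package are installed.
program define rsummary
    version 14
    syntax varlist(numeric)

    // work on a copy so the user's data is left untouched
    preserve
    // pass only the requested variables to R
    keep `varlist'

    // "vanilla" asks rcall to run the code in a fresh R session;
    // st.data() is assumed to load the data in memory as an R data frame
    rcall vanilla: df <- st.data(); print(summary(df))

    restore
end
```

With the auto dataset loaded (`sysuse auto, clear`), calling `rsummary price mpg` should print R's summary of the two variables in the Stata results window, provided the assumptions above hold.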

Installation

The github package is the only recommended way to install machinelearning. The package is currently a work in progress and has no stable release yet. Once github is installed, you can install the development version of the package as follows:

github install haghish/machinelearning
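If the github package is not yet installed, it has to be installed before running the command above. A minimal sketch, assuming the installation URL given in the github package's own README is still current:

```stata
* Install the github package (assumed URL; check the github package README)
net install github, from("https://haghish.github.io/github/")

* Confirm that Stata can find the github command
which github
```

Because the package calls R through rcall, a working R installation is also needed on the machine.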

Programs

missforest: Missing data imputation with Random Forest

missforest embeds the missForest R package in Stata. This is a very simple - yet very powerful - missing data imputation method that can provide an unbiased out-of-bag (OOB) error estimate for each variable. Load your dataset in Stata and call the command! The imputed data will be loaded automatically in Stata once the imputation is done. Continue reading on GitHub Wiki ...

kNN: Missing data imputation with kNN algorithm

Missing values are an inseparable part of large datasets. However, the larger the data, the less feasible common imputation methods become, including multiple imputation and even faster alternatives such as Random Forest imputation. This is when faster algorithms shine, especially kNN. This package includes a Stata program that implements the kNN algorithm using rcall. Continue reading on GitHub Wiki ...
