sparklyr Project Proposal

Name of the project: sparklyr

Requested maturity level: Incubation

Description

The sparklyr R package provides a modern interface to Apache Spark, a fast and general engine for big data processing. The package connects to local and remote Apache Spark clusters and provides a dplyr-compatible back end, an interface to Spark’s built-in machine learning algorithms, support for Spark structured streams and Spark pipelines, and the ability to execute custom R code across Spark clusters. It also supports multiple extensions that make H2O, XGBoost, GraphFrames, MLeap, and many others usable in Spark from R.

Possible integrations with existing LF AI projects:

https://github.com/uber/horovod - Enable support for Horovod in R with sparklyr.

https://github.com/onnx/onnx - Enable support to export models and pipelines.

License: Apache License 2.0

External dependencies:

Depends on:

Initial committers:

Infrastructure requests: None

Current mailing lists: None. We will ask LF to create lists for users, developers, and the TSC.

Resources:

Website: Currently https://spark.rstudio.com; the site will need to be transferred to a new domain.

Release methodology & mechanics: A release is performed once the committers pass a vote to do so. The Release Manager (Javier Luraschi) carries out the release.

The release procedure is described at https://github.com/sparklyr/sparklyr/wiki/Releases

Social media accounts: None

Existing sponsorship: RStudio, Databricks, and Qubole have provided developer resources to improve sparklyr.