forked from alteryx/spark
[SPARK-7555] [DOCS] Add doc for elastic net in ml-guide and mllib-guide
jkbradley I put the elastic net under the **Algorithm guide** section. Also add the formula of elastic net in mllib-linear `mllib-linear-methods#regularizers`.

dbtsai I left the code tab for you to add example code. Do you think it is the right place?

Author: Shuo Xiang <[email protected]>

Closes apache#6504 from coderxiang/elasticnet and squashes the following commits:

f6061ee [Shuo Xiang] typo
90a7c88 [Shuo Xiang] Merge remote-tracking branch 'upstream/master' into elasticnet
0610a36 [Shuo Xiang] move out the elastic net to ml-linear-methods
8747190 [Shuo Xiang] merge master
706d3f7 [Shuo Xiang] add python code
9bc2b4c [Shuo Xiang] typo
db32a60 [Shuo Xiang] java code sample
aab3b3a [Shuo Xiang] Merge remote-tracking branch 'upstream/master' into elasticnet
a0dae07 [Shuo Xiang] simplify code
d8616fd [Shuo Xiang] Update the definition of elastic net. Add scala code; Mention Lasso and Ridge
df5bd14 [Shuo Xiang] use wikipeida page in ml-linear-methods.md
78d9366 [Shuo Xiang] address comments
8ce37c2 [Shuo Xiang] Merge branch 'elasticnet' of github.com:coderxiang/spark into elasticnet
8f24848 [Shuo Xiang] Merge branch 'elastic-net-doc' of github.com:coderxiang/spark into elastic-net-doc
998d766 [Shuo Xiang] Merge branch 'elastic-net-doc' of github.com:coderxiang/spark into elastic-net-doc
89f10e4 [Shuo Xiang] Merge remote-tracking branch 'upstream/master' into elastic-net-doc
9262a72 [Shuo Xiang] update
7e07d12 [Shuo Xiang] update
b32f21a [Shuo Xiang] add doc for elastic net in sparkml
937eef1 [Shuo Xiang] Merge remote-tracking branch 'upstream/master' into elastic-net-doc
180b496 [Shuo Xiang] Merge remote-tracking branch 'upstream/master'
aa0717d [Shuo Xiang] Merge remote-tracking branch 'upstream/master'
5f109b4 [Shuo Xiang] Merge remote-tracking branch 'upstream/master'
c5c5bfe [Shuo Xiang] Merge remote-tracking branch 'upstream/master'
98804c9 [Shuo Xiang] fix bug in topBykey and update test
1 parent 9716a72, commit 303c120
Showing 3 changed files with 188 additions and 25 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,129 @@ | ||
--- | ||
layout: global | ||
title: Linear Methods - ML | ||
displayTitle: <a href="ml-guide.html">ML</a> - Linear Methods | ||
--- | ||
|
||
|
||
`\[ | ||
\newcommand{\R}{\mathbb{R}} | ||
\newcommand{\E}{\mathbb{E}} | ||
\newcommand{\x}{\mathbf{x}} | ||
\newcommand{\y}{\mathbf{y}} | ||
\newcommand{\wv}{\mathbf{w}} | ||
\newcommand{\av}{\mathbf{\alpha}} | ||
\newcommand{\bv}{\mathbf{b}} | ||
\newcommand{\N}{\mathbb{N}} | ||
\newcommand{\id}{\mathbf{I}} | ||
\newcommand{\ind}{\mathbf{1}} | ||
\newcommand{\0}{\mathbf{0}} | ||
\newcommand{\unit}{\mathbf{e}} | ||
\newcommand{\one}{\mathbf{1}} | ||
\newcommand{\zero}{\mathbf{0}} | ||
\]` | ||
|
||
|
||
MLlib implements popular linear methods such as logistic regression and linear least squares with L1 or L2 regularization. Refer to [the linear methods guide for `spark.mllib`](mllib-linear-methods.html) for details. In `spark.ml`, we also include a Pipelines API for [elastic net](http://en.wikipedia.org/wiki/Elastic_net_regularization), a hybrid of L1 and L2 regularization proposed in [this paper](http://users.stat.umn.edu/~zouxx019/Papers/elasticnet.pdf). Mathematically, it is defined as a convex combination of the L1 norm and the squared L2 norm: | ||
`\[ | ||
\alpha \|\wv\|_1 + (1-\alpha) \frac{1}{2}\|\wv\|_2^2, \alpha \in [0, 1]. | ||
\]` | ||
By setting $\alpha$ appropriately, elastic net contains both L1 and L2 regularization as special cases. For example, if a [linear regression](https://en.wikipedia.org/wiki/Linear_regression) model is trained with the elastic net parameter $\alpha$ set to $1$, it is equivalent to a [Lasso](http://en.wikipedia.org/wiki/Least_squares#Lasso_method) model. On the other hand, if $\alpha$ is set to $0$, the trained model reduces to a [ridge regression](http://en.wikipedia.org/wiki/Tikhonov_regularization) model. We implement a Pipelines API for both linear regression and logistic regression with elastic net regularization. | ||
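The two special cases can be checked numerically. The following is a minimal sketch in plain Python, not part of the Spark API: the helper name `elastic_net_penalty` and the sample weight vector are assumptions made for illustration, and the function only evaluates the penalty formula given above.

```python
def elastic_net_penalty(w, alpha):
    """Evaluate alpha * ||w||_1 + (1 - alpha) * 0.5 * ||w||_2^2.

    `w` is a plain list of floats and `alpha` must lie in [0, 1].
    Hypothetical helper for illustration only; it is not a Spark function.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    l1_norm = sum(abs(x) for x in w)      # ||w||_1
    l2_norm_sq = sum(x * x for x in w)    # ||w||_2^2
    return alpha * l1_norm + (1.0 - alpha) * 0.5 * l2_norm_sq


w = [0.5, -2.0, 0.0]                      # arbitrary sample weights

lasso_penalty = elastic_net_penalty(w, 1.0)   # pure L1 term (Lasso)
ridge_penalty = elastic_net_penalty(w, 0.0)   # pure L2 term (ridge)
mixed_penalty = elastic_net_penalty(w, 0.8)   # 80% L1, 20% L2

print(lasso_penalty, ridge_penalty, mixed_penalty)
```

With $\alpha = 1$ only the L1 norm contributes, and with $\alpha = 0$ only half the squared L2 norm contributes, which matches the Lasso and ridge cases described above.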
|
||
**Examples** | ||
|
||
<div class="codetabs"> | ||
|
||
<div data-lang="scala" markdown="1"> | ||
|
||
{% highlight scala %} | ||
|
||
import org.apache.spark.ml.classification.LogisticRegression | ||
import org.apache.spark.mllib.util.MLUtils | ||
|
||
// Load training data | ||
val training = MLUtils.loadLibSVMFile(sc, "data/mllib/sample_libsvm_data.txt").toDF() | ||
|
||
val lr = new LogisticRegression() | ||
.setMaxIter(10) | ||
.setRegParam(0.3) | ||
.setElasticNetParam(0.8) | ||
|
||
// Fit the model | ||
val lrModel = lr.fit(training) | ||
|
||
// Print the weights and intercept for logistic regression | ||
println(s"Weights: ${lrModel.weights} Intercept: ${lrModel.intercept}") | ||
|
||
{% endhighlight %} | ||
|
||
</div> | ||
|
||
<div data-lang="java" markdown="1"> | ||
|
||
{% highlight java %} | ||
|
||
import org.apache.spark.ml.classification.LogisticRegression; | ||
import org.apache.spark.ml.classification.LogisticRegressionModel; | ||
import org.apache.spark.mllib.regression.LabeledPoint; | ||
import org.apache.spark.mllib.util.MLUtils; | ||
import org.apache.spark.SparkConf; | ||
import org.apache.spark.SparkContext; | ||
import org.apache.spark.sql.DataFrame; | ||
import org.apache.spark.sql.SQLContext; | ||
|
||
public class LogisticRegressionWithElasticNetExample { | ||
public static void main(String[] args) { | ||
SparkConf conf = new SparkConf() | ||
.setAppName("Logistic Regression with Elastic Net Example"); | ||
|
||
SparkContext sc = new SparkContext(conf); | ||
SQLContext sql = new SQLContext(sc); | ||
String path = "sample_libsvm_data.txt"; | ||
|
||
// Load training data | ||
DataFrame training = sql.createDataFrame(MLUtils.loadLibSVMFile(sc, path).toJavaRDD(), LabeledPoint.class); | ||
|
||
LogisticRegression lr = new LogisticRegression() | ||
.setMaxIter(10) | ||
.setRegParam(0.3) | ||
.setElasticNetParam(0.8); | ||
|
||
// Fit the model | ||
LogisticRegressionModel lrModel = lr.fit(training); | ||
|
||
// Print the weights and intercept for logistic regression | ||
System.out.println("Weights: " + lrModel.weights() + " Intercept: " + lrModel.intercept()); | ||
} | ||
} | ||
{% endhighlight %} | ||
</div> | ||
|
||
<div data-lang="python" markdown="1"> | ||
|
||
{% highlight python %} | ||
|
||
from pyspark.ml.classification import LogisticRegression | ||
from pyspark.mllib.regression import LabeledPoint | ||
from pyspark.mllib.util import MLUtils | ||
|
||
# Load training data | ||
training = MLUtils.loadLibSVMFile(sc, "data/mllib/sample_libsvm_data.txt").toDF() | ||
|
||
lr = LogisticRegression(maxIter=10, regParam=0.3, elasticNetParam=0.8) | ||
|
||
# Fit the model | ||
lrModel = lr.fit(training) | ||
|
||
# Print the weights and intercept for logistic regression | ||
print("Weights: " + str(lrModel.weights)) | ||
print("Intercept: " + str(lrModel.intercept)) | ||
{% endhighlight %} | ||
|
||
</div> | ||
|
||
</div> | ||
|
||
### Optimization | ||
|
||
The optimization algorithm underlying the implementation is called [Orthant-Wise Limited-memory Quasi-Newton](http://research-srv.microsoft.com/en-us/um/people/jfgao/paper/icml07scalable.pdf) | ||
(OWL-QN). It is an extension of L-BFGS that can effectively handle L1 regularization and elastic net. |
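To make the connection with the example parameters concrete, here is a small sketch of how an overall regularization strength and an elastic net mixing parameter might be split into separate L1 and L2 strengths before optimization. It assumes that `regParam` scales the whole elastic net penalty, so that the L1 term (the non-smooth part that OWL-QN treats specially) has strength `elasticNetParam * regParam` and the L2 term has strength `(1 - elasticNetParam) * regParam`. The function name `split_regularization` is hypothetical and is not a Spark API.

```python
def split_regularization(reg_param, elastic_net_param):
    """Split a total regularization strength into L1 and L2 parts.

    Assumes the penalty is reg_param * (alpha * ||w||_1
    + (1 - alpha) * 0.5 * ||w||_2^2) with alpha = elastic_net_param.
    Hypothetical helper for illustration only.
    """
    if reg_param < 0.0:
        raise ValueError("reg_param must be non-negative")
    if not 0.0 <= elastic_net_param <= 1.0:
        raise ValueError("elastic_net_param must be in [0, 1]")
    l1_strength = elastic_net_param * reg_param          # non-smooth L1 part
    l2_strength = (1.0 - elastic_net_param) * reg_param  # smooth L2 part
    return l1_strength, l2_strength


# Values taken from the examples above: regParam=0.3, elasticNetParam=0.8
l1_strength, l2_strength = split_regularization(0.3, 0.8)
print("L1 strength: %.2f, L2 strength: %.2f" % (l1_strength, l2_strength))
```

Under this assumption, the example settings put roughly 0.24 of the regularization strength on the L1 term and 0.06 on the L2 term. When the L1 strength is zero, the objective is smooth and plain L-BFGS would suffice.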