Skip to content

A package full of linear algebra operators for Apache Spark MLlib's linalg package

License

Notifications You must be signed in to change notification settings

brkyvz/lazy-linalg

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

13 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

lazy-linalg

Linear algebra operators for Apache Spark MLlib's linalg package. Works best with Spark 1.5.

Goal

Most of the code in this repository was written as a part of SPARK-6442. The aim was to support the most common local linear algebra operations on top of Spark without having to depend on an external library.

It is somewhat cumbersome to write code where you have to convert the MLlib representation of a vector or matrix to Breeze perform the simplest arithmetic operations like addition, subtraction, etc. This package aims to lift that burden, and provide efficient implementations for some of these methods.

By keeping operations lazy, this package provides some of the optimizations that you would see in C++ libraries like Armadillo, Eigen, etc.

Installation

Include this package in your Spark Applications using:

spark-shell, pyspark, or spark-submit

> $SPARK_HOME/bin/spark-shell --packages brkyvz:lazy-linalg:0.1.0

sbt

If you use the sbt-spark-package plugin, in your sbt build file, add:

spDependencies += "brkyvz/lazy-linalg:0.1.0"

Otherwise,

resolvers += "Spark Packages Repo" at "http://dl.bintray.com/spark-packages/maven"

libraryDependencies += "brkyvz" % "lazy-linalg" % "0.1.0"

Maven

In your pom.xml, add:

<dependencies>
  <!-- list of dependencies -->
  <dependency>
    <groupId>brkyvz</groupId>
    <artifactId>lazy-linalg</artifactId>
    <version>0.1.0</version>
  </dependency>
</dependencies>
<repositories>
  <!-- list of other repositories -->
  <repository>
    <id>SparkPackagesRepo</id>
    <url>http://dl.bintray.com/spark-packages/maven</url>
  </repository>
</repositories>

Examples

Import com.brkyvz.spark.linalg._ and all the implicits will kick in for Scala users.

scala> import com.brkyvz.spark.linalg._
scala> import org.apache.spark.mllib.linalg._

scala> val rnd = new java.util.Random

scala> val A = DenseMatrix.eye(3)
A: org.apache.spark.mllib.linalg.DenseMatrix =
1.0  0.0  0.0
0.0  1.0  0.0
0.0  0.0  1.0

scala> val B = DenseMatrix.rand(3, 3, rnd)
B: org.apache.spark.mllib.linalg.DenseMatrix =
0.6133402813080373   0.7162729054788076   0.15011768207263143
0.3078993912354502   0.23923486751376188  0.05973497171994935
0.49892408305838276  0.9534484503645188   0.48047741591983717

scala> val C = DenseMatrix.zeros(3, 3)
C: org.apache.spark.mllib.linalg.DenseMatrix =
0.0  0.0  0.0
0.0  0.0  0.0
0.0  0.0  0.0

scala> C := A + B + A * B
res0: com.brkyvz.spark.linalg.MatrixLike =
2.2266805626160746  1.4325458109576152  0.30023536414526286
0.6157987824709004  1.4784697350275238  0.1194699434398987
0.9978481661167655  1.9068969007290375  1.9609548318396746

scala> C += -1
res1: com.brkyvz.spark.linalg.DenseMatrixWrapper =
1.2266805626160746      0.43254581095761524  -0.6997646358547371
-0.38420121752909964    0.47846973502752377  -0.8805300565601013
-0.0021518338832344774  0.9068969007290375   0.9609548318396746

scala> val D = A * 2 - 1
scala> D.compute()
res2: com.brkyvz.spark.linalg.MatrixLike =
1.0   -1.0  -1.0
-1.0  1.0   -1.0
-1.0  -1.0  1.0

Matrix multiplication vs. element-wise multiplication:

scala> C := A * B
res4: com.brkyvz.spark.linalg.MatrixLike =
0.6133402813080373   0.7162729054788076   0.15011768207263143
0.3078993912354502   0.23923486751376188  0.05973497171994935
0.49892408305838276  0.9534484503645188   0.48047741591983717

scala> C := A :* B
res5: com.brkyvz.spark.linalg.MatrixLike =
0.6133402813080373  0.0                  0.0
0.0                 0.23923486751376188  0.0
0.0                 0.0                  0.48047741591983717

Support for element-wise basic math functions:

import com.brkyvz.spark.linalg.funcs._

scala> C := pow(B, A)
res7: com.brkyvz.spark.linalg.MatrixLike =
0.6133402813080373  1.0                  1.0
1.0                 0.23923486751376188  1.0
1.0                 1.0                  0.48047741591983717

scala> C := exp(A)
res8: com.brkyvz.spark.linalg.MatrixLike =
2.718281828459045  1.0                1.0
1.0                2.718281828459045  1.0
1.0                1.0                2.718281828459045

scala> C := asin(A) - math.Pi / 2
res12: com.brkyvz.spark.linalg.MatrixLike =
0.0                  -1.5707963267948966  -1.5707963267948966
-1.5707963267948966  0.0                  -1.5707963267948966
-1.5707963267948966  -1.5707963267948966  0.0

All operations work similarly for vectors.

scala> import com.brkyvz.spark.linalg._
scala> import org.apache.spark.mllib.linalg._

scala> val x = Vectors.dense(1, 0, 2, 1)
x: org.apache.spark.mllib.linalg.Vector = [1.0,0.0,2.0,1.0]

scala> val y = Vectors.dense(0, 0, 0, 0)
y: org.apache.spark.mllib.linalg.Vector = [0.0,0.0,0.0,0.0]

scala> y := x / 2 + x * 0.5
res15: com.brkyvz.spark.linalg.DenseVectorWrapper = [1.0,0.0,2.0,1.0]

scala> val z = Vectors.dense(1, -2, 3)
z: org.apache.spark.mllib.linalg.Vector = [1.0,-2.0,3.0]

scala> val m = A * z + 1
scala> m.compute()
res16: com.brkyvz.spark.linalg.VectorLike = [2.0,-1.0,4.0]

Caveats

1- Implicits may not work perfectly for vectors and matrices generated through Matrices. and Vectors. consutructors.

2- Scalars need to be after the matrix or the vector during operations, e.g. use x * 2 instead of 2 * x.

What's to Come

1- Support for more linear algebra operations: determinant, matrix inverse, trace, slicing, reshaping...

2- Supporting such methods with BlockMatrix.

3- Better, smarter lazy evaluation with codegen. This should bring performance closer to C++ libraries.

Contributing

If you run across any bugs, please file issues, and please feel free to submit pull requests!

About

A package full of linear algebra operators for Apache Spark MLlib's linalg package

Resources

License

Stars

Watchers

Forks

Packages

No packages published