[SPARK-7383] [ML] Feature Parity in PySpark for ml.features
Implemented Python wrappers for the Scala functions that were missing from PySpark's `ml.features`.

Author: Burak Yavuz

Closes #5991 from brkyvz/ml-feat-PR and squashes the following commits:

adcca55 [Burak Yavuz] add regex tokenizer to __all__
b91cb44 [Burak Yavuz] addressed comments
bd39fd2 [Burak Yavuz] remove addition
b82bd7c [Burak Yavuz] Parity in PySpark for ml.features
brkyvz authored and mengxr committed May 8, 2015
1 parent c796be7 commit f5ff4a8
Showing 5 changed files with 851 additions and 43 deletions.
@@ -31,7 +31,7 @@ import org.apache.spark.sql.types.DataType
* which is available at [[http://en.wikipedia.org/wiki/Polynomial_expansion]], "In mathematics, an
* expansion of a product of sums expresses it as a sum of products by using the fact that
* multiplication distributes over addition". Take a 2-variable feature vector as an example:
- * `(x, y)`, if we want to expand it with degree 2, then we get `(x, y, x * x, x * y, y * y)`.
+ * `(x, y)`, if we want to expand it with degree 2, then we get `(x, x * x, y, x * y, y * y)`.
*/
@AlphaComponent
class PolynomialExpansion extends UnaryTransformer[Vector, Vector, PolynomialExpansion] {
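
The corrected ordering in the doc comment, `(x, x * x, y, x * y, y * y)`, can be reproduced with a small plain-Python sketch. This is not Spark's implementation. The function name `expand` and the recursion are illustrative assumptions, chosen only so that the output matches the order documented above for the two-variable, degree-2 case.

```python
def expand(values, degree):
    """Return the monomials of total degree 1..degree, without the constant term.

    Hypothetical helper for illustration; not part of Spark or PySpark.
    """
    def rec(last_idx, remaining, multiplier):
        # Stop when no variables are left or no degree is left to spend.
        if last_idx < 0 or remaining == 0:
            return [multiplier]
        terms = []
        # Outer loop: exponent of the last variable, from 0 upwards.
        for exponent in range(remaining + 1):
            terms.extend(
                rec(last_idx - 1,
                    remaining - exponent,
                    multiplier * values[last_idx] ** exponent)
            )
        return terms

    # The first term is always the constant 1, which is dropped.
    return rec(len(values) - 1, degree, 1)[1:]


# With x = 2, y = 3 and degree 2 the terms are x, x*x, y, x*y, y*y.
print(expand([2, 3], 2))  # [2, 4, 3, 6, 9]
```

Grouping the terms by the exponent of the last variable is what places `x * x` before `y`. The earlier doc text listed all degree-1 terms first.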
@@ -42,7 +42,7 @@ class Tokenizer extends UnaryTransformer[String, Seq[String], Tokenizer] {

/**
* :: AlphaComponent ::
- * A regex based tokenizer that extracts tokens either by repeatedly matching the regex(default)
+ * A regex based tokenizer that extracts tokens either by repeatedly matching the regex(default)
* or using it to split the text (set matching to false). Optional parameters also allow filtering
* tokens using a minimal length.
* It returns an array of strings that can be empty.
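
The two modes described in this doc comment (match tokens repeatedly, or split on the regex) and the minimum-length filter can be sketched with Python's standard `re` module. This is a minimal illustration, not the PySpark wrapper added by this commit. The function name `regex_tokenize` and the parameter names `matching` and `min_length` are assumptions that mirror the wording of the comment.

```python
import re


def regex_tokenize(text, pattern=r"\w+", matching=True, min_length=1):
    """Hypothetical stand-in for a regex tokenizer; not Spark's API."""
    if matching:
        # Repeatedly match the pattern; each full match is one token.
        tokens = [m.group(0) for m in re.finditer(pattern, text)]
    else:
        # Treat the pattern as a delimiter and split the text on it.
        # Note: capturing groups in the pattern would also be returned.
        tokens = re.split(pattern, text)
    # Drop tokens shorter than the minimum length (this also removes
    # the empty strings that splitting can produce).
    return [t for t in tokens if len(t) >= min_length]


print(regex_tokenize("a quick  test"))                          # ['a', 'quick', 'test']
print(regex_tokenize("a quick  test", r"\s+", matching=False))  # ['a', 'quick', 'test']
print(regex_tokenize("a quick  test", min_length=2))            # ['quick', 'test']
print(regex_tokenize(""))                                       # []
```

The last call shows the case the comment mentions: the result can be an empty list when nothing matches or every token is filtered out.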