Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-18481][ML] ML 2.1 QA: Remove deprecated methods for ML #15913

Closed
wants to merge 12 commits into from

Conversation

yanboliang
Copy link
Contributor

What changes were proposed in this pull request?

Remove deprecated methods for ML.

How was this patch tested?

Existing tests.

@SparkQA
Copy link

SparkQA commented Nov 17, 2016

Test build #68743 has finished for PR 15913 at commit ecddf15.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Nov 17, 2016

Test build #68757 has started for PR 15913 at commit 9a8c926.

@SparkQA
Copy link

SparkQA commented Nov 17, 2016

Test build #68780 has finished for PR 15913 at commit 9a8c926.

  • This patch fails from timeout after a configured wait of 250m.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Nov 18, 2016

Test build #68829 has finished for PR 15913 at commit 9a8c926.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Nov 18, 2016

Test build #68849 has finished for PR 15913 at commit 13988b4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Nov 19, 2016

Test build #68892 has finished for PR 15913 at commit 36c2e24.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Nov 19, 2016

Test build #68893 has finished for PR 15913 at commit 8d3c47a.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yanboliang
Copy link
Contributor Author

Jenkins, test this please.

@SparkQA
Copy link

SparkQA commented Nov 20, 2016

Test build #68900 has finished for PR 15913 at commit 8d3c47a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yanboliang
Copy link
Contributor Author

cc @jkbradley @sethah

/**
* Number of trees in ensemble
*/
val getNumTrees: Int = trees.length
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason this was moved? It seems unrelated to the cleanup described in the title.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a headache. Before, getNumTrees on gbts/rfs would dispatch here. Now, we removed numTrees from RF (it was deprecated) and so now we can have rf inherit from RandomForestParams which (now) has a param numTrees and its associated getter. So if we hadn't removed this, rf would have two getNumTrees methods and one would take precedence. We removed it from here to avoid that, then we had to add them back in the individual GBT subclasses. I think I've got that right.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sethah You explained this problem very clearly and totally right. Thanks.

def setStepSize(value: Double): this.type = set(stepSize, value)

override def validateParams(): Unit = {
def setStepSize(value: Double): this.type = {
require(ParamValidators.inRange(0, 1, lowerInclusive = false, upperInclusive = true)(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this is where we want to do this? The deprecation warning said to move checks from here into transformSchema - but its possible the deprecation warning was incomplete.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's better to fail at set time rather than when fit is called. This was also not used before, seems we missed it when we deprecated validateParams() ?

Copy link
Contributor Author

@yanboliang yanboliang Nov 24, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original validateParams() was only used to check for interactions between parameters. Parameter value checks which do not depend on other parameters are handled by ````Param.validate()``` at the definition, i.e.

final val k = new IntParam(this, "k", "The number of clusters to create. " +
    "Must be > 1.", ParamValidators.gt(1))

However, stepSize was a trait and inherited by lots of sub-classes which have different constraints for this parameter, so I add the check at setter.
Yeah, I think the original place of this check(putting in validateParams) was a bug, since it does not involves interactions between parameters. We missed it when we deprecated validateParams(), since there is no test case to validate this param before. I added corresponding test in this PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes more sense :) +1

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This validation should really be in the Param itself. It's not since it's a sharedParam, but perhaps we should just copy the shared Param code here. The problem with putting validation in the setter is that Params can get set in other ways too (such as gbt.set(stepSize, value)

Copy link
Contributor

@sethah sethah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty good overall, a few small comments.

@@ -176,8 +176,12 @@ private[classification] trait LogisticRegressionParams extends ProbabilisticClas
}
}

override def validateParams(): Unit = {
override protected def validateAndTransformSchema(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this actually necessary? Before, validateParams() was never used. Seems like we could just remove it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's a bug that validateParams was never used. It should validate params interaction before fitting(if necessary), this is why we deprecate validateParams and move what it does to transformSchema. We do not have corresponding test cases before, so no test was broken when we deprecated validateParams. I added test cases in this PR.

def setStepSize(value: Double): this.type = set(stepSize, value)

override def validateParams(): Unit = {
def setStepSize(value: Double): this.type = {
require(ParamValidators.inRange(0, 1, lowerInclusive = false, upperInclusive = true)(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's better to fail at set time rather than when fit is called. This was also not used before, seems we missed it when we deprecated validateParams() ?

/**
* Number of trees in ensemble
*/
val getNumTrees: Int = trees.length
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a headache. Before, getNumTrees on gbts/rfs would dispatch here. Now, we removed numTrees from RF (it was deprecated) and so now we can have rf inherit from RandomForestParams which (now) has a param numTrees and its associated getter. So if we hadn't removed this, rf would have two getNumTrees methods and one would take precedence. We removed it from here to avoid that, then we had to add them back in the individual GBT subclasses. I think I've got that right.

/**
* Number of trees in ensemble
*/
val getNumTrees: Int = trees.length
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this have an @Since tag?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added.

* (default = 20)
* @group param
*/
final val numTrees: IntParam = new IntParam(this, "numTrees", "Number of trees to train (>= 1)",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: the reason that we cannot add this to both GBT and RF (i.e. in TreeEnsembleParams) is because the param maxIter controls how many trees a GBT has. The semantics in the algos are a bit different.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good suggestion, added.

"""Sets the SQL context to use for saving."""
"""
Sets the SQL context to use for saving.
.. note:: Deprecated in 2.1, use session instead.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we remove it from Scala in 2.2, then we have to remove it here in 2.2 right? I guess we should note it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added.

*/
@Since("1.6.0")
@deprecated("labelCol is not used by ChiSqSelectorModel.", "2.0.0")
def setLabelCol(value: String): this.type = set(labelCol, value)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to remove it from Python as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All python ML models don't support setter/getter params currently, so no need to remove it.

@SparkQA
Copy link

SparkQA commented Nov 24, 2016

Test build #69111 has started for PR 15913 at commit 1d8cef5.

@SparkQA
Copy link

SparkQA commented Nov 24, 2016

Test build #69108 has finished for PR 15913 at commit 0c85fd4.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yanboliang
Copy link
Contributor Author

Jenkins, test this please.

@SparkQA
Copy link

SparkQA commented Nov 24, 2016

Test build #69130 has finished for PR 15913 at commit 1d8cef5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@jkbradley jkbradley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! Adding a couple of comments

* Validates parameter values stored internally.
* Raise an exception if any parameter value is invalid.
*
* This only needs to check for interactions between parameters.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please add the relevant text from here to transformSchema?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

def setStepSize(value: Double): this.type = set(stepSize, value)

override def validateParams(): Unit = {
def setStepSize(value: Double): this.type = {
require(ParamValidators.inRange(0, 1, lowerInclusive = false, upperInclusive = true)(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This validation should really be in the Param itself. It's not since it's a sharedParam, but perhaps we should just copy the shared Param code here. The problem with putting validation in the setter is that Params can get set in other ways too (such as gbt.set(stepSize, value)

getStepSize), "GBT parameter stepSize should be in interval (0, 1], " +
s"but it given invalid value $getStepSize.")
value), "GBT parameter stepSize should be in interval (0, 1], " +
s"but it given invalid value $value.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"it given invalid value" -> "it was given an invalid value"

setDefault(numTrees -> 20)

/** @group setParam */
def setNumTrees(value: Int): this.type = set(numTrees, value)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these setter methods in traits Java-compatible?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we already have setNumTrees which calls super.setNumTrees in RandomForestClassifier and RandomForestRegressor.

@jkbradley
Copy link
Member

One more thing: This breaks several public APIs, although in mostly minor ways and with good reasons. Could you please update the JIRA to list the public API changes?

@SparkQA
Copy link

SparkQA commented Nov 25, 2016

Test build #69158 has finished for PR 15913 at commit 3c8ee2d.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Nov 25, 2016

Test build #69159 has finished for PR 15913 at commit c874d0e.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Nov 25, 2016

Test build #69160 has finished for PR 15913 at commit 157acf9.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Nov 25, 2016

Test build #69167 has finished for PR 15913 at commit 864be6e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

/** @group getParam */
final def getStepSize: Double = $(stepSize)

/** @group setParam */
def setStepSize(value: Double): this.type = set(stepSize, value)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I asked about this being Java-friendly because Param setter methods in traits used to have the wrong type in Java. I wonder if this is no longer true. Still, it might make sense to remove the setter method from the trait since it does not make sense to have it in the Model classes. We could put the setter method in each subclass and then deprecate the method in the Model classes.

This issue also shows up in MimaExcludes. If setFeatureSubsetStrategy were put in the concrete classes from the outset, then you would not need to include it in MimaExcludes now.

I'm Ok with doing the change now or in a follow-up PR.

I believe I was the one who incorrectly put the setters in the traits...

Copy link
Contributor Author

@yanboliang yanboliang Nov 26, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I understand what you mean. If we would like to correct the setter methods in traits, we involve changes to lots of traits which include DecisionTreeParams, TreeClassifierParams, TreeRegressorParams, RandomForestParams, GBTParams, etc. So i will merge this firstly after it passed Jenkins and address this issue in a separate follow-up PR. Thanks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jkbradley I have sent #16017 to fix this issue, please feel free to comment that. Thanks.

@yanboliang
Copy link
Contributor Author

Jenkins, test this please.

@SparkQA
Copy link

SparkQA commented Nov 26, 2016

Test build #69183 has finished for PR 15913 at commit 864be6e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yanboliang
Copy link
Contributor Author

Merged into master and branch-2.1. Thanks for all reviewing.

asfgit pushed a commit that referenced this pull request Nov 26, 2016
## What changes were proposed in this pull request?
Remove deprecated methods for ML.

## How was this patch tested?
Existing tests.

Author: Yanbo Liang <[email protected]>

Closes #15913 from yanboliang/spark-18481.

(cherry picked from commit c4a7eef)
Signed-off-by: Yanbo Liang <[email protected]>
@asfgit asfgit closed this in c4a7eef Nov 26, 2016
@yanboliang yanboliang deleted the spark-18481 branch November 26, 2016 13:33
asfgit pushed a commit that referenced this pull request Nov 29, 2016
## What changes were proposed in this pull request?
Mainly two changes:
* Move DT/RF/GBT Param setter methods to subclasses.
* Deprecate corresponding setter methods in the model classes.

See discussion here #15913 (comment).

## How was this patch tested?
Existing tests.

Author: Yanbo Liang <[email protected]>

Closes #16017 from yanboliang/spark-18592.

(cherry picked from commit 95f7985)
Signed-off-by: Joseph K. Bradley <[email protected]>
asfgit pushed a commit that referenced this pull request Nov 29, 2016
## What changes were proposed in this pull request?
Mainly two changes:
* Move DT/RF/GBT Param setter methods to subclasses.
* Deprecate corresponding setter methods in the model classes.

See discussion here #15913 (comment).

## How was this patch tested?
Existing tests.

Author: Yanbo Liang <[email protected]>

Closes #16017 from yanboliang/spark-18592.
robert3005 pushed a commit to palantir/spark that referenced this pull request Dec 2, 2016
## What changes were proposed in this pull request?
Remove deprecated methods for ML.

## How was this patch tested?
Existing tests.

Author: Yanbo Liang <[email protected]>

Closes apache#15913 from yanboliang/spark-18481.
robert3005 pushed a commit to palantir/spark that referenced this pull request Dec 2, 2016
## What changes were proposed in this pull request?
Mainly two changes:
* Move DT/RF/GBT Param setter methods to subclasses.
* Deprecate corresponding setter methods in the model classes.

See discussion here apache#15913 (comment).

## How was this patch tested?
Existing tests.

Author: Yanbo Liang <[email protected]>

Closes apache#16017 from yanboliang/spark-18592.
uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017
## What changes were proposed in this pull request?
Remove deprecated methods for ML.

## How was this patch tested?
Existing tests.

Author: Yanbo Liang <[email protected]>

Closes apache#15913 from yanboliang/spark-18481.
uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017
## What changes were proposed in this pull request?
Mainly two changes:
* Move DT/RF/GBT Param setter methods to subclasses.
* Deprecate corresponding setter methods in the model classes.

See discussion here apache#15913 (comment).

## How was this patch tested?
Existing tests.

Author: Yanbo Liang <[email protected]>

Closes apache#16017 from yanboliang/spark-18592.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants