Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-12212][ML][DOC] Clarifies the difference between spark.ml, spark.mllib and mllib in the documentation. #10234

Closed
wants to merge 9 commits into from

Conversation

thunterdb
Copy link
Contributor

Replaces a number of occurences of MLlib in the documentation that were meant to refer to the spark.mllib package instead. It should clarify for new users the difference between spark.mllib (the package) and MLlib (the umbrella project for ML in spark).

It also removes some files that I forgot to delete with #10207

@jkbradley
Copy link
Member

Early comment: Maybe we should reinstate all existing .md files but replace the contents with redirection links to the relevant new parts of the guide. Websites might have links which will be broken by these changes. I guess that should include files removed in [https://github.com//pull/10207] too.

@SparkQA
Copy link

SparkQA commented Dec 10, 2015

Test build #47461 has finished for PR 10234 at commit 42bf337.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):\n * * more functionality for random forests: estimates of feature importance, as well as the predicted probability of each class (a.k.a. class conditional probabilities) for classification.\n

@jkbradley
Copy link
Member

Related to my earlier comment, it might be good to rename the ml-intro file to ml-guide; there isn't a need to rename it.

regression and linear least squares with $L_1$ or $L_2$ regularization.
Refer to [the linear methods in mllib](mllib-linear-methods.html) for
details. In `spark.ml`, we also include Pipelines API for [Elastic
details about implementation and tuning. We also include a Dataframe API for [Elastic
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"DataFrame" (F capitalized) (elsewhere too)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Corrected everywhere in the documentation

@jkbradley
Copy link
Member

Done with review.
One final comment: I'd prefer adding spark.ml or spark.mllib to the end of both the title and displayTitle. I like having it in displayTitle since that provides the clearest signal about which page the user is viewing.

@thunterdb
Copy link
Contributor Author

@jkbradley I agree with you. Note that when you use the doc, the current section appears in bold in the side menu (and the submenu gets expandend). This is why I did not include it again, but I do not have a strong opinion on this point:

screen shot 2015-12-09 at 5 14 18 pm

@jkbradley
Copy link
Member

OK then I'll still vote for including it : )

@@ -27,10 +27,10 @@ displayTitle: Classification and regression in spark.ml
* This will become a table of contents (this text will be scraped).
{:toc}

In MLlib, we implement popular linear methods such as logistic
In `spark.ml`, we implement popular linear methods such as logistic
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spark.mllib right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not this file; the "ml-" prefix ones are for spark.ml. (It's true the functionality is almost the same currently, but it's a bit different and will diverge more.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see the purpose now. It was the old MLlib text, but a lot of it still applies. The distinction is removed.

@thunterdb
Copy link
Contributor Author

@jkbradley done with changes, let me know what you think.

@SparkQA
Copy link

SparkQA commented Dec 10, 2015

Test build #47530 has finished for PR 10234 at commit 6d66645.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):\n * * more functionality for random forests: estimates of feature importance, as well as the predicted probability of each class (a.k.a. class conditional probabilities) for classification.\n

@jkbradley
Copy link
Member

In addition to moving ml-intro back to ml-guide, it'd be nice if the sidebar had links back to the main spark.ml and spark.mllib pages. That could be done in a separate JIRA/PR, if you prefer.

@thunterdb
Copy link
Contributor Author

@jkbradley done:
screen shot 2015-12-10 at 11 19 30 am

@SparkQA
Copy link

SparkQA commented Dec 10, 2015

Test build #47535 has finished for PR 10234 at commit c75a5ca.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):\n * * more functionality for random forests: estimates of feature importance, as well as the predicted probability of each class (a.k.a. class conditional probabilities) for classification.\n

@SparkQA
Copy link

SparkQA commented Dec 10, 2015

Test build #47537 has finished for PR 10234 at commit 8432ac9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):\n * * more functionality for random forests: estimates of feature importance, as well as the predicted probability of each class (a.k.a. class conditional probabilities) for classification.\n

title: Survival Regression - ML
displayTitle: <a href="ml-guide.html">ML</a> - Survival Regression
title: Survival Regression - spark.ml
displayTitle: Survival Regression - spark.ml
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doc should now be a redirect to the ml-classification-regression.html#survival-regression section.

Also, it looks like some of the math renders incorrectly, but let's fix that in a follow-up.

@jkbradley
Copy link
Member

That's the only remaining issue I found. I checked against the Spark 1.5 doc links as well.

@jkbradley
Copy link
Member

LGTM pending tests. Thanks! (not sure what the deal is with survival regression math; I probably am missing some library on my computer.)

@SparkQA
Copy link

SparkQA commented Dec 10, 2015

Test build #47540 has finished for PR 10234 at commit 68f7e53.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):\n * * more functionality for random forests: estimates of feature importance, as well as the predicted probability of each class (a.k.a. class conditional probabilities) for classification.\n

@jkbradley
Copy link
Member

Merging with master and branch-1.6
Thank you!

asfgit pushed a commit that referenced this pull request Dec 10, 2015
…rk.mllib and mllib in the documentation.

Replaces a number of occurences of `MLlib` in the documentation that were meant to refer to the `spark.mllib` package instead. It should clarify for new users the difference between `spark.mllib` (the package) and MLlib (the umbrella project for ML in spark).

It also removes some files that I forgot to delete with #10207

Author: Timothy Hunter <[email protected]>

Closes #10234 from thunterdb/12212.

(cherry picked from commit 2ecbe02)
Signed-off-by: Joseph K. Bradley <[email protected]>
@asfgit asfgit closed this in 2ecbe02 Dec 10, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants