[SPARK-12212][ML][DOC] Clarifies the difference between spark.ml, spark.mllib and mllib in the documentation. #10234
Early comment: Maybe we should reinstate all existing .md files but replace the contents with redirection links to the relevant new parts of the guide. Websites might have links which will be broken by these changes. I guess that should include files removed in [https://github.com//pull/10207] too. |
Test build #47461 has finished for PR 10234 at commit
|
Related to my earlier comment, it might be good to rename the ml-intro file back to ml-guide; there wasn't a need to rename it in the first place.
regression and linear least squares with $L_1$ or $L_2$ regularization.
Refer to [the linear methods in mllib](mllib-linear-methods.html) for
details. In `spark.ml`, we also include Pipelines API for [Elastic
details about implementation and tuning. We also include a Dataframe API for [Elastic
"DataFrame" (F capitalized) (elsewhere too)
Corrected everywhere in the documentation
Done with review. |
@jkbradley I agree with you. Note that when you use the doc, the current section appears in bold in the side menu (and the submenu gets expanded). This is why I did not include it again, but I do not have a strong opinion on this point:
OK then I'll still vote for including it : ) |
@@ -27,10 +27,10 @@ displayTitle: Classification and regression in spark.ml
* This will become a table of contents (this text will be scraped).
{:toc}

In MLlib, we implement popular linear methods such as logistic
In `spark.ml`, we implement popular linear methods such as logistic
spark.mllib right?
Not this file; the "ml-" prefix ones are for spark.ml. (It's true the functionality is almost the same currently, but it's a bit different and will diverge more.)
I see the purpose now. It was the old MLlib text, but a lot of it still applies. The distinction is removed.
@jkbradley done with changes, let me know what you think. |
Test build #47530 has finished for PR 10234 at commit
|
In addition to moving ml-intro back to ml-guide, it'd be nice if the sidebar had links back to the main spark.ml and spark.mllib pages. That could be done in a separate JIRA/PR, if you prefer. |
@jkbradley done: |
Test build #47535 has finished for PR 10234 at commit
|
Test build #47537 has finished for PR 10234 at commit
|
title: Survival Regression - ML
displayTitle: <a href="ml-guide.html">ML</a> - Survival Regression
title: Survival Regression - spark.ml
displayTitle: Survival Regression - spark.ml
This doc should now be a redirect to the ml-classification-regression.html#survival-regression section.
Also, it looks like some of the math renders incorrectly, but let's fix that in a follow-up.
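For illustration only, here is a minimal sketch of what such a redirect stub could contain, generated by a small Python helper. The `redirect_stub` function, the `layout: global` front matter line, and the wording of the notice are assumptions made for this example, not the contents actually committed in this PR; only the title and the target anchor come from the comment above.

```python
def redirect_stub(title: str, target: str, link_text: str) -> str:
    """Build the text of a Jekyll page that only points readers at `target`.

    Hypothetical helper: the front matter keys mirror the diff above
    (title, displayTitle); `layout: global` is an assumed layout name.
    """
    front_matter = "\n".join(
        [
            "---",
            "layout: global",
            f"title: {title}",
            f"displayTitle: {title}",
            "---",
        ]
    )
    # A one-line notice keeps old inbound links useful without duplicating content.
    body = f"> This section has been moved into the [{link_text}]({target})."
    return f"{front_matter}\n\n{body}\n"


if __name__ == "__main__":
    print(
        redirect_stub(
            title="Survival Regression - spark.ml",
            target="ml-classification-regression.html#survival-regression",
            link_text="classification and regression section",
        )
    )
```

Keeping the old file name with only a pointer inside means external links to the old page still resolve, which is the concern raised in the early comment about broken links.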
That's the only remaining issue I found. I checked against the Spark 1.5 doc links as well. |
LGTM pending tests. Thanks! (not sure what the deal is with survival regression math; I probably am missing some library on my computer.) |
Test build #47540 has finished for PR 10234 at commit
|
Merging with master and branch-1.6 |
…rk.mllib and mllib in the documentation. Replaces a number of occurrences of `MLlib` in the documentation that were meant to refer to the `spark.mllib` package instead. It should clarify for new users the difference between `spark.mllib` (the package) and MLlib (the umbrella project for ML in Spark). It also removes some files that I forgot to delete with #10207 Author: Timothy Hunter <[email protected]> Closes #10234 from thunterdb/12212. (cherry picked from commit 2ecbe02) Signed-off-by: Joseph K. Bradley <[email protected]>
Replaces a number of occurrences of `MLlib` in the documentation that were meant to refer to the `spark.mllib` package instead. It should clarify for new users the difference between `spark.mllib` (the package) and MLlib (the umbrella project for ML in Spark).

It also removes some files that I forgot to delete with #10207