
[SparkR][Doc] fix typo in vignettes #17884

Closed
wants to merge 3 commits into master from actuaryzhang:typo

Conversation

actuaryzhang
Contributor

What changes were proposed in this pull request?

Fix typo in vignettes

@actuaryzhang
Contributor Author

@felixcheung

@SparkQA

SparkQA commented May 6, 2017

Test build #76527 has finished for PR 17884 at commit 8639025.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member

I know it is legitimate, but it would be worth double-checking for other typos too. To my knowledge, single-typo PRs are usually not encouraged, given the reviewing, building, and merging costs.

@actuaryzhang
Contributor Author

@HyukjinKwon Thanks for pointing this out. I will keep this in mind next time.

@felixcheung
Member

This test seems flaky on AppVeyor; not sure why:

Failed -------------------------------------------------------------------------
1. Error: spark.glm and predict (@test_mllib_regression.R#57) ------------------
java.lang.IllegalStateException: SparkContext has been shutdown
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2015)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2044)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2063)
	at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:333)
	at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:38)
	at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collectFromPlan(Dataset.scala:2923)
	at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2237)
	at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2237)
	at org.apache.spark.sql.Dataset$$anonfun$57.apply(Dataset.scala:2907)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:65)
	at org.apache.spark.sql.Dataset.withAction(Dataset.scala:2906)
	at org.apache.spark.sql.Dataset.head(Dataset.scala:2237)
	at org.apache.spark.sql.Dataset.head(Dataset.scala:2244)
	at org.apache.spark.sql.Dataset.first(Dataset.scala:2251)

@felixcheung
Member

@actuaryzhang thanks - would you have a chance to run a quick QA check on the rest of the vignettes, if you haven't already?

@actuaryzhang
Contributor Author

@felixcheung I ran a quick QA pass on the vignettes and fixed some additional typos and style issues.

@SparkQA

SparkQA commented May 7, 2017

Test build #76534 has finished for PR 17884 at commit 796a8e7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

@felixcheung left a comment


awesome, thanks! just one minor request

@@ -405,7 +405,7 @@ result <- gapply(
head(arrange(result, "max_mpg", decreasing = TRUE))
```

-Like gapply, `gapplyCollect` applies a function to each partition of a `SparkDataFrame` and collect the result back to R `data.frame`. The output of the function should be a `data.frame` but no schema is required in this case. Note that `gapplyCollect` can fail if the output of UDF run on all the partition cannot be pulled to the driver and fit in driver memory.
+Like gapply, `gapplyCollect` applies a function to each partition of a `SparkDataFrame` and collect the result back to R `data.frame`. The output of the function should be a `data.frame` but no schema is required in this case. Note that `gapplyCollect` can fail if the output of the UDF on all partitions cannot be pulled into the driver's memory.
Member


Could you add backticks to `gapply` at the beginning?

Contributor Author


Done.
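For context, here is a minimal sketch of the `gapplyCollect` behavior the corrected sentence above describes, modeled on the `gapply`/`mtcars` example in the hunk header. It assumes a running SparkR session; the `cyl` grouping column and `max_mpg` summary are illustrative and not part of this PR.

```
library(SparkR)
sparkR.session()

df <- createDataFrame(mtcars)

# The function is applied to each group of rows defined by `cyl` and must
# return an R data.frame; unlike gapply, no output schema is supplied.
result <- gapplyCollect(
  df,
  "cyl",
  function(key, x) {
    data.frame(cyl = key[[1]], max_mpg = max(x$mpg))
  })

# `result` is an ordinary local data.frame, so base R functions apply.
head(result[order(result$max_mpg, decreasing = TRUE), ])

sparkR.session.stop()
```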

@@ -1079,19 +1079,19 @@ There are three main object classes in SparkR you may be working with.
+ `sdf` stores a reference to the corresponding Spark Dataset in the Spark JVM backend.
+ `env` saves the meta-information of the object such as `isCached`.

-It can be created by data import methods or by transforming an existing `SparkDataFrame`. We can manipulate `SparkDataFrame` by numerous data processing functions and feed that into machine learning algorithms.
+    It can be created by data import methods or by transforming an existing `SparkDataFrame`. We can manipulate `SparkDataFrame` by numerous data processing functions and feed that into machine learning algorithms.
Member


Just curious, does the whitespace in front of the paragraph get handled properly?

Contributor Author

@actuaryzhang May 7, 2017


@felixcheung Yes, the four spaces indicate that the following text should be aligned with the bullet point. Otherwise, it will start as a new paragraph and have the wrong indentation.
You will see the difference after compiling the R Markdown file.
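To illustrate the point, here is a hypothetical R Markdown fragment (not text from the vignette): with the four-space indent the continuation paragraph stays attached to its bullet, while without it the text becomes a new top-level paragraph.

```
+ `env` saves the meta-information of the object such as `isCached`.

    This indented paragraph renders as part of the bullet above.

This unindented paragraph starts a new top-level paragraph instead.
```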

Member


cool! thanks
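To make the `SparkDataFrame` description in the hunk above concrete, here is a small sketch assuming a running SparkR session. The `faithful` dataset and the `spark.glm` call are illustrative; the direct slot access at the end is for inspection only, not something the vignette recommends.

```
library(SparkR)
sparkR.session()

# Created by a data import method ...
df <- createDataFrame(faithful)

# ... or by transforming an existing SparkDataFrame,
# which can then be fed into a machine learning algorithm.
df2 <- select(filter(df, df$waiting < 80), "eruptions", "waiting")
model <- spark.glm(df2, eruptions ~ waiting, family = "gaussian")

# The S4 object carries the two slots described above:
# `sdf` references the corresponding Dataset in the Spark JVM backend, and
# `env` holds meta-information such as `isCached`.
isS4(df2)          # TRUE
df2@env$isCached   # FALSE until cache(df2) is called

sparkR.session.stop()
```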

@SparkQA

SparkQA commented May 7, 2017

Test build #76555 has finished for PR 17884 at commit b0407b5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

asfgit pushed a commit that referenced this pull request May 8, 2017
## What changes were proposed in this pull request?
Fix typo in vignettes

Author: Wayne Zhang <[email protected]>

Closes #17884 from actuaryzhang/typo.

(cherry picked from commit 2fdaeb5)
Signed-off-by: Felix Cheung <[email protected]>
@asfgit closed this in 2fdaeb5 May 8, 2017
@felixcheung
Member

merged to master/2.2
thanks!

@actuaryzhang deleted the typo branch May 8, 2017 06:22
liyichao pushed a commit to liyichao/spark that referenced this pull request May 24, 2017
## What changes were proposed in this pull request?
Fix typo in vignettes

Author: Wayne Zhang <[email protected]>

Closes apache#17884 from actuaryzhang/typo.