
[SPARK-18692][BUILD][DOCS] Test Java 8 unidoc build on Jenkins #17477

Closed · wants to merge 3 commits

Conversation

@HyukjinKwon (Member) commented Mar 30, 2017

What changes were proposed in this pull request?

This PR proposes to run Spark unidoc to test the Javadoc 8 build, since the Javadoc 8 build is easily broken again.

There are several problems with it:

  • It adds a little extra time to the test run. In my case, it took about 1.5 minutes more (`Elapsed :[94.8746569157]`). How this was measured is described in "How was this patch tested?".

  • One problem that I noticed was that Unidoc appeared to be processing test sources: if we can find a way to exclude those from being processed in the first place then that might significantly speed things up.

    (see @JoshRosen's comment)

To make this automated build pass, this PR also fixes the existing Javadoc breaks, including ones introduced by test code as described above.

These fixes are similar to instances fixed previously. Please refer to #15999 and #16013.

Note that this only fixes errors, not warnings. Please see my observation in #17389 (comment) on spurious errors caused by warnings.

How was this patch tested?

Manually, via `jekyll build` for the documentation build. Also tested by running `./dev/run-tests`.

The elapsed time was measured by manually adding `time.time()` as below:

```diff
     profiles_and_goals = build_profiles + sbt_goals

     print("[info] Building Spark unidoc (w/Hive 1.2.1) using SBT with these arguments: ",
           " ".join(profiles_and_goals))

+    import time
+    st = time.time()
     exec_sbt(profiles_and_goals)
+    print("Elapsed :[%s]" % str(time.time() - st))
```

produces

```
...
========================================================================
Building Unidoc API Documentation
========================================================================
...
[info] Main Java API documentation successful.
...
Elapsed :[94.8746569157]
...
```
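For reference, a minimal sketch (in the style of `dev/run-tests.py`) of the kind of build step this PR adds; the helper names (`set_title_and_block`, `get_hadoop_profiles`, `exec_sbt`) are assumed from that script, and the exact wiring may differ:

```python
def build_spark_unidoc_sbt(hadoop_version):
    # Announce the block in the Jenkins log, mirroring the other build steps.
    set_title_and_block("Building Unidoc API Documentation", "BLOCK_DOCUMENTATION")
    # Reuse this run's Hadoop profiles and add the `unidoc` goal, so any
    # Javadoc 8 break fails the test run instead of slipping into master.
    profiles_and_goals = get_hadoop_profiles(hadoop_version) + ["unidoc"]
    print("[info] Building Spark unidoc (w/Hive 1.2.1) using SBT with these arguments: ",
          " ".join(profiles_and_goals))
    exec_sbt(profiles_and_goals)
```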

```diff
@@ -230,7 +230,9 @@ class PipelineSuite extends SparkFunSuite with MLlibTestSparkContext with Defaul
 }

-/** Used to test [[Pipeline]] with [[MLWritable]] stages */
+/**
+ * Used to test [[Pipeline]] with `MLWritable` stages
```
@HyukjinKwon (Member Author):
We should avoid single-line doc comments when they contain code spans (` ... `). See #16050.

```diff
@@ -74,7 +74,7 @@ abstract class Classifier[
  * and features (`Vector`).
  * @param numClasses Number of classes label can take. Labels must be integers in the range
  * [0, numClasses).
- * @throws SparkException if any label is not an integer >= 0
+ * @note Throws `SparkException` if any label is not an integer is greater than or equal to 0
```
@HyukjinKwon (Member Author):

This case throws an error as below:

```
[error] .../spark/mllib/target/java/org/apache/spark/ml/classification/Classifier.java:28: error: reference not found
[error]    * @throws SparkException  if any label is not an integer >= 0
[error]      ^
```

Contributor:

Or is a non-integer or is negative?

@HyukjinKwon (Member Author) commented Mar 30, 2017

FYI, unless I have missed something, all the cases are the same kinds of instances as the ones previously fixed. cc @JoshRosen, @srowen and @jkbradley. Could you take a look and see if it makes sense?

```diff
- * For example, given <h1, [o1, o2, o3]>, <h2, [o4]>, <h1, [o5, o6]>, returns
- * [o1, o5, o4, 02, o6, o3]
+ * For example, given a map consisting of h1 to [o1, o2, o3], h2 to [o4] and h3 to [o5, o6],
+ * returns a list, [o1, o5, o4, o2, o6, o3].
```
@HyukjinKwon (Member Author):

There are a few typos here: `02` -> `o2` and `h1` -> `h3`.

Contributor:

Can we also wrap this in code or otherwise escape it, or use a different symbol? `{h1: [o1, o2, o3], h2: [o4], ...}` is clearer.

@SparkQA commented Mar 30, 2017

Test build #75381 has finished for PR 17477 at commit 7ddb6eb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Mar 30, 2017

Test build #75385 has finished for PR 17477 at commit 7a7cf04.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Mar 31, 2017

Test build #75416 has finished for PR 17477 at commit 4d39544.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon (Member Author):

(gentle ping @JoshRosen).

@HyukjinKwon (Member Author):

@JoshRosen, I am fine with closing it for now if you are currently not sure of it.

@JoshRosen (Contributor):

jenkins retest this please

```diff
@@ -33,9 +33,9 @@ private[spark] trait RpcEnvFactory {
 *
 * It is guaranteed that `onStart`, `receive` and `onStop` will be called in sequence.
 *
- * The life-cycle of an endpoint is:
+ * The life-cycle of an endpoint is as below in an order:
```
Contributor:

Can we just wrap this block as code? The rewording is confusing and doesn't read as clearly to me.

@JoshRosen (Contributor) commented Apr 12, 2017

Left a few comments.

@srowen @jkbradley, could you take a look and merge it after changes if it looks okay to you? The overall build change structure looks okay to me, if we're fine with failing the PR build on doc build breaks. I did a somewhat cursory examination of the actual doc changes, so additional review there is welcome if you have time.

@SparkQA commented Apr 12, 2017

Test build #75722 has finished for PR 17477 at commit 4d39544.

  • This patch fails to generate documentation.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon (Member Author):

Looks like there is another break.

```
[error] /home/jenkins/workspace/SparkPullRequestBuilder/sql/core/target/java/org/apache/spark/sql/catalog/Catalog.java:453: error: reference not found
[error]    * Invalidates and refreshes all the cached data (and the associated metadata) for any {@link Dataset}
[error]
```

Let me clean this up and address the comments. Thank you @JoshRosen.

```diff
- * For example, given <h1, [o1, o2, o3]>, <h2, [o4]>, <h1, [o5, o6]>, returns
- * [o1, o5, o4, 02, o6, o3]
+ * For example, given {@literal <h1, [o1, o2, o3]>}, {@literal <h2, [o4]>} and
+ * {@literal <h3, [o5, o6]>}, returns {@literal [o1, o5, o4, o2, o6, o3]}.
```
@HyukjinKwon (Member Author) commented Apr 12, 2017:

It seems we can't use `@code` here if it contains something like `<A ...>` (the `< A ...>` case, with a space after `<`, seems fine). I ran some tests with the comments below:

```
 * For example, given {@code < h1, [o1, o2, o3] >}, {@code < h2, [o4]>} and {@code <h3, [o5, o6]>},
 * returns {@code [o1, o5, o4, o2, o6, o3]}.
 *
 * For example, given
 *
 * {@code <h1, [o1, o2, o3]>},
 *
 * {@code <h2, [o4]} and
 *
 * {@code h3, [o5, o6]>},
 *
 * returns {@code [o1, o5, o4, o2, o6, o3]}.
```

(Screenshots: Scaladoc and Javadoc rendering with `{@code}`.)

If we use `@literal`, it seems fine.

(Screenshots: Scaladoc and Javadoc rendering with `{@literal}`.)

This doesn't seem to be exposed in the API documentation anyway.

```diff
@@ -296,7 +296,7 @@ trait MesosSchedulerUtils extends Logging {

 /**
  * Parses the attributes constraints provided to spark and build a matching data struct:
- *  Map[<attribute-name>, Set[values-to-match]]
+ *  {@literal Map[<attribute-name>, Set[values-to-match]}
```

```diff
@@ -89,7 +89,7 @@ public static String getKerberosServiceTicket(String principal, String host,
  * @param clientUserName Client User name.
  * @return An unsigned cookie token generated from input parameters.
  * The final cookie generated is of the following format :
- * cu=<username>&rn=<randomNumber>&s=<cookieSignature>
+ * {@code cu=<username>&rn=<randomNumber>&s=<cookieSignature>}
```
@HyukjinKwon (Member Author):

This is Java code, so `@code` should be fine. This also doesn't seem to be exposed in the documentation anyway.

```diff
@@ -35,7 +35,7 @@ private[spark] trait RpcEnvFactory {
 *
 * The life-cycle of an endpoint is:
 *
- * constructor -> onStart -> receive* -> onStop
+ * {@code constructor -> onStart -> receive* -> onStop}
```
@HyukjinKwon (Member Author):

After this, it produces the documentation as below (manually tested):

(Screenshots: Scaladoc and Javadoc rendering.)

This also doesn't seem to be exposed in the API documentation anyway.

ghost pushed a commit to dbtsai/spark that referenced this pull request Apr 12, 2017
## What changes were proposed in this pull request?

This PR proposes corrections related to JSON APIs as below:

- Rendering links in Python documentation
- Replacing `RDD` with `Dataset` in the programming guide
- Adding missing description about JSON Lines consistently in `DataFrameReader.json` in the Python API
- De-duplicating a little bit of `DataFrameReader.json` in the Scala/Java API

## How was this patch tested?

Manually built the documentation via `jekyll build`. Corresponding snapshots will be left on the code.

Note that there are currently Javadoc 8 breaks in several places. These are proposed to be handled in apache#17477, so this PR does not fix them.

Author: hyukjinkwon <[email protected]>

Closes apache#17602 from HyukjinKwon/minor-json-documentation.
@SparkQA commented Apr 12, 2017

Test build #75739 has finished for PR 17477 at commit aefae0f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen (Member) commented Apr 12, 2017

Merged to master

@asfgit closed this in ceaf77a Apr 12, 2017
@HyukjinKwon (Member Author):

Thank you!

@srowen (Member) commented Apr 13, 2017

Hm, really weird @HyukjinKwon, but this fails on the SBT master build, and only for Hadoop 2.6. Maven and 2.7 are fine.

```
[error] /home/jenkins/workspace/spark-master-test-sbt-hadoop-2.6/core/src/main/scala/org/apache/spark/serializer/GenericAvroSerializer.scala:123: value createDatumWriter is not a member of org.apache.avro.generic.GenericData
[error]     writerCache.getOrElseUpdate(schema, GenericData.get.createDatumWriter(schema))
[error]
```

https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-2.6/2770/consoleFull

I can only assume it's some classpath-related problem, such that the classpath resolution for the Hadoop 2.6 libs under SBT, and only for the scaladoc plugin, doesn't see the right version of Avro.

This might be tricky to resolve. Have you seen anything like this before? I wonder if there is any clear difference between the classpath scaladoc would use vs scalac?

@HyukjinKwon (Member Author) commented Apr 14, 2017

Thank you for your pointer and the details. No, I haven't seen anything like this before. So, Maven + Hadoop 2.7, Maven + Hadoop 2.6 and SBT + Hadoop 2.7 are fine, but SBT + Hadoop 2.6 is failing. Let me try to reproduce this locally and look into it. It sounds tricky.

@HyukjinKwon (Member Author) commented Apr 14, 2017

It seems we only build the documentation with SBT. I ran `./build/sbt clean` (and also removed all caches) first. Then, I ran a build with

```
./build/sbt  -Phadoop-2.6 -Pmesos -Pkinesis-asl -Pyarn -Phive-thriftserver -Phive test:package streaming-kafka-0-8-assembly/assembly streaming-flume-assembly/assembly streaming-kinesis-asl-assembly/assembly
```

per

```
========================================================================
Building Spark
========================================================================
[info] Building Spark (w/Hive 1.2.1) using SBT with these arguments:  -Phadoop-2.6 -Pmesos -Pkinesis-asl -Pyarn -Phive-thriftserver -Phive test:package streaming-kafka-0-8-assembly/assembly streaming-flume-assembly/assembly streaming-kinesis-asl-assembly/assembly
```

and then,

```
./build/sbt  -Phadoop-2.6 -Pmesos -Pkinesis-asl -Pyarn -Phive-thriftserver -Phive unidoc
```

per


```
========================================================================
Building Unidoc API Documentation
========================================================================
[info] Building Spark unidoc (w/Hive 1.2.1) using SBT with these arguments:  -Phadoop-2.6 -Pmesos -Pkinesis-asl -Pyarn -Phive-thriftserver -Phive unidoc
```

I also ran `run-tests.py` with the identical profiles to resemble these steps. However, I am unable to reproduce this. @srowen, are you able to reproduce this locally, maybe?
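For completeness, the two invocations above can be replayed with a small driver script (a sketch; profiles and goals copied from the logs above, assuming it runs from a Spark checkout):

```python
import subprocess

# Same profiles as the Jenkins job above.
PROFILES = ["-Phadoop-2.6", "-Pmesos", "-Pkinesis-asl", "-Pyarn",
            "-Phive-thriftserver", "-Phive"]

def sbt(*goals):
    """Run ./build/sbt with the shared profiles plus the given goals."""
    subprocess.check_call(["./build/sbt"] + PROFILES + list(goals))

# Step 1: package everything (the first command above).
sbt("test:package",
    "streaming-kafka-0-8-assembly/assembly",
    "streaming-flume-assembly/assembly",
    "streaming-kinesis-asl-assembly/assembly")

# Step 2: build the unified API docs (the second command above).
sbt("unidoc")
```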

@HyukjinKwon (Member Author):

What I don't get is that the last test against this PR, https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75739/console, seems to have used the same profiles, as below:

```
========================================================================
Building Spark
========================================================================
[info] Building Spark (w/Hive 1.2.1) using SBT with these arguments:  -Phadoop-2.6 -Pmesos -Pkinesis-asl -Pyarn -Phive-thriftserver -Phive test:package streaming-kafka-0-8-assembly/assembly streaming-flume-assembly/assembly streaming-kinesis-asl-assembly/assembly
========================================================================
Building Unidoc API Documentation
========================================================================
[info] Building Spark unidoc (w/Hive 1.2.1) using SBT with these arguments:  -Phadoop-2.6 -Pmesos -Pkinesis-asl -Pyarn -Phive-thriftserver -Phive unidoc
```

@HyukjinKwon (Member Author):

Is it okay to just explicitly override the dependency to Avro 1.7.7 in sbt?

@srowen (Member) commented Apr 14, 2017

The build should use 1.7.7, yes. Hadoop pulls in 1.7.4, but it does so in both 2.6 and 2.7. And the SBT and Maven builds seem to get that right as intended, because the POM directly overrides this version. (The only component on a different Avro is the Flume module, but that's not the problem here.)

I also can't reproduce this locally. It builds fine for me too with the same commands.

I am open to workarounds, though I also don't know what will be sufficient because we can't reproduce it. I am pretty sure the Avro 1.7.4 dependency is coming from hadoop-common, but I have no idea why this happens only in 2.6.

sbt-unidoc has a newer version, 0.4.0, but updating it requires other changes I don't know how to make and I don't see a reason to think it's the problem.

I wonder if the problem is that core does not directly declare a dependency on org.apache.avro:avro but uses it. If so, then adding this might do the trick in the core POM:

```
      <dependency>
        <groupId>org.apache.avro</groupId>
        <artifactId>avro</artifactId>
      </dependency>
```
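As a quick check (a sketch, not from the thread; assumes a Spark checkout and SBT 0.13-style task syntax), one could inspect which Avro version actually wins on core's compile classpath:

```python
import subprocess

# Ask SBT for core's resolved compile classpath and show only the Avro jars,
# to see whether the pinned 1.7.7 or Hadoop's transitive 1.7.4 is used.
out = subprocess.run(
    ["./build/sbt", "-Phadoop-2.6", "show core/compile:dependencyClasspath"],
    capture_output=True, text=True,
).stdout
for line in out.splitlines():
    if "avro" in line:
        print(line)
```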

asfgit pushed a commit that referenced this pull request Apr 16, 2017
… failure in SBT Hadoop 2.6 master on Jenkins

## What changes were proposed in this pull request?

This PR proposes to add

```
      <dependency>
        <groupId>org.apache.avro</groupId>
        <artifactId>avro</artifactId>
      </dependency>
```

in the core POM to see if it resolves the build failure below:

```
[error] /home/jenkins/workspace/spark-master-test-sbt-hadoop-2.6/core/src/main/scala/org/apache/spark/serializer/GenericAvroSerializer.scala:123: value createDatumWriter is not a member of org.apache.avro.generic.GenericData
[error]     writerCache.getOrElseUpdate(schema, GenericData.get.createDatumWriter(schema))
[error]
```

https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-2.6/2770/consoleFull

## How was this patch tested?

I tried many ways but I was unable to reproduce this locally. Sean also tried the way I did, but he was also unable to reproduce it.

Please refer to the comments in #17477 (comment)

Author: hyukjinkwon <[email protected]>

Closes #17642 from HyukjinKwon/SPARK-20343.
ghost pushed a commit to dbtsai/spark that referenced this pull request Apr 18, 2017
…ailure in SBT Hadoop 2.6 master on Jenkins

## What changes were proposed in this pull request?

This PR proposes to force Avro's version to 1.7.7 in core to resolve the build failure as below:

```
[error] /home/jenkins/workspace/spark-master-test-sbt-hadoop-2.6/core/src/main/scala/org/apache/spark/serializer/GenericAvroSerializer.scala:123: value createDatumWriter is not a member of org.apache.avro.generic.GenericData
[error]     writerCache.getOrElseUpdate(schema, GenericData.get.createDatumWriter(schema))
[error]
```

https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-2.6/2770/consoleFull

Note that this is a hack and should be removed in the future.

## How was this patch tested?

I only tested that this actually overrides the dependency.

I tried many ways but I was unable to reproduce this locally. Sean also tried the way I did, but he was also unable to reproduce it.

Please refer to the comments in apache#17477 (comment)

Author: hyukjinkwon <[email protected]>

Closes apache#17651 from HyukjinKwon/SPARK-20343-sbt.
asfgit pushed a commit that referenced this pull request Apr 19, 2017
…tly set in SBT build

## What changes were proposed in this pull request?

This PR proposes two things as below:

- Avoid Unidoc build only if Hadoop 2.6 is explicitly set in SBT build

  Due to a difference in dependency resolution between SBT & Unidoc, for an unknown reason, the documentation build fails on a specific machine & environment in Jenkins, but it could not be reproduced.

  So, this PR just checks an environment variable, `AMPLAB_JENKINS_BUILD_PROFILE`, that is set in the Hadoop 2.6 SBT builds against branches on Jenkins, and then disables the Unidoc build. **Note that the PR builder will still build it with Hadoop 2.6 & SBT.**

  ```
  ========================================================================
  Building Unidoc API Documentation
  ========================================================================
  [info] Building Spark unidoc (w/Hive 1.2.1) using SBT with these arguments:  -Phadoop-2.6 -Pmesos -Pkinesis-asl -Pyarn -Phive-thriftserver -Phive unidoc
  Using /usr/java/jdk1.8.0_60 as default JAVA_HOME.
  ...
  ```

  I checked the environment variables from the logs (first bit) as below:

  - **spark-master-test-sbt-hadoop-2.6** (this one is being failed) - https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-2.6/lastBuild/consoleFull

  ```
  JAVA_HOME=/usr/java/jdk1.8.0_60
  JAVA_7_HOME=/usr/java/jdk1.7.0_79
  SPARK_BRANCH=master
  AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.6   <- I use this variable
  AMPLAB_JENKINS="true"
  ```
  - spark-master-test-sbt-hadoop-2.7 - https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-2.7/lastBuild/consoleFull

  ```
  JAVA_HOME=/usr/java/jdk1.8.0_60
  JAVA_7_HOME=/usr/java/jdk1.7.0_79
  SPARK_BRANCH=master
  AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.7
  AMPLAB_JENKINS="true"
  ```

  - spark-master-test-maven-hadoop-2.6 - https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-2.6/lastBuild/consoleFull

  ```
  JAVA_HOME=/usr/java/jdk1.8.0_60
  JAVA_7_HOME=/usr/java/jdk1.7.0_79
  HADOOP_PROFILE=hadoop-2.6
  HADOOP_VERSION=
  SPARK_BRANCH=master
  AMPLAB_JENKINS="true"
  ```

  - spark-master-test-maven-hadoop-2.7 - https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-2.7/lastBuild/consoleFull

  ```
  JAVA_HOME=/usr/java/jdk1.8.0_60
  JAVA_7_HOME=/usr/java/jdk1.7.0_79
  HADOOP_PROFILE=hadoop-2.7
  HADOOP_VERSION=
  SPARK_BRANCH=master
  AMPLAB_JENKINS="true"
  ```

  - PR builder - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75843/consoleFull

  ```
  JENKINS_MASTER_HOSTNAME=amp-jenkins-master
  JAVA_HOME=/usr/java/jdk1.8.0_60
  JAVA_7_HOME=/usr/java/jdk1.7.0_79
  ```

  Judging from other logs in branch-2.1:

    - SBT & Hadoop 2.6 against branch-2.1 https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.1-test-sbt-hadoop-2.6/lastBuild/consoleFull

      ```
      JAVA_HOME=/usr/java/jdk1.8.0_60
      JAVA_7_HOME=/usr/java/jdk1.7.0_79
      SPARK_BRANCH=branch-2.1
      AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.6
      AMPLAB_JENKINS="true"
      ```

    - Maven & Hadoop 2.6 against branch-2.1 https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.1-test-maven-hadoop-2.6/lastBuild/consoleFull

      ```
      JAVA_HOME=/usr/java/jdk1.8.0_60
      JAVA_7_HOME=/usr/java/jdk1.7.0_79
      HADOOP_PROFILE=hadoop-2.6
      HADOOP_VERSION=
      SPARK_BRANCH=branch-2.1
      AMPLAB_JENKINS="true"
      ```

  We have been using the same convention for those variables. These are actually used in the `run-tests.py` script, here: https://github.com/apache/spark/blob/master/dev/run-tests.py#L519-L520

- Revert the previous try

  After #17651, it seems the build still fails on SBT Hadoop 2.6 master.

  I am unable to reproduce this - #17477 (comment) - and the reviewer was too. So, this got merged, as it seems the only way to verify this currently is to merge it (since no one seems able to reproduce it).

## How was this patch tested?

I only checked that `is_hadoop_version_2_6 = os.environ.get("AMPLAB_JENKINS_BUILD_PROFILE") == "hadoop2.6"` works as expected, as below:

```python
>>> import collections
>>> os = collections.namedtuple('os', 'environ')(environ={"AMPLAB_JENKINS_BUILD_PROFILE": "hadoop2.6"})
>>> print(not os.environ.get("AMPLAB_JENKINS_BUILD_PROFILE") == "hadoop2.6")
False
>>> os = collections.namedtuple('os', 'environ')(environ={"AMPLAB_JENKINS_BUILD_PROFILE": "hadoop2.7"})
>>> print(not os.environ.get("AMPLAB_JENKINS_BUILD_PROFILE") == "hadoop2.6")
True
>>> os = collections.namedtuple('os', 'environ')(environ={})
>>> print(not os.environ.get("AMPLAB_JENKINS_BUILD_PROFILE") == "hadoop2.6")
True
```

I tried many ways but I was unable to reproduce this locally. Sean also tried the way I did, but he was also unable to reproduce it.

Please refer to the comments in #17477 (comment)

Author: hyukjinkwon <[email protected]>

Closes #17669 from HyukjinKwon/revert-SPARK-20343.
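For context, a minimal sketch of how such a guard gates the unidoc step in `dev/run-tests.py` (`build_spark_unidoc_sbt` and `hadoop_version` are assumed from the surrounding script):

```python
import os

# Jenkins' SBT Hadoop 2.6 branch builds export this variable (see the logs
# quoted above); the PR builder does not, so it still runs unidoc there.
is_hadoop_version_2_6 = os.environ.get("AMPLAB_JENKINS_BUILD_PROFILE") == "hadoop2.6"

if not is_hadoop_version_2_6:
    # Skip unidoc only in the one environment that fails for unknown reasons.
    build_spark_unidoc_sbt(hadoop_version)
```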
asfgit pushed a commit that referenced this pull request Apr 19, 2017
…tly set in SBT build


(cherry picked from commit 3537876)
Signed-off-by: Sean Owen <[email protected]>
@HyukjinKwon deleted the SPARK-18692 branch January 2, 2018 03:38
peter-toth pushed a commit to peter-toth/spark that referenced this pull request Oct 6, 2018
Author: hyukjinkwon <[email protected]>

Closes apache#17477 from HyukjinKwon/SPARK-18692.