[SPARK-34820][K8S][R] add apt-update before gnupg install #31923

Closed
wants to merge 1 commit into from


Yikun
Member

@Yikun Yikun commented Mar 22, 2021

What changes were proposed in this pull request?

We added the gnupg installation in #30130. We should run apt-get update before the gnupg installation; otherwise we get a fetch error whenever the package has been updated upstream.

See more in:
[1] http://apache-spark-developers-list.1001551.n3.nabble.com/K8s-Integration-test-is-unable-to-run-because-of-the-unavailable-libs-td30986.html

Why are the changes needed?

Add an apt-get update command before the gnupg installation to avoid an invalid package cache list.
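
As a rough illustration of the ordering this change relies on, the sketch below checks that `apt-get update` appears before the first `apt install` in a RUN command string. The helper name `update_precedes_install` and the sample strings are hypothetical and are not part of this PR; the check only looks for the literal text `apt install`, so it is a minimal sketch rather than a full Dockerfile lint.

```shell
# Hypothetical helper (not part of the PR): succeeds only if "apt-get update"
# appears before the first "apt install" in the given command string.
update_precedes_install() {
  cmd=$1
  case "$cmd" in
    *"apt install"*) ;;   # something is installed, so keep checking
    *) return 0 ;;        # nothing is installed, nothing to check
  esac
  before=${cmd%%"apt install"*}   # text preceding the first "apt install"
  case "$before" in
    *"apt-get update"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Illustrative orderings, not quoted from the Dockerfile.
fixed='apt-get update && apt install -y gnupg'
unfixed='apt install -y gnupg && apt-get update'

update_precedes_install "$fixed" && echo "fixed ordering: ok"
update_precedes_install "$unfixed" || echo "unfixed ordering: install runs before update"
```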

Does this PR introduce any user-facing change?

No

How was this patch tested?

K8s Integration test passed

@Yikun
Member Author

Yikun commented Mar 22, 2021

cc @Ngone51

@Ngone51
Member

Ngone51 commented Mar 22, 2021

@Yikun thanks for the quick fix. cc @dongjoon-hyun @holdenk

@Ngone51
Member

Ngone51 commented Mar 22, 2021

ok to test

@SparkQA

SparkQA commented Mar 22, 2021

Test build #136335 has finished for PR 31923 at commit 7418fdb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 22, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40920/

@SparkQA

SparkQA commented Mar 22, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40920/

@Yikun
Member Author

Yikun commented Mar 22, 2021

Step 6/12 : RUN   apt-get update &&   apt install -y gnupg &&   echo "deb http://cloud.r-project.org/bin/linux/debian buster-cran35/" >> /etc/apt/sources.list &&   (apt-key adv --keyserver keys.gnupg.net --recv-key 'E19F5F87128899B192B1A2C2AD5F960A256A04AF' || apt-key adv --keyserver keys.openpgp.org --recv-key 'E19F5F87128899B192B1A2C2AD5F960A256A04AF') &&   apt-get update &&   apt install -y -t buster-cran35 r-base r-base-dev &&   rm -rf /var/cache/apt/*
 ---> Running in 890d910fdc12
... ...

 ---> 6fc29805545e
Step 7/12 : COPY R ${SPARK_HOME}/R
 ---> 98781dd121aa
Step 8/12 : ENV R_HOME /usr/lib/R

From [1] (Step 6/12 succeeded), we can see that the failed-fetch problem of SPARK-34820 has been fixed.

[1] https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40920/consoleFull
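
The same check can be made without reading the console log by eye. This is a minimal sketch assuming the classic `docker build` output format shown above, where each step starts with `Step N/M`; the helper name `step_completed` is hypothetical. It treats the start of the following step as evidence that the given step finished, so it cannot confirm the final step.

```shell
# Hypothetical helper: reads a classic `docker build` log on stdin and succeeds
# if the step after $1 was started, which implies step $1 completed.
# It cannot confirm the final step, since no later step follows it.
step_completed() {
  next=$(( $1 + 1 ))
  grep -q "^Step ${next}/"
}

# Sample lines abridged from the console output quoted above.
printf '%s\n' \
  'Step 6/12 : RUN   apt-get update &&   apt install -y gnupg' \
  ' ---> 6fc29805545e' \
  'Step 7/12 : COPY R ${SPARK_HOME}/R' |
  step_completed 6 && echo "step 6 completed"
```

With a saved copy of the console output, the helper would be used as `step_completed 6 < consoleFull.txt`, where the file name is an assumption.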

@Yikun
Member Author

Yikun commented Mar 22, 2021

- Run SparkR on simple dataframe.R example *** FAILED ***
  io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: GET at: https://192.168.39.219:8443/api/v1/namespaces/5b12a03f6e2c40308f8c12bf4d9ea3e3/pods/spark-test-app-f99d40879bc94e569c8ae32b79a88970/log?pretty=false. Message: container "spark-kubernetes-driver" in pod "spark-test-app-f99d40879bc94e569c8ae32b79a88970" is waiting to start: trying and failing to pull image. Received status: Status(apiVersion=v1, code=400, details=null, kind=Status, message=container "spark-kubernetes-driver" in pod "spark-test-app-f99d40879bc94e569c8ae32b79a88970" is waiting to start: trying and failing to pull image, metadata=ListMeta(_continue=null, remainingItemCount=null, resourceVersion=null, selfLink=null, additionalProperties={}), reason=BadRequest, status=Failure, additionalProperties={}).
  at io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:570)
  at io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:509)
  at io.fabric8.kubernetes.client.dsl.internal.core.v1.PodOperationsImpl.doGetLog(PodOperationsImpl.java:189)
  at io.fabric8.kubernetes.client.dsl.internal.core.v1.PodOperationsImpl.getLog(PodOperationsImpl.java:198)
  at io.fabric8.kubernetes.client.dsl.internal.core.v1.PodOperationsImpl.getLog(PodOperationsImpl.java:85)
  at org.apache.spark.deploy.k8s.integrationtest.KubernetesSuite.$anonfun$logForFailedTest$3(KubernetesSuite.scala:89)
  at org.apache.spark.internal.Logging.logInfo(Logging.scala:57)
  at org.apache.spark.internal.Logging.logInfo$(Logging.scala:56)
  at org.apache.spark.SparkFunSuite.logInfo(SparkFunSuite.scala:61)
  at org.apache.spark.deploy.k8s.integrationtest.KubernetesSuite.$anonfun$logForFailedTest$2(KubernetesSuite.scala:86)
  ...

There are still some other errors triggered, but they look unrelated.

@Ngone51
Member

Ngone51 commented Mar 22, 2021

retest this please

@Ngone51
Member

Ngone51 commented Mar 22, 2021

Let's retry the test to see if it's flaky.

@SparkQA

SparkQA commented Mar 22, 2021

Test build #136345 has finished for PR 31923 at commit 7418fdb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 22, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40929/

@SparkQA

SparkQA commented Mar 22, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40929/

@Yikun
Member Author

Yikun commented Mar 22, 2021

The SparkPullRequestBuilder-K8s job (Kubernetes integration test) is back to green again [1], so I think the PR is ready to merge.

[1] https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/

@shaneknapp shaneknapp self-requested a review March 22, 2021 15:38
@shaneknapp
Contributor

thanks for doing this!

Member

@dongjoon-hyun dongjoon-hyun left a comment

+1, LGTM.
Merged to master/3.1.
cc @attilapiros

dongjoon-hyun pushed a commit that referenced this pull request Mar 22, 2021

Closes #31923 from Yikun/SPARK-34820.

Authored-by: Yikun Jiang <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit 31da907)
Signed-off-by: Dongjoon Hyun <[email protected]>
@Ngone51
Member

Ngone51 commented Mar 25, 2021

Hi all, I'd like to get your continued attention on the K8s integration test issue. After this PR fixed the gnupg installation issue, another issue shows up, which is almost a constant failure:

- Run SparkR on simple dataframe.R example *** FAILED ***
  io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: GET at: https://192.168.39.219:8443/api/v1/namespaces/b80cb1250aba4d92a99fc87b609b2328/pods/spark-test-app-411d038edc8b4e9b8e3761052cd44bd8/log?pretty=false. Message: container "spark-kubernetes-driver" in pod "spark-test-app-411d038edc8b4e9b8e3761052cd44bd8" is waiting to start: trying and failing to pull image. Received status: Status(apiVersion=v1, code=400, details=null, kind=Status, message=container "spark-kubernetes-driver" in pod "spark-test-app-411d038edc8b4e9b8e3761052cd44bd8" is waiting to start: trying and failing to pull image, metadata=ListMeta(_continue=null, remainingItemCount=null, resourceVersion=null, selfLink=null, additionalProperties={}), reason=BadRequest, status=Failure, additionalProperties={}).
  at io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:570)
  at io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:509)
  at io.fabric8.kubernetes.client.dsl.internal.core.v1.PodOperationsImpl.doGetLog(PodOperationsImpl.java:189)
  at io.fabric8.kubernetes.client.dsl.internal.core.v1.PodOperationsImpl.getLog(PodOperationsImpl.java:198)
  at io.fabric8.kubernetes.client.dsl.internal.core.v1.PodOperationsImpl.getLog(PodOperationsImpl.java:85)
  at org.apache.spark.deploy.k8s.integrationtest.KubernetesSuite.$anonfun$logForFailedTest$3(KubernetesSuite.scala:89)
  at org.apache.spark.internal.Logging.logInfo(Logging.scala:57)
  at org.apache.spark.internal.Logging.logInfo$(Logging.scala:56)
  at org.apache.spark.SparkFunSuite.logInfo(SparkFunSuite.scala:61)
  at org.apache.spark.deploy.k8s.integrationtest.KubernetesSuite.$anonfun$logForFailedTest$2(KubernetesSuite.scala:86)

And here is a PR that has run the K8s integration test multiple times, all of which failed.
Could someone please help take a look? Thanks!

@attilapiros
Contributor

@Ngone51 I checked the PR you mentioned.

My findings based on the last failure.

Here the first error is:

- Launcher client dependencies
- SPARK-33615: Launcher client archives *** FAILED ***
  io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: GET at: https://192.168.39.147:8443/api/v1/namespaces/09e9c94160d543c1a338f364722d49a6/pods/spark-test-app-c80c78f512574b36b2608f7d92c24503/log?pretty=false. Message: container "spark-kubernetes-driver" in pod "spark-test-app-c80c78f512574b36b2608f7d92c24503" is waiting to start: trying and failing to pull image. Received status: Status(apiVersion=v1, code=400, details=null, kind=Status, message=container "spark-kubernetes-driver" in pod "spark-test-app-c80c78f512574b36b2608f7d92c24503" is waiting to start: trying and failing to pull image, metadata=ListMeta(_continue=null, remainingItemCount=null, resourceVersion=null, selfLink=null, additionalProperties={}), reason=BadRequest, status=Failure, additionalProperties={}).
  at io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:570)
  at io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:509)
  at io.fabric8.kubernetes.client.dsl.internal.core.v1.PodOperationsImpl.doGetLog(PodOperationsImpl.java:189)
  at io.fabric8.kubernetes.client.dsl.internal.core.v1.PodOperationsImpl.getLog(PodOperationsImpl.java:198)
  at io.fabric8.kubernetes.client.dsl.internal.core.v1.PodOperationsImpl.getLog(PodOperationsImpl.java:85)
  at org.apache.spark.deploy.k8s.integrationtest.KubernetesSuite.$anonfun$logForFailedTest$3(KubernetesSuite.scala:89)
  at org.apache.spark.internal.Logging.logInfo(Logging.scala:57)
  at org.apache.spark.internal.Logging.logInfo$(Logging.scala:56)
  at org.apache.spark.SparkFunSuite.logInfo(SparkFunSuite.scala:61)
  at org.apache.spark.deploy.k8s.integrationtest.KubernetesSuite.$anonfun$logForFailedTest$2(KubernetesSuite.scala:86)
  ...

Comparing the last successful test with the first failed one, there are only a few differences in the code: the failed one uses --archives and the successful one uses --files.

I do not think this difference could lead to an error such as "is waiting to start: trying and failing to pull image."

@attilapiros
Contributor

It would be wonderful to see kubectl describe pod <pod> output.
Actually, I think I can do this in one of my PRs...
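
For reference, the namespace and pod name needed for that command are already present in the failing log URL. The sketch below derives a `kubectl describe pod` command from such a URL; the helper name `describe_cmd_from_url` is hypothetical, and the URL shape (`/namespaces/<ns>/pods/<pod>/log`) is assumed from the error messages quoted in this thread. It only prints the command, since the test namespaces are typically gone once the run finishes.

```shell
# Hypothetical helper: builds a "kubectl describe pod" command from the log URL
# found in the KubernetesClientException message.
# Assumes the URL contains /namespaces/<ns>/pods/<pod>/log as in the errors above.
describe_cmd_from_url() {
  url=$1
  ns=$(printf '%s\n' "$url" | sed -n 's|.*/namespaces/\([^/]*\)/pods/.*|\1|p')
  pod=$(printf '%s\n' "$url" | sed -n 's|.*/pods/\([^/]*\)/log.*|\1|p')
  # Fail if the URL does not have the expected shape.
  [ -n "$ns" ] && [ -n "$pod" ] || return 1
  printf 'kubectl describe pod %s -n %s\n' "$pod" "$ns"
}

# URL copied from the "SPARK-33615: Launcher client archives" failure above.
describe_cmd_from_url 'https://192.168.39.147:8443/api/v1/namespaces/09e9c94160d543c1a338f364722d49a6/pods/spark-test-app-c80c78f512574b36b2608f7d92c24503/log?pretty=false'
```

The Events section of the describe output is usually where the reason for an image pull failure shows up.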

@attilapiros
Contributor

Of course locally all the tests passed:

All tests passed.
...
[INFO] Spark Project Kubernetes Integration Tests ......... SUCCESS [25:12 min]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------

@attilapiros
Contributor

I hope this will help to troubleshoot this and similar errors:
#31962

@Ngone51
Member

Ngone51 commented Mar 29, 2021

Thanks for the effort @attilapiros

@shaneknapp
Contributor

shaneknapp commented Mar 29, 2021 via email

flyrain pushed a commit to flyrain/spark that referenced this pull request Sep 21, 2021

Closes apache#31923 from Yikun/SPARK-34820.

Authored-by: Yikun Jiang <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit 31da907)
Signed-off-by: Dongjoon Hyun <[email protected]>