GCS artifact loading fails if key is a directory #1351

Closed
janmasarik opened this issue May 2, 2019 · 8 comments · Fixed by argoproj/pkg#6

@janmasarik

BUG REPORT

What happened:
After switching from minio to GCS, workflows that load a "directory" from GCS as an artifact started to fail with the message "failed to load artifacts: The specified key does not exist."

What you expected to happen:

It should work the same way it does with minio.

How to reproduce it (as minimally and precisely as possible):

Run the step below against minio and then against GCS. Note that the key points to a "directory", not to a specific file.

Possible cause:
I believe this happens because minio returns an empty object when you request a directory object, whereas GCS returns a 404, which surfaces as "The specified key does not exist."

- name: example
  inputs:
    artifacts:
    - name: directory
      path: /directory/
      s3:
        endpoint: storage.googleapis.com
        bucket: bucketname
        key: directory
        accessKeySecret:
          name: gcs-secrets
          key: accessKey
        secretKeySecret:
          name: gcs-secrets
          key: secretKey
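Object stores such as S3 and GCS have no real directories: a "directory" is only a shared key prefix, and usually no object is stored under the bare key itself. A client therefore has to test the prefix instead of doing a plain GET on the key. The following is a minimal Go sketch of that test, assuming the bucket's keys are already available as a slice; isDirectoryKey is a hypothetical helper for illustration, not Argo's code.

```go
package main

import (
	"fmt"
	"strings"
)

// isDirectoryKey reports whether key behaves like a "directory" in a flat
// object store, i.e. at least one object key starts with key + "/".
// Hypothetical helper for illustration only; not Argo's implementation.
func isDirectoryKey(keys []string, key string) bool {
	// Normalize to exactly one trailing slash so that "directory" does not
	// also match unrelated sibling keys such as "directory-old/c.txt".
	prefix := strings.TrimSuffix(key, "/") + "/"
	for _, k := range keys {
		if strings.HasPrefix(k, prefix) {
			return true
		}
	}
	return false
}

func main() {
	keys := []string{"directory/a.txt", "directory/sub/b.txt", "directory-old/c.txt", "file.txt"}
	fmt.Println(isDirectoryKey(keys, "directory"))     // true
	fmt.Println(isDirectoryKey(keys, "directory/"))    // true
	fmt.Println(isDirectoryKey(keys, "file.txt"))      // false
	fmt.Println(isDirectoryKey(keys, "directory-old")) // true
}
```

Normalizing the trailing slash before matching means "directory" and "directory/" are treated the same, while a plain object key such as "file.txt" is not reported as a directory.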

Anything else we need to know?:

Environment:

argo: v2.2.1
  BuildDate: 2018-10-11T16:25:59Z
  GitCommit: 3b52b26190163d1f72f3aef1a39f9f291378dafb
  GitTreeState: clean
  GitTag: v2.2.1
  GoVersion: go1.10.3
  Compiler: gc
  Platform: darwin/amd64
Kubernetes version:
$ kubectl version -o yaml
clientVersion:
  buildDate: 2018-08-20T10:09:03Z
  compiler: gc
  gitCommit: 0c38c362511b20a098d7cd855f1314dad92c2780
  gitTreeState: clean
  gitVersion: v1.10.7
  goVersion: go1.9.3
  major: "1"
  minor: "10"
  platform: darwin/amd64
serverVersion:
  buildDate: 2019-04-04T03:12:09Z
  compiler: gc
  gitCommit: b80664a77d3bce5b4701bc881d972b1a702290bf
  gitTreeState: clean
  gitVersion: v1.12.7-gke.7
  goVersion: go1.10.8b4
  major: "1"
  minor: 12+
  platform: linux/amd64
@Ark-kun
Member

Ark-kun commented May 8, 2019

BTW, are there any files in that directory? What happens if you remove the trailing slash from the path?

@kalugny

kalugny commented Jun 25, 2019

I also had this bug.
Here's a reproduction example:

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: arguments-artifacts-
spec:
  entrypoint: kubectl-input-artifact
  arguments:
    artifacts:
    - name: kubectl
      s3:
        endpoint: storage.googleapis.com
        bucket: gcp-public-data-landsat
        key: LC08/01/044/034/LC08_L1GT_044034_20130330_20170310_01_T2
        accessKeySecret:
          name: argo-cred
          key: accesskey
        secretKeySecret:
          name: argo-cred
          key: secretkey

  templates:
  - name: kubectl-input-artifact
    inputs:
      artifacts:
      - name: kubectl
        path: /usr/local/bin/kubectl
    container:
      image: debian:9.4
      command: [sh, -c]
      args: ["ls -l {{inputs.artifacts.kubectl.path}}"]

I get a different error:
Failed to test if gcp-public-data-landsat is a directory: A header or query you provided requested a function that is not implemented.

And I'm using argo 2.3.0

Logs from the failed run (from Stackdriver, newest first):

time="2019-06-25T07:08:34Z" level=fatal msg="timed out waiting for the condition"
time="2019-06-25T07:08:34Z" level=info msg="Alloc=5569 TotalAlloc=23150 Sys=70846 NumGC=8 Goroutines=7"
time="2019-06-25T07:08:34Z" level=error msg="executor error: timed out waiting for the condition"
time="2019-06-25T07:08:34Z" level=warning msg="Failed to test if gcp-public-data-landsat is a directory: A header or query you provided requested a function that is not implemented."
time="2019-06-25T07:08:34Z" level=info msg="Getting from s3 (endpoint: storage.googleapis.com, bucket: gcp-public-data-landsat, key: LC08/01/044/034/LC08_L1GT_044034_20130330_20170310_01_T2) to /argo/inputs/artifacts/kubectl.tmp"
time="2019-06-25T07:08:34Z" level=info msg="Creating minio client storage.googleapis.com using static credentials"
time="2019-06-25T07:08:34Z" level=info msg="S3 Load path: /argo/inputs/artifacts/kubectl.tmp, key: LC08/01/044/034/LC08_L1GT_044034_20130330_20170310_01_T2"
time="2019-06-25T07:08:18Z" level=warning msg="Failed to test if gcp-public-data-landsat is a directory: A header or query you provided requested a function that is not implemented."
time="2019-06-25T07:08:17Z" level=info msg="Getting from s3 (endpoint: storage.googleapis.com, bucket: gcp-public-data-landsat, key: LC08/01/044/034/LC08_L1GT_044034_20130330_20170310_01_T2) to /argo/inputs/artifacts/kubectl.tmp"
time="2019-06-25T07:08:17Z" level=info msg="Creating minio client storage.googleapis.com using static credentials"
time="2019-06-25T07:08:17Z" level=info msg="S3 Load path: /argo/inputs/artifacts/kubectl.tmp, key: LC08/01/044/034/LC08_L1GT_044034_20130330_20170310_01_T2"
time="2019-06-25T07:08:09Z" level=warning msg="Failed to test if gcp-public-data-landsat is a directory: A header or query you provided requested a function that is not implemented."
time="2019-06-25T07:08:08Z" level=info msg="Getting from s3 (endpoint: storage.googleapis.com, bucket: gcp-public-data-landsat, key: LC08/01/044/034/LC08_L1GT_044034_20130330_20170310_01_T2) to /argo/inputs/artifacts/kubectl.tmp"
time="2019-06-25T07:08:08Z" level=info msg="Creating minio client storage.googleapis.com using static credentials"
time="2019-06-25T07:08:08Z" level=info msg="S3 Load path: /argo/inputs/artifacts/kubectl.tmp, key: LC08/01/044/034/LC08_L1GT_044034_20130330_20170310_01_T2"
time="2019-06-25T07:08:04Z" level=warning msg="Failed to test if gcp-public-data-landsat is a directory: A header or query you provided requested a function that is not implemented."
time="2019-06-25T07:08:03Z" level=info msg="Getting from s3 (endpoint: storage.googleapis.com, bucket: gcp-public-data-landsat, key: LC08/01/044/034/LC08_L1GT_044034_20130330_20170310_01_T2) to /argo/inputs/artifacts/kubectl.tmp"
time="2019-06-25T07:08:03Z" level=info msg="Creating minio client storage.googleapis.com using static credentials"
time="2019-06-25T07:08:03Z" level=info msg="S3 Load path: /argo/inputs/artifacts/kubectl.tmp, key: LC08/01/044/034/LC08_L1GT_044034_20130330_20170310_01_T2"
time="2019-06-25T07:08:01Z" level=warning msg="Failed to test if gcp-public-data-landsat is a directory: A header or query you provided requested a function that is not implemented."
time="2019-06-25T07:08:01Z" level=info msg="Getting from s3 (endpoint: storage.googleapis.com, bucket: gcp-public-data-landsat, key: LC08/01/044/034/LC08_L1GT_044034_20130330_20170310_01_T2) to /argo/inputs/artifacts/kubectl.tmp"
time="2019-06-25T07:08:01Z" level=info msg="Creating minio client storage.googleapis.com using static credentials"
time="2019-06-25T07:08:01Z" level=info msg="S3 Load path: /argo/inputs/artifacts/kubectl.tmp, key: LC08/01/044/034/LC08_L1GT_044034_20130330_20170310_01_T2"
time="2019-06-25T07:08:01Z" level=info msg="Downloading artifact: kubectl"
time="2019-06-25T07:08:01Z" level=info msg="Start loading input artifacts..."
time="2019-06-25T07:08:01Z" level=info msg="Executor (version: v2.3.0, build_date: 2019-05-20T22:10:54Z) initialized (pod: validator/arguments-artifacts-9xht2) with template:\n{\"name\":\"kubectl-input-artifact\",\"inputs\":{\"artifacts\":[{\"name\":\"kubectl\",\"path\":\"/usr/local/bin/kubectl\",\"s3\":{\"endpoint\":\"storage.googleapis.com\",\"bucket\":\"gcp-public-data-landsat\",\"accessKeySecret\":{\"name\":\"argo-cred\",\"key\":\"accesskey\"},\"secretKeySecret\":{\"name\":\"argo-cred\",\"key\":\"secretkey\"},\"key\":\"LC08/01/044/034/LC08_L1GT_044034_20130330_20170310_01_T2\"}}]},\"outputs\":{},\"metadata\":{},\"container\":{\"name\":\"\",\"image\":\"debian:9.4\",\"command\":[\"sh\",\"-c\"],\"args\":[\"ls -l /usr/local/bin/kubectl\"],\"resources\":{}}}"
time="2019-06-25T07:08:01Z" level=info msg="Creating a docker executor"

@bakennedy

This error comes from s3.IsDirectory, in its call to minioClient.ListObjectsV2:
https://github.com/argoproj/pkg/blob/38dba6e98495680ff1f8225642b63db10a96bb06/s3/s3.go#L167

ListObjectsV2 is not supported by the GCS interoperability API.
Here's a short Go program to reproduce that error:
https://gist.github.com/bakennedy/7dc26af9e1c76d8940221786e1c3c7a9

minioClient.ListObjects is supported by GCS, and it returns the same type. Would this work as a replacement in pkg/s3.IsDirectory?

bakennedy added a commit to bakennedy/pkg that referenced this issue Jul 27, 2019
minioClient.ListObjectsV2 is not supported by GCS, causing the bug in
argoproj/argo-workflows#1351
@Downchuck

@bakennedy - it seems like it's expecting some kind of continuation token -- minio/minio@3ec4738

@Downchuck

In the meantime, a workaround: grab all the artifacts from the workflow, decompress them into a temporary folder, and then repackage them as a single artifact:

  - name: reduce
    activeDeadlineSeconds: 60
    retryStrategy:
      limit: 2
    outputs:
      artifacts:
      - name: reduce
        path: /tmp/reduce
    script:
      volumeMounts:
      - name: gcs-key
        mountPath: /var/secrets/google
      image: google/cloud-sdk:alpine
      command: ["/bin/bash"]
      source: |
        mkdir -p /tmp/reduce
        mkdir -p "/tmp/{{workflow.name}}"
        gsutil -q -m -o Credentials:gs_service_key_file=/var/secrets/google/key.json \
             -o GSUtil:parallel_thread_count=8 \
             cp -r "gs://{{workflow.parameters.bucket-name}}/{{workflow.name}}" "/tmp/{{workflow.name}}"
        find "/tmp/{{workflow.name}}" -name '*.tgz' -exec tar -C /tmp/reduce -zxf '{}' \;

  - name: echo
    inputs:
      artifacts:
      - name: reduce
        path: /tmp/reduce
    script:
      image: alpine:3.7
      command: [find, /tmp]

jessesuen pushed a commit to argoproj/pkg that referenced this issue Aug 2, 2019
minioClient.ListObjectsV2 is not supported by GCS, causing the bug in
argoproj/argo-workflows#1351
@jessesuen
Member

Closed inadvertently by the merge of argoproj/pkg#6. This needs to remain open until the dependencies are updated.

@jessesuen jessesuen reopened this Aug 2, 2019
@bakennedy

@Downchuck thank you for the workaround suggestion, and @jessesuen thanks for the quick merge! We're doing genome sequencer research and use a lot of huge files, so this will really help us out.

@stale

stale bot commented Jul 12, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
