This repository has been archived by the owner on Oct 9, 2023. It is now read-only.

Gcs remote data #121

Merged
merged 18 commits into from
Sep 3, 2020

Conversation

honnix
Member

@honnix honnix commented Sep 1, 2020

TL;DR

Add support for signed GCS URLs.
This is a continuation of #81, because CI doesn't seem to work in a forked repo.

Type

  • Bug Fix
  • Feature
  • Plugin

Are all requirements met?

  • Code completed
  • Smoke tested
  • Unit tests added
  • Code documentation added
  • Any pending items have an associated Issue

Complete description

Right now only AWS/S3 presigned URLs are supported. This PR adds support for GCS URLs.

The signing part looks a bit funky, mainly because the GCP library does not support signing a URL without a materialised service account key. So this PR calls https://cloud.google.com/iam/docs/reference/credentials/rest/v1/projects.serviceAccounts/signBlob directly.

SigningPrincipal refers to the service account that signs the URL and also reads the object size. The service account associated with the flyteadmin instance needs to have the role on the principal.
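
A minimal sketch of the idea described above: instead of signing with a local private key, the signing callback forwards the bytes to a remote signer on behalf of the signing principal. The `blobSigner` interface and `fakeSigner` type are hypothetical stand-ins for the IAM Credentials client and are not taken from this PR; only the `projects/-/serviceAccounts/` resource name prefix comes from the diff.

```go
package main

import (
	"encoding/base64"
	"errors"
	"fmt"
)

// blobSigner is a hypothetical stand-in for a remote signing service such as
// the signBlob endpoint. It is not the real client type.
type blobSigner interface {
	SignBlob(name string, payload []byte) ([]byte, error)
}

// newSignBytes returns a callback that asks the remote signer to sign the
// payload as signingPrincipal, so no service account key is needed locally.
func newSignBytes(signer blobSigner, signingPrincipal string) func([]byte) ([]byte, error) {
	return func(b []byte) ([]byte, error) {
		if signingPrincipal == "" {
			return nil, errors.New("signing principal must not be empty")
		}
		// Resource name format used in the PR diff.
		name := "projects/-/serviceAccounts/" + signingPrincipal
		return signer.SignBlob(name, b)
	}
}

// fakeSigner is a test double: it records the resource name it was called
// with and returns the base64 of the payload instead of a real signature.
type fakeSigner struct {
	lastName string
}

func (f *fakeSigner) SignBlob(name string, payload []byte) ([]byte, error) {
	f.lastName = name
	return []byte(base64.StdEncoding.EncodeToString(payload)), nil
}

func main() {
	signer := &fakeSigner{}
	signBytes := newSignBytes(signer, "signer@example.iam.gserviceaccount.com")
	sig, err := signBytes([]byte("string-to-sign"))
	fmt.Println(string(sig), err, signer.lastName)
}
```

In the real change, a callback of this shape would presumably be handed to the storage library's URL signing options, with the fake replaced by the actual credentials client.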

Tracking Issue

Follow-up issue

NA

Although this doesn't seem to be required, it is better to do things according
to the API doc.
This will make sure only the `signingPrincipal` needs read-only permission on the
object instead of the service account associated with flyteadmin.
This is because of the introduction of google.golang.org/protobuf
@codecov-commenter

codecov-commenter commented Sep 1, 2020

Codecov Report

Merging #121 into master will decrease coverage by 0.15%.
The diff coverage is 52.43%.


@@            Coverage Diff             @@
##           master     #121      +/-   ##
==========================================
- Coverage   62.39%   62.24%   -0.16%     
==========================================
  Files         104      105       +1     
  Lines        7752     7840      +88     
==========================================
+ Hits         4837     4880      +43     
- Misses       2345     2385      +40     
- Partials      570      575       +5     
Flag Coverage Δ
#unittests 62.24% <52.43%> (-0.16%) ⬇️


Impacted Files Coverage Δ
pkg/data/implementations/gcp_remote_url.go 52.43% <52.43%> (ø)
pkg/repositories/config/migrations.go 0.00% <0.00%> (ø)

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update bf159e5...02c9eae.

@@ -46,8 +46,8 @@ var compiledTaskDigest = []byte{
0xa, 0x22, 0x80, 0xb1, 0x8, 0x44, 0x53, 0xf3, 0xca, 0x60, 0x4, 0xf7, 0x6f}

var compiledWorkflowDigest = []byte{
0x1c, 0x66, 0x45, 0x89, 0x8f, 0x5, 0xa5, 0x4f, 0xf5, 0xba, 0x6f, 0xd9, 0xd, 0x70, 0xf6, 0x86, 0xf, 0x8e, 0x7e, 0x1b,
0x80, 0x1d, 0xb9, 0x59, 0xe, 0x3, 0x50, 0x4d, 0x64, 0xeb, 0x13, 0xa2}
0xeb, 0x66, 0x44, 0xe8, 0x1c, 0xa8, 0x51, 0x7d, 0x3f, 0x33, 0xf0, 0x77, 0x95, 0x24, 0x84, 0xc2, 0xbe, 0x79, 0xcd,
Member Author

This is due to the introduction of google.golang.org/protobuf as a transitive dependency.

Contributor

Interesting. @katrogan, will this invalidate all existing digests? It should be fine, but we might cause some re-registrations.

Member Author

This is what I'm most concerned about as well. I can dig a bit further to see whether this can be avoided.

Contributor

Oh, interesting! This could be a problem, because some folks will get a warning about the digest changing for a workflow with the same identifier, so they will have to change the version parameter to re-register, which is unfortunate.
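
To illustrate the concern, here is a rough sketch of a digest check keyed by identifier. It assumes the digest is a SHA-256 hash over the serialized bytes; the `registry` type and its methods are hypothetical and are not flyteadmin's actual code. If a dependency change alters the serialized bytes of an otherwise identical workflow, the stored and recomputed digests no longer match for the same identifier, and only a new version gets through.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// registry is a hypothetical in-memory store of digests keyed by identifier.
type registry struct {
	digests map[string][sha256.Size]byte
}

func newRegistry() *registry {
	return &registry{digests: map[string][sha256.Size]byte{}}
}

// register stores the digest for a new identifier, accepts a repeat
// registration with identical bytes, and rejects the same identifier when
// the serialized bytes (and therefore the digest) differ.
func (r *registry) register(id string, serialized []byte) error {
	digest := sha256.Sum256(serialized)
	existing, found := r.digests[id]
	if !found {
		r.digests[id] = digest
		return nil
	}
	if existing != digest {
		return fmt.Errorf("%s already exists with a different digest", id)
	}
	return nil
}

func main() {
	r := newRegistry()
	oldEncoding := []byte{0x0a, 0x01} // bytes produced before the dependency change
	newEncoding := []byte{0x0a, 0x02} // different bytes for the same logical content

	fmt.Println(r.register("wf:v1", oldEncoding)) // first registration
	fmt.Println(r.register("wf:v1", oldEncoding)) // identical re-registration
	fmt.Println(r.register("wf:v1", newEncoding)) // same identifier, different digest
	fmt.Println(r.register("wf:v2", newEncoding)) // new version parameter
}
```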

Member Author

I managed to revert this change in the latest commit by downgrading a few deps. Still, we will need to face this later: #121 (comment)

@@ -45,6 +46,12 @@ func GetRemoteDataHandler(cfg RemoteDataHandlerConfig) RemoteDataHandler {
return &remoteDataHandler{
remoteURL: implementations.NewAWSRemoteURL(awsConfig, presignedURLDuration),
}
case common.GCP:
Contributor

@katrogan / @honnix with the new change to return the entire data as part of the getData API, we should probably think about making signing optional (we have to keep it around to ensure that very large datasets can still be served). By optional I mean that if it is not specified, the signing should just be skipped.

I do not think my comment needs to be added as part of this commit, but as a follow up? @katrogan what do you think?

Contributor

yeah that was what i had in the original idl pr - adding a mode to specify what data we want returned and making only the signed url a non-default option. returning the signed url data optionally sounds good to me
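
One possible shape for the follow-up discussed in this thread, sketched under assumptions: signing is skipped whenever no signer is configured, and the inline data is returned either way. The `signFunc`, `dataResponse`, and `getData` names are hypothetical and do not reflect the actual getData API or IDL.

```go
package main

import (
	"errors"
	"fmt"
)

// signFunc is a hypothetical signer; a nil value means signing was not requested.
type signFunc func(uri string) (string, error)

// dataResponse is a hypothetical response carrying inline data and,
// optionally, a signed URL for payloads too large to serve inline.
type dataResponse struct {
	Inline    []byte
	SignedURL string
}

// getData always returns the inline data and only signs when a signer is given.
func getData(uri string, inline []byte, sign signFunc) (dataResponse, error) {
	resp := dataResponse{Inline: inline}
	if sign == nil {
		// Signing not specified: skip it entirely.
		return resp, nil
	}
	signed, err := sign(uri)
	if err != nil {
		return dataResponse{}, fmt.Errorf("failed to sign %s: %w", uri, err)
	}
	resp.SignedURL = signed
	return resp, nil
}

func main() {
	// fakeSign stands in for a real signer and only appends a marker.
	fakeSign := func(uri string) (string, error) {
		if uri == "" {
			return "", errors.New("empty uri")
		}
		return uri + "?signature=fake", nil
	}

	unsigned, _ := getData("gs://bucket/key", []byte("payload"), nil)
	signed, _ := getData("gs://bucket/key", []byte("payload"), fakeSign)
	fmt.Println(unsigned.SignedURL == "", signed.SignedURL)
}
```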

kumare3
kumare3 previously approved these changes Sep 2, 2020
Contributor

@kumare3 kumare3 left a comment

One comment, but LGTM

codes.Internal, "failed to get object size for %s with %v", uri, err)
}

// The second return argument here is the GetObjectOutput, which we don't use below.
Contributor

nit: is this comment relevant?

Member Author

Oops, sorry. Copy pasta. Will fix.

}

type gcsClientWrapper struct {
delegate *gcs.Client
Contributor

hey @honnix I'm not super familiar with the delegate pattern, what's the reason for using it here?

Contributor

is this so you can mock out the bucket like you commented below?

Member Author

Yes. That is the only way I could figure out to test this fluent API.
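
For readers unfamiliar with the pattern, a rough sketch of why a delegate wrapper helps here: a fluent client returns concrete types from each chained call, so the chain cannot be faked directly. Small interfaces plus thin wrappers that forward to the concrete client allow a fake to be substituted in tests. All types below (`realClient`, `clientWrapper`, `fakeClient`, and so on) are hypothetical stand-ins that only mimic the chained-call shape; they are not the GCS library's types or this PR's exact code.

```go
package main

import (
	"errors"
	"fmt"
)

// Concrete fluent client (stand-in): each call returns a concrete type.
type realClient struct{}
type realBucket struct{ name string }
type realObject struct{ bucket, key string }

func (c *realClient) Bucket(name string) *realBucket { return &realBucket{name: name} }
func (b *realBucket) Object(key string) *realObject  { return &realObject{bucket: b.name, key: key} }
func (o *realObject) Size() (int64, error) {
	return 0, errors.New("network access is not available in this sketch")
}

// Interfaces covering only the calls the handler needs.
type objectHandle interface{ Size() (int64, error) }
type bucketHandle interface{ Object(key string) objectHandle }
type client interface{ Bucket(name string) bucketHandle }

// Wrappers adapt the concrete return types to the interfaces by delegating.
type clientWrapper struct{ delegate *realClient }
type bucketWrapper struct{ delegate *realBucket }

func (w clientWrapper) Bucket(name string) bucketHandle {
	return bucketWrapper{delegate: w.delegate.Bucket(name)}
}
func (w bucketWrapper) Object(key string) objectHandle { return w.delegate.Object(key) }

// Fakes implement the same interfaces without touching the concrete client.
type fakeClient struct{ sizes map[string]int64 }
type fakeBucket struct {
	name  string
	sizes map[string]int64
}
type fakeObject struct{ size int64 }

func (f fakeClient) Bucket(name string) bucketHandle { return fakeBucket{name: name, sizes: f.sizes} }
func (b fakeBucket) Object(key string) objectHandle  { return fakeObject{size: b.sizes[b.name+"/"+key]} }
func (o fakeObject) Size() (int64, error)            { return o.size, nil }

// objectSize depends only on the interface, so it works with either implementation.
func objectSize(c client, bucket, key string) (int64, error) {
	return c.Bucket(bucket).Object(key).Size()
}

func main() {
	fake := fakeClient{sizes: map[string]int64{"bucket/key": 42}}
	size, err := objectSize(fake, "bucket", "key")
	fmt.Println(size, err)

	_, err = objectSize(clientWrapper{delegate: &realClient{}}, "bucket", "key")
	fmt.Println(err)
}
```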

SignBytes: func(b []byte) ([]byte, error) {
req := &credentialspb.SignBlobRequest{
Payload: b,
Name: "projects/-/serviceAccounts/" + g.signingPrincipal,
Contributor

what's the meaning of this string prefix?

Member Author

It's a GCP resource name: https://cloud.google.com/iam/docs/reference/rest/v1/projects.serviceAccounts/signBlob

Basically, it refers to the service account by name. The `-` is a wildcard for the GCP project, so the project doesn't matter.

@@ -3,6 +3,8 @@ module github.com/lyft/flyteadmin
go 1.13

require (
cloud.google.com/go v0.56.0
Member Author

This is the last version that stays on protobuf v1.3.5. Since v1.4.0, github.com/golang/protobuf has depended on google.golang.org/protobuf: https://github.com/golang/protobuf/blob/v1.4.0/ptypes/timestamp/timestamp.pb.go#L9

So sooner or later we will need to face this issue again.

Contributor

oh this is awesome, thank you for finding it

@honnix honnix merged commit 6e2c136 into master Sep 3, 2020
@honnix honnix deleted the gcs-remote-data branch September 3, 2020 10:38
schottra added a commit that referenced this pull request Sep 8, 2020
* master:
  Gcs remote data (#121)
  Do not depend on GOPATH to locate test data (#122)
  Add index to optimize for list task executions for node execution (#120)
  Grpc health checking (#118)
  Allow random cluster selection when no override (#117)
eapolinario pushed a commit that referenced this pull request Sep 6, 2023
* signed GCS URL

Impersonate the signingPrincipal to get object attributes and sign the GCS URL

This will make sure only the `signingPrincipal` needs readonly permission on the
object instead of the service account associated with flyteadmin.
4 participants