
EFS CSI driver should pass the IAM flag (for IRSA) to EFS Utils for app-specific filesystem policies #280

Closed
esalberg opened this issue Nov 13, 2020 · 31 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@esalberg

Is your feature request related to a problem?/Why is this needed
We are unable to segregate EFS filesystems by policy across different EKS pods. While we can pass a specific role to the EFS CSI driver itself, that role is always the same for every pod.

We need to limit EFS filesystem access per pod.

/feature

Describe the solution you'd like in detail
EFS Utils has added support for AssumeRoleWithWebIdentity for IRSA:
aws/efs-utils#60

However, the EFS CSI driver does not appear to pass the IAM flag to EFS Utils.

"[AWS Support] was able to replicate your finding of not being able to use IRSA (IAM Roles for Service Accounts) with EFS Filesystem Policies and Access Points. My testing environment involved setting up a fresh EKS cluster, with IRSA provisioned, and an EFS instance with two access points. The filesystem policy was created so that two unique IAM roles were allowed to access one of the access points, and two pods were spun up in the cluster with a role associated to each. Without filesystem policy in place, the driver worked as expected and was able to mount the access points required. However, when the filesystem policy was attached, the efs-csi-node driver was unable to complete the volume mount.

The issue seems to stem from the EFS CSI driver not passing the proper IAM flag to the underlying efs-utils package that the driver utilizes. This has been reported on the CSI driver repository, as well as the AWS container roadmap as seen here and here. EFS Utils requires passing the flag -o iam to the mount command to properly pass IAM roles for authorization. As the EFS CSI driver is solely responsible for fulfilling Kubernetes Persistent Volume requests, it does not currently read Kubernetes Service Account information from pods to facilitate authorization.

You can see documentation for the IAM flag in the announcement blog post for EFS Access Points here. An example working mount command would be as follows:
$ sudo mount -t efs -o iam,tls,accesspoint=fsap- fs- /mnt/shared

From my testing, looking at the logs of the EFS CSI node driver, you can see the mount command being passed here, without the required IAM flag:
Mounting arguments: -t efs -o accesspoint=fsap-,tls fs-:/ /var/lib/kubelet/pods/dde3c956-2d4e-4e80-b656-dcb282e6289c/volumes/kubernetes.io~csi/efs-pv-2/mount

Output: Could not start amazon-efs-mount-watchdog, unrecognized init system “aws-efs-csi-dri”

mount.nfs4: access denied by server while mounting 127.0.0.1:/

Since the flag is not passed, the driver does not try to read any IAM role, and is therefore unable to assume the role required to mount the access point with policy attached. The relevant code that is called inside the driver is located here, where the flag is not passed:

mountOptions = append(mountOptions, fmt.Sprintf("accesspoint=%s", apid), "tls")
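
For illustration, here is a minimal, runnable sketch of what forwarding the flag could look like so that efs-utils receives -o iam. The function name and the useIAM parameter are made up for this example and are not the driver's actual code:

package main

import "fmt"

// appendEFSMountOptions is a hypothetical helper, not part of the driver: it
// shows the "iam" option being added next to the access point and tls options.
func appendEFSMountOptions(opts []string, accessPointID string, useIAM bool) []string {
    if accessPointID != "" {
        opts = append(opts, fmt.Sprintf("accesspoint=%s", accessPointID), "tls")
    }
    if useIAM {
        // With "-o iam", efs-utils presents the caller's IAM identity to the
        // file system policy instead of mounting as an anonymous principal.
        opts = append(opts, "iam")
    }
    return opts
}

func main() {
    fmt.Println(appendEFSMountOptions(nil, "fsap-0123456789abcdef0", true))
    // Prints: [accesspoint=fsap-0123456789abcdef0 tls iam]
}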

Describe alternatives you've considered
We are segregating apps into separate EKS clusters if they need EFS access.

Additional context
Container roadmap link:
aws/containers-roadmap#1003

@wongma7
Contributor

wongma7 commented Nov 13, 2020

We have to wait for this feature: kubernetes/enhancements#2047. Otherwise there is no way for the driver to get a pod's credentials and use them for mounting.
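
For context, that enhancement (CSIServiceAccountToken) lets a CSIDriver object ask the kubelet to hand the pod's projected service account token to NodePublishVolume, which is what the driver would need in order to exchange it for pod-level IRSA credentials. A rough sketch of what opting in could look like; the audience value is an assumption based on IRSA's usual sts.amazonaws.com audience, and nothing here implies the EFS driver already consumes the token:

apiVersion: storage.k8s.io/v1
kind: CSIDriver
metadata:
  name: efs.csi.aws.com
spec:
  attachRequired: false
  # CSIServiceAccountToken: the kubelet passes the pod's projected token to
  # the driver on NodePublishVolume so it could assume the pod's IAM role.
  tokenRequests:
    - audience: sts.amazonaws.com   # assumption: the standard IRSA audience
      expirationSeconds: 3600
  requiresRepublish: true           # refresh the token before it expires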

@garystafford

We have to wait for this feature: kubernetes/enhancements#2047. Otherwise there is no way for the driver to get a pod's credentials and use them for mounting.

It appears that this dependent feature entered Beta two weeks ago, for possible release with v1.21. Given the status change, is there any timeline for adding the above feature to the driver?

@bssantiago

Do we have any updates on this issue?
Is there a workaround for getting EFS filesystem access per pod?

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 13, 2021
@sbreingan

Bumping this issue - are there any updates on implementing it in the roadmap?

@esalberg
Author

FYI: it looks like the Kubernetes upstream requirement is finally being merged for k8s 1.22. Not sure when AWS will get 1.22, but hopefully we can then get a date for the EFS CSI driver update?

@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Aug 21, 2021
@esalberg
Author

esalberg commented Aug 23, 2021 via email

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. labels Aug 23, 2021
@vchepkov

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Aug 23, 2021
@ghost

ghost commented Nov 11, 2021

I feel like adding -o iam would be useful even without CSIServiceAccountToken.

Right now, according to CloudTrail, the file system is being mounted as ANONYMOUS_PRINCIPAL:

{
    "userIdentity": {
        "type": "AWSAccount",
        "principalId": "",
        "accountId": "ANONYMOUS_PRINCIPAL"
    },
    "eventTime": "2021-11-11T16:41:04Z",
    "eventSource": "elasticfilesystem.amazonaws.com",
    "eventName": "NewClientConnection",
    ...
}

If -o iam were included, I believe the connection would arrive as the role associated with the efs-csi-node-sa service account, and we could write a policy preventing the EFS from being mounted except via the CSI driver.

Would a PR to do this be accepted?

@sedan07

sedan07 commented Dec 23, 2021

To have the node daemonset use its IAM role when mounting the file system, you can add iam to the mountOptions of the StorageClass or PersistentVolume:

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: efs-sc
provisioner: efs.csi.aws.com
mountOptions:
  - tls
  - iam
parameters:
  provisioningMode: efs-ap
  fileSystemId: fs-012345678901010
  directoryPerms: "700"
  gidRangeStart: "1000"
  gidRangeEnd: "2000"
  basePath: "/dynamic_provisioning"

This then gets added to the mount command.

However, this will use the same role for all pods on the cluster. As mentioned, it would be really nice if you could inject a different IAM role for each PV/PVC.
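
For statically provisioned volumes, the same option can be set on the PersistentVolume. A minimal sketch, assuming an access point mount; the file system and access point IDs are placeholders:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: efs-pv
spec:
  capacity:
    storage: 5Gi                   # required by the API, not enforced by EFS
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  mountOptions:
    - tls
    - iam                          # mount using the node driver's IAM identity
  csi:
    driver: efs.csi.aws.com
    volumeHandle: fs-012345678901010::fsap-0123456789abcdef0   # placeholder IDs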

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 23, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Apr 22, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue.

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@lmouhib
Contributor

lmouhib commented May 22, 2022

/reopen

@k8s-ci-robot
Contributor

@lmouhib: You can't reopen an issue/PR unless you authored it or you are a collaborator.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@vchepkov

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label May 23, 2022
@vchepkov

/reopen

@k8s-ci-robot
Contributor

@vchepkov: You can't reopen an issue/PR unless you authored it or you are a collaborator.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@vchepkov

Could anybody from the maintainers re-open this ticket, please?

@wongma7 wongma7 reopened this Jun 14, 2022
@jonathanrainer
Contributor

I think PR #710 should resolve these problems.

@adeweetman-al

We are interested in this - any updates please?

@nazarewk

nazarewk commented Jan 4, 2023

I have resolved the issue with mount.nfs4: access denied by server while mounting 127.0.0.1:/ by adding iam to mountOptions on a StorageClass.

I found #422, which gave me the idea to check the mount-time options, and there is no occurrence of iam in:

func (d *Driver) NodePublishVolume(ctx context.Context, req *csi.NodePublishVolumeRequest) (*csi.NodePublishVolumeResponse, error) {
    klog.V(4).Infof("NodePublishVolume: called with args %+v", req)

    mountOptions := []string{}

    target := req.GetTargetPath()
    if len(target) == 0 {
        return nil, status.Error(codes.InvalidArgument, "Target path not provided")
    }

    volCap := req.GetVolumeCapability()
    if volCap == nil {
        return nil, status.Error(codes.InvalidArgument, "Volume capability not provided")
    }
    if err := d.isValidVolumeCapabilities([]*csi.VolumeCapability{volCap}); err != nil {
        return nil, status.Error(codes.InvalidArgument, fmt.Sprintf("Volume capability not supported: %s", err))
    }
    if volCap.GetMount() == nil {
        return nil, status.Error(codes.InvalidArgument, "Volume capability access type must be mount")
    }

    // TODO when CreateVolume is implemented, it must use the same key names
    subpath := "/"
    encryptInTransit := true
    volContext := req.GetVolumeContext()
    for k, v := range volContext {
        switch strings.ToLower(k) {
        //Deprecated
        case "path":
            klog.Warning("Use of path under volumeAttributes is deprecated. This field will be removed in future release")
            if !filepath.IsAbs(v) {
                return nil, status.Errorf(codes.InvalidArgument, "Volume context property %q must be an absolute path", k)
            }
            subpath = filepath.Join(subpath, v)
        case "storage.kubernetes.io/csiprovisioneridentity":
            continue
        case "encryptintransit":
            var err error
            encryptInTransit, err = strconv.ParseBool(v)
            if err != nil {
                return nil, status.Error(codes.InvalidArgument, fmt.Sprintf("Volume context property %q must be a boolean value: %v", k, err))
            }
        case MountTargetIp:
            ipAddr := volContext[MountTargetIp]
            mountOptions = append(mountOptions, MountTargetIp+"="+ipAddr)
        default:
            return nil, status.Errorf(codes.InvalidArgument, "Volume context property %s not supported", k)
        }
    }

    fsid, vpath, apid, err := parseVolumeId(req.GetVolumeId())
    if err != nil {
        // parseVolumeId returns the appropriate error
        return nil, err
    }
    // The `vpath` takes precedence if specified. If not specified, we'll either use the
    // (deprecated) `path` from the volContext, or default to "/" from above.
    if vpath != "" {
        subpath = vpath
    }
    source := fmt.Sprintf("%s:%s", fsid, subpath)

    // If an access point was specified, we need to include two things in the mountOptions:
    // - The access point ID, properly prefixed. (Below, we'll check whether an access point was
    //   also specified in the incoming mount options and react appropriately.)
    // - The TLS option. Access point mounts won't work without it. (For ease of use, we won't
    //   require this to be present in the mountOptions already, but we won't complain if it is.)
    if apid != "" {
        mountOptions = append(mountOptions, fmt.Sprintf("accesspoint=%s", apid), "tls")
    }

    if encryptInTransit {
        // The TLS option may have been added above if apid was set
        // TODO: mountOptions should be a set to avoid all this hasOption checking
        if !hasOption(mountOptions, "tls") {
            mountOptions = append(mountOptions, "tls")
        }
    }

    if req.GetReadonly() {
        mountOptions = append(mountOptions, "ro")
    }

    if m := volCap.GetMount(); m != nil {
        for _, f := range m.MountFlags {
            // Special-case check for access point
            // Not sure if `accesspoint` is allowed to have mixed case, but this shouldn't hurt,
            // and it simplifies both matches (HasPrefix, hasOption) below.
            f = strings.ToLower(f)
            if strings.HasPrefix(f, "accesspoint=") {
                // The MountOptions Access Point ID
                moapid := f[12:]
                // No matter what, warn that this is not the right way to specify an access point
                klog.Warning(fmt.Sprintf(
                    "Use of 'accesspoint' under mountOptions is deprecated with this driver. "+
                        "Specify the access point in the volumeHandle instead, e.g. 'volumeHandle: %s:%s:%s'",
                    fsid, subpath, moapid))
                // If they specified the same access point in both places, let it slide; otherwise, fail.
                if apid != "" && moapid != apid {
                    return nil, status.Errorf(codes.InvalidArgument,
                        "Found conflicting access point IDs in mountOptions (%s) and volumeHandle (%s)", moapid, apid)
                }
                // Fall through; the code below will uniq for us.
            }
            if f == "tls" {
                klog.Warning(
                    "Use of 'tls' under mountOptions is deprecated with this driver since tls is enabled by default. " +
                        "To disable it, set encrypt in transit in the volumeContext, e.g. 'encryptInTransit: true'")
                // If they set tls and encryptInTransit is true, let it slide; otherwise, fail.
                if !encryptInTransit {
                    return nil, status.Errorf(codes.InvalidArgument,
                        "Found tls in mountOptions but encryptInTransit is false")
                }
            }
            if strings.HasPrefix(f, "awscredsuri") {
                klog.Warning("awscredsuri mount option is not supported by efs-csi-driver.")
                continue
            }
            if !hasOption(mountOptions, f) {
                mountOptions = append(mountOptions, f)
            }
        }
    }

    klog.V(5).Infof("NodePublishVolume: creating dir %s", target)
    if err := d.mounter.MakeDir(target); err != nil {
        return nil, status.Errorf(codes.Internal, "Could not create dir %q: %v", target, err)
    }

    klog.V(5).Infof("NodePublishVolume: mounting %s at %s with options %v", source, target, mountOptions)
    if err := d.mounter.Mount(source, target, "efs", mountOptions); err != nil {
        os.Remove(target)
        return nil, status.Errorf(codes.Internal, "Could not mount %q at %q: %v", source, target, err)
    }
    klog.V(5).Infof("NodePublishVolume: %s was mounted", target)

    //Increment volume Id counter
    if d.volMetricsOptIn {
        if value, ok := volumeIdCounter[req.GetVolumeId()]; ok {
            volumeIdCounter[req.GetVolumeId()] = value + 1
        } else {
            volumeIdCounter[req.GetVolumeId()] = 1
        }
    }

    return &csi.NodePublishVolumeResponse{}, nil
}

Any idea whether it is intentional?

Before I had that, I needed a policy that allowed any AWS principal to mount the EFS:

{
    "Sid": "CSIMountAccess",
    "Effect": "Allow",
    "Principal": {
        "AWS": "*"
    },
    "Action": [
        "elasticfilesystem:ClientWrite",
        "elasticfilesystem:ClientMount"
    ],
    "Resource": "arn:aws:elasticfilesystem:eu-north-1:XXXXXX:file-system/fs-0f0b08aba4527c97e",
    "Condition": {
        "Bool": {
            "elasticfilesystem:AccessedViaMountTarget": "true"
        }
    }
}

After adding the iam mount option, it started working properly with the policy scoped to the role:

{
    "Sid": "CSIMountAccess",
    "Effect": "Allow",
    "Principal": {
        "AWS": "arn:aws:iam::XXXXXX:role/infra-test-12345-efs-csi-node"
    },
    "Action": [
        "elasticfilesystem:ClientWrite",
        "elasticfilesystem:ClientMount"
    ],
    "Resource": "arn:aws:elasticfilesystem:eu-north-1:XXXXXX:file-system/fs-0f0b08aba4527c97e",
    "Condition": {
        "Bool": {
            "elasticfilesystem:AccessedViaMountTarget": "true"
        }
    }
}

@Bruce-Lu674

Bruce-Lu674 commented Mar 23, 2023

@nazarewk Hi,
"arn:aws:iam::XXXXXX:role/infra-test-12345-efs-csi-node" is your serviceaccount binding role or nodergroup role ?

@nazarewk

@nazarewk Hi, "arn:aws:iam::XXXXXX:role/infra-test-12345-efs-csi-node" is your serviceaccount binding role or nodergroup role ?

It's the service account binding role; it shouldn't matter, though.

@RyanStan
Contributor

/kind feature

@k8s-ci-robot k8s-ci-robot added the kind/feature Categorizes issue or PR as related to a new feature. label May 15, 2023
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 20, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Feb 19, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

@k8s-ci-robot k8s-ci-robot closed this as not planned (Won't fix, can't repro, duplicate, stale) Mar 20, 2024
@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
