[Bug]: aws_lambda_function.replace_security_groups_on_destroy behaviour is no longer supported #31520

Closed
ascopes opened this issue May 22, 2023 · 8 comments · Fixed by #31904
Labels
bug Addresses a defect in current functionality. service/lambda Issues and PRs that pertain to the lambda service. service/vpc Issues and PRs that pertain to the vpc service.

@ascopes

ascopes commented May 22, 2023

Description

A while back, a workaround was added to aws_lambda_function to try and speed up the destroy times for Lambda functions attached to a VPC with security groups. This effectively swapped the Hyperplane ENI security groups out with the VPC default security group to remove the ENI dependency on the initial security group, allowing builds to destroy much faster.

resource "aws_security_group" "security_group" { ... }

resource "aws_lambda_function" "lambda" {
  replace_security_groups_on_destroy = true
  
  ...

  vpc_config {
    ...
    security_group_ids = toset([aws_security_group.security_group.id])
  }
}

A couple of weeks ago, I noticed that in our cloud (eu-west-1), we were getting "Client.OperationNotPermitted" when Terraform attempted to do this.

I contacted AWS Support via my employer's enterprise support plan, and produced a minimal working example for them. They went to speak to one of the internal teams that develops AWS Lambda and provided a technical explanation for us. Effectively, it is no longer possible to update security groups on ENIs once created. I have no idea if this is rolled out globally yet or not, but we have managed to replicate this on three of our own AWS accounts in eu-west-1.

The official AWS response was as follows:

For the issue “Client.OperationNotPermitted” error while attempting to update the Hyperplane ENI (Elastic Network Interface) created by Lambda. This occurred because you had enabled the replace_security_groups_on_destroy option when using the “destroy” command in Terraform. This option automatically updates the security groups associated with Lambda-created Hyperplane ENIs. However, we recently rolled out a change to prevent the mutation of security groups associated with Lambda-created Hyperplane ENIs because it causes mismatch between the security configuration of the Lambda function and the Hyperplane ENI which was created to serve traffic for it. This can lead to unintended consequences such as the inability to find functions using a Hyperplane ENI, which makes it difficult to delete other dependent resources, such as VPCs.

Their advice on working around this:

You can disable the replace_security_groups_on_destroy option to prevent this issue from reoccurring. We will update the Lambda Developer Guide documentation to capture this change. Additionally, if you want to change the security configuration for a Lambda function, we recommend you to do so by altering the security groups of the function directly so that the security rules of the interface are consistent with that of the function accessing resources through it.
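
As a rough illustration of that advice, the sketch below shows the resource with the option switched off. Everything other than `replace_security_groups_on_destroy` and the `aws_security_group.security_group` reference is a placeholder that does not come from the report: the function name, role, package, handler, runtime, and subnet are assumptions, and `aws_iam_role.lambda` and `aws_subnet.private` are assumed to be defined elsewhere in the configuration.

```hcl
# Sketch only: all values except replace_security_groups_on_destroy are placeholders.
resource "aws_lambda_function" "lambda" {
  function_name = "example"                 # hypothetical name
  role          = aws_iam_role.lambda.arn   # assumed to exist elsewhere
  filename      = "lambda.zip"              # hypothetical deployment package
  handler       = "index.handler"
  runtime       = "nodejs18.x"

  # Omit this argument, or set it to false, so that destroy no longer
  # attempts to modify the security groups on the Hyperplane ENIs.
  replace_security_groups_on_destroy = false

  vpc_config {
    subnet_ids         = [aws_subnet.private.id] # assumed to exist elsewhere
    security_group_ids = [aws_security_group.security_group.id]
  }
}
```

With the option off, destroy should fall back to waiting for AWS to release the ENIs, so the slower destroy times described in this issue are to be expected.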

This is unfortunate as this feature reduced our destroy times by over an hour previously. However, it is probably worth marking this attribute as deprecated in the documentation with an explanation to prevent production impairment for other users.

References

AWS Enterprise Support Plan official support.

This was communicated from their internal team that develops AWS Lambda at AWS.

Would you like to implement a fix?

No

@ascopes ascopes added the needs-triage Waiting for first response or review from a maintainer. label May 22, 2023
@github-actions

Community Note

Voting for Prioritization

  • Please vote on this issue by adding a 👍 reaction to the original post to help the community and maintainers prioritize this request.
  • Please see our prioritization guide for information on how we prioritize.
  • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request.

Volunteering to Work on This Issue

  • If you are interested in working on this issue, please leave a comment.
  • If this would be your first contribution, please review the contribution guide.

@github-actions github-actions bot added service/lambda Issues and PRs that pertain to the lambda service. service/vpc Issues and PRs that pertain to the vpc service. labels May 22, 2023
@sousmangoosta
Contributor

I ran the acceptance test for this resource and use case and got the same result:

$ AWS_DEFAULT_REGION=eu-west-1 AWS_PROFILE=default make testacc TESTS='TestAccLambdaFunction_VPC_replaceSGWithDefault' PKG=lambda
==> Checking that code complies with gofmt requirements...
TF_ACC=1 go test ./internal/service/lambda/... -v -count 1 -parallel 20 -run='TestAccLambdaFunction_VPC_replaceSGWithDefault'  -timeout 180m
=== RUN   TestAccLambdaFunction_VPC_replaceSGWithDefault
=== PAUSE TestAccLambdaFunction_VPC_replaceSGWithDefault
=== CONT  TestAccLambdaFunction_VPC_replaceSGWithDefault
    testing_new.go:88: Error running post-test destroy, there may be dangling resources: exit status 1
        
        Error: modifying Lambda Function (tf-acc-test-7978341270248250296) network interfaces: OperationNotPermitted: The security group can not be modified for this type of interface
                status code: 400, request id: 97376cf7-6890-4fb1-bea8-25890e19aba0
        
--- FAIL: TestAccLambdaFunction_VPC_replaceSGWithDefault (230.11s)
FAIL
FAIL    github.com/hashicorp/terraform-provider-aws/internal/service/lambda     230.322s
FAIL
make: *** [testacc] Error 1
$

@ascopes ascopes changed the title aws_lambda_function.replace_security_groups_on_destroy behaviour is no longer supported [Bug] aws_lambda_function.replace_security_groups_on_destroy behaviour is no longer supported May 22, 2023
@justinretzolk justinretzolk added bug Addresses a defect in current functionality. and removed needs-triage Waiting for first response or review from a maintainer. labels May 22, 2023
@jar-b
Member

jar-b commented May 22, 2023

Relates #29289

Thanks for the report @ascopes. Based on the included support case response, it may be worth investigating whether replacing security groups on the Lambda function directly (rather than on the orphaned ENIs) provides a similar reduction in destroy times.

@ascopes
Author

ascopes commented May 22, 2023

@jar-b given they appear to be redesigning how this is implemented underneath, it may be worth trying to get internal confirmation from AWS that there are no immediate plans to remove that feature as well, just to be safe.

But yeah, sounds reasonable to me.

@PhilHalf

PhilHalf commented May 26, 2023

We have been using a script very similar to the one mentioned in #10329 (comment).
This stopped working for us a few days ago as per this bug.

Following @jar-b's suggestion, we've changed the script to update the security group on the Lambda function rather than on the ENI, and we run it from a destroy-time local-exec provisioner on a null_resource. The security group now seems to take between 3 and 5 minutes to destroy.
That is not quite as good as what we were seeing when changing the security group directly on the ENI, but it is definitely better than a 45-minute wait while AWS clears it all up.

The script we're using is:

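#!/bin/bash
# Usage: pass the Lambda function name and the VPC ID as the two arguments.
set -euo pipefail
LAMBDA_FUNCTION_NAME="${1}"
VPC_ID="${2}"

# Look up the VPC's default security group; --output text returns the bare ID, so jq is not needed.
default_sg=$(aws ec2 describe-security-groups --filters "Name=description,Values=default VPC security group" "Name=vpc-id,Values=${VPC_ID}" --query 'SecurityGroups[0].GroupId' --output text)
# Point the Lambda function itself (not the ENI) at the default security group.
aws lambda update-function-configuration --function-name "${LAMBDA_FUNCTION_NAME}" --vpc-config "SecurityGroupIds=${default_sg}"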

Hopefully this can help somebody else seeing this problem (at least until AWS stop this from working too...)
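
For readers wondering how the destroy-time provisioner described in this comment might be wired up, here is a minimal sketch, not the commenter's actual configuration. The resource name `lambda_sg_cleanup` and the script path are hypothetical, and the references reuse the resource names from the example at the top of the issue. Inputs are stored in `triggers` because a destroy-time provisioner can reference them through `self`.

```hcl
# Hypothetical wiring; the resource name and script path are assumptions.
resource "null_resource" "lambda_sg_cleanup" {
  # Capture everything the destroy-time provisioner needs, so it can be
  # read back through self.triggers when the resource is destroyed.
  triggers = {
    function_name = aws_lambda_function.lambda.function_name
    vpc_id        = aws_security_group.security_group.vpc_id
    script        = "${path.module}/scripts/replace_lambda_sg.sh" # hypothetical path
  }

  provisioner "local-exec" {
    when    = destroy
    command = "${self.triggers.script} ${self.triggers.function_name} ${self.triggers.vpc_id}"
  }
}
```

Because the `null_resource` refers to the Lambda function and the security group, Terraform should destroy it first, so the script runs while the function still exists and before the security group is deleted.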

@ascopes ascopes changed the title [Bug] aws_lambda_function.replace_security_groups_on_destroy behaviour is no longer supported [Bug]: aws_lambda_function.replace_security_groups_on_destroy behaviour is no longer supported Jun 7, 2023
@jar-b jar-b self-assigned this Jun 12, 2023
@jar-b
Member

jar-b commented Jun 12, 2023

AWS has confirmed that mutating security groups on Lambda ENIs is no longer permitted and that the change will not be rolled back. They have recommended removing the logic and deprecating these attributes, which we'll do in the next minor release.

@github-actions

This functionality has been released in v5.3.0 of the Terraform AWS Provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading.

For further feature requests or bug reports with this functionality, please create a new GitHub issue following the template. Thank you!
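
One way to pick up the v5.3.0 release mentioned above is a version constraint like the following. This is a sketch under the assumption that the configuration does not already pin the AWS provider elsewhere; the exact constraint is a choice for each project.

```hcl
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      # Assumption: any release from 5.3.0 onward is acceptable for this project.
      version = ">= 5.3.0"
    }
  }
}
```

After changing the constraint, running `terraform init -upgrade` lets Terraform select a provider version that satisfies it.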

@github-actions

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jul 14, 2023
5 participants