Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

aws_apigatewayv2_deployment not deploying #12961

Closed
christhomas opened this issue Apr 23, 2020 · 21 comments · Fixed by #13055
Closed

aws_apigatewayv2_deployment not deploying #12961

christhomas opened this issue Apr 23, 2020 · 21 comments · Fixed by #13055
Labels
enhancement Requests to existing resources that expand the functionality or scope. service/apigatewayv2 Issues and PRs that pertain to the apigatewayv2 service.
Milestone

Comments

@christhomas
Copy link

I'm trying to build an app which will deploy the new apigatewayv2 websocket app when the code updates.

I've got this working on the original aws_api_gateway_deployment by using a stage_description. I've got a really simple example of this deployment which just uses the timestamp, guaranteeing it'll always deploy, this makes it easier to describe this issue.

resource "aws_api_gateway_deployment" "trigger_deployment" {
  rest_api_id = data.aws_api_gateway_rest_api.api_gateway.id
  stage_name  = var.env

  stage_description = md5(timestamp())

  lifecycle {
    create_before_destroy = true
  }
}

and now I'm trying the same thing with the websocket api like this, I've included the stage as well so you can see more of the relevant code

resource "aws_apigatewayv2_deployment" "websocket_api" {
  depends_on = [
    aws_apigatewayv2_route.connect,
    aws_apigatewayv2_route.disconnect,
    aws_apigatewayv2_route.on_message,
  ]

  api_id      = aws_apigatewayv2_api.websocket_api.id
  description = "${var.squad} ${var.name} WEBSOCKET deployment, deployed at: ${timestamp()}"
}

resource "aws_apigatewayv2_stage" "websocket_api" {
  api_id = aws_apigatewayv2_api.websocket_api.id
  name   = var.env
  deployment_id = aws_apigatewayv2_deployment.websocket_api.id

  default_route_settings {
    data_trace_enabled = true
    detailed_metrics_enabled = true
    logging_level = "INFO"
    throttling_burst_limit = 5000
    throttling_rate_limit = 10000
  }

  tags = {
    deployed_at = timestamp()
  }
}

which according to my understanding, should automatically deploy the api every time this is run. Which for my contrived example is perfect. Cause I just want to trigger an actual deployment.

However, when I see the stage settings and look at the deployment log. I don't see any deployments happening, despite terraform saying that everything was ok.

If I click the "Deploy API" button in the console, the api deploys and everything is functional again.

I've found something odd that happens. It doesn't appear to create a new deployment object. It merely uses another deployment object from days ago, updating it. The problem is that this deployment was broken and I know the reasons why, the lambdas are badly named, so I can see in the error logs, the name of the function is wrong. But in the new deployment, they have new names and thats how I can tell the difference between an old deployment and a new deployment, because of the lambda functions being executed.

So when I trigger a new deployment, it seems to find the old deployment, update it (it's broken cause the lambda function names), so the deployment works, but the code fails.

If I manually trigger another deployment, it gets the new lambda function names, the code works and the application gives me the right result back.

So something is broken here, I've got a screenshot of my deployment history, you can see the deployment date on the left and in the description don't match. But you can see the top deployment was the one I did manually. So if I change deployment to that one it works, but if I do another terraform apply, it'll update the bottom deployment with the fields, the description will change to reflect the new timestamp, but the application is broken again

Screenshot 2020-04-23 at 13 39 47

Screenshot 2020-04-23 at 13 39 40

@ghost ghost added service/apigateway Issues and PRs that pertain to the apigateway service. service/apigatewayv2 Issues and PRs that pertain to the apigatewayv2 service. labels Apr 23, 2020
@github-actions github-actions bot added the needs-triage Waiting for first response or review from a maintainer. label Apr 23, 2020
@ewbankkit
Copy link
Contributor

@christhomas Thanks for raising this.

It seems like the underlying cause is that the Deploy API action in the AWS Console creates a new deployment, not an update to the existing deployment.
Changing the description attribute updates the existing deployment and currently only changing the api_id attribute will force creation of a new resource.

  depends_on = [
    aws_apigatewayv2_route.connect,
    aws_apigatewayv2_route.disconnect,
    aws_apigatewayv2_route.on_message,
  ]

means that the deployment is created after the routes (and destroyed before) but won't trigger a recreation of the deployment if one of the route changes.

We could add some sort of synthetic "deployment checksum" attribute which is completely under the user's control. Changing its value would force a new deployment to be created (deleting the old deployment resource).

Similar (for API Gateway v1 REST APIs):

@christhomas
Copy link
Author

But creating a new api_id would require creating a new api_gateway_api, and then any other subsequent resources, does this sound like a blue/green deployment pattern?

It might actually solve an outstanding problem with the existing api gateway where if you try to update an existing deployment which has a vpc link it'll fail because the vpc link deployed in cloudfront will block the api gateway from being updated (I already have an outstanding ticket about that here: #12195

The only reason those depends_on attributes are here is because I don't want to create the deployment before the routes are created, cause then if you create one without the other it fails because there are no routes or integrations.

So would you suggest to just create a brand new api gateway each time and all it's resources? Cause that does sound like a very heavy deployment to execute

@christhomas
Copy link
Author

I think the problem of deciding when and when not to create a new deployment is very tricky. Since if any of the routes, integrations, or responses change. Then a new deployment should be made, right? Cause otherwise the cloudfront distribution isn't compiled, nor deployed. So any changes I make aren't made live, despite wanting to create a new deployment.

What if the integrations don't change, but the lambda code they're bound to does change. Then I'd want a new deployment since the old deployment will be executing the wrong lambdas

I'm not sure how to proceed, given the extra knowledge that I have now.

@christhomas
Copy link
Author

Ok, so I've tried to update the api description, trigger another deployment by forcing the creation of a new api id, but I'm unable to figure out exactly what I should do.

Seems that nothing works. I can't figure out what I should do if I need to trigger a deployment. Any ideas?

@ewbankkit
Copy link
Contributor

Can you do a terraform taint on the deployment resource?

@christhomas
Copy link
Author

As a general solution to this problem, I don't think I would like that over an actual deployment that I can trigger and actually deploy using terraform. But if you mean as a stop gap solution until I get a real solution. Then I think that's ok.

Or, I just thought I could use a null_resource to forcefully create an aws deployment using the cli, but again. That's not really a solution I would like to keep.

Will this deployment checksum take a lot of time to create?

@christhomas
Copy link
Author

I think it's overall impossible that terraform knows when to, or not to, execute an actual deployment, since there are so many reasons to deploy, or avoid deployments that are unnecessary that I think it would just be easier if I can override this using some input from my side on the resource for do some calculation that I could figure out under what circumstances to deploy or not.

For example, lambda function changes it's name, then I need to make a new deployment, but the api gateway id, stage names, etc, etc won't change. So how can terraform know to deploy in those circumstances?

I think that's why I liked being able to force a deployment with a change to the description, or perhaps a tag variable could be a nice way to trigger the deployment. Then I can use locals or some other means to compute whether something important has changed and that this change will require an actual deployment and force terraform to go ahead with it.

Otherwise, if it's some internally computed variable which I can't give any input into, I think it'll always be buggy cause terraform can't possibly know every scenario. But I perhaps can help to narrow that down, or accept more deployments than necessary, as long as they are predicable.

Do you know what I mean?

@christhomas
Copy link
Author

This isn't a great solution. I'd still like to keep this issue open until it's resolved inside terraform itself though. You're going to implement that checksum approach right?

But in the meantime, if anybody is reading this and wants a solution. This is what I went with at the end. I've verified that it does actually work.

resource "null_resource" "force_deployment" {
  triggers = {
    timestamp = timestamp()
    aws_region = var.aws_region
    api-id = aws_apigatewayv2_api.websocket_api.id
    stage = aws_apigatewayv2_stage.websocket_api.name
    domain-name = aws_route53_record.sub_domain.name
  }

  provisioner "local-exec" {
    when = create
    command = <<EOT
      aws apigatewayv2 \
          --region ${self.triggers.aws_region} \
          create-deployment \
          --api-id ${self.triggers.api-id} \
          --stage-name ${self.triggers.stage} \
          --description "Forced Deployment at: ${self.triggers.timestamp}" || true
    EOT
  }
}

@ebarault
Copy link

ebarault commented Apr 27, 2020

thank you @christhomas for this workaround

do you know if deploying the api has collateral effects from the currently deployed api perspective? any downtime? reset of currently opened connections?

Otherwise, why not forcing a new deployment each time in the aws_apigatewayv2_deployment module ?

@christhomas
Copy link
Author

I've been working with this workaround for a week and each deployment doesn't seem to trigger any problems, but maybe the occasional 500 internal service error, but it's not consistently giving those problems. Only occasionally.

It probably will cause more downtime on services which are under heavy use, cause I don't think api-gateway has any mechanism to manage that.

I've been wondering how to do a blue/green deployment on a gateway with dozens of endpoints from various projects, it's been something I'm unable to find a solution to. But I'm open to suggestions.

The reason to force a new deployment is that the aws_apigatewayv2_deployment module doesn't yet have the capability to trigger new deployments based on some changing value that I control. As you can see in the above comments, there is no computed checksum like in the first aws_api_gateway_deployment module.

So what was happening was that terrraform was saying that it's deployed and everything is ok, except it didn't deploy and it was merely finding an existing (in my case broken) deployment with the same deployment id, updating it with the new description I was using to try to trigger a new deployment and pushing out a broken api. Despite having a new deployment ready.

I don't wanna repeat myself from the bug report above.

It seems overkill to constantly force redeployments. But it's complicated to know whether or not you should deploy again, cause things like lambda functions might change, then you need to redeploy. Or permission changes. Then deploy again? or not?

I'm not sure how to solve that.

bflad added a commit that referenced this issue Apr 28, 2020
Reference: #12961
Reference: #13054 (aws_api_gateway_deployment resource)
Reference: https://www.terraform.io/docs/providers/null/resource.html#triggers
Reference: https://www.terraform.io/docs/providers/random/#resource-quot-keepers-quot-
Reference: https://www.terraform.io/docs/providers/time/#resource-quot-triggers-quot-

Since it does not appear there will be functionality added anytime soon in the Terraform core to support resource configuration that automatically triggers resource recreation when referenced resources are updated, this introduces a `triggers` map argument similar to those utilized by the `null`, `random`, and `time` providers. This can be used by operators to automatically force a new resource (redeployment) using key/value criteria of their choosing. Its usage is fairly advanced, so caveats are added to the documentation.

We do not intend to add this class of argument to all Terraform AWS Provider resources due to its complexity and potentially awkward configuration, however, this is a pragmatic compromise for this particular resource which does not fit well into Terraform's usual provisioning model.

Output from acceptance testing:

```
--- PASS: TestAccAWSAPIGatewayV2Deployment_disappears (23.05s)
--- PASS: TestAccAWSAPIGatewayV2Deployment_basic (41.80s)
--- PASS: TestAccAWSAPIGatewayV2Deployment_Triggers (70.95s)
```
@ebarault
Copy link

as @bflad suggested seems fine to me. Thank you !

@christhomas
Copy link
Author

Ah perfect! so I can build a triggers statement which reflects what parts of my app should trigger a new deployment and if this statement changes, it'll deploy again.

I'll close this issue myself when this is released. Just to keep this information in the issues for people to find during the meantime.

That's absolutely perfect, thanks!

@bflad
Copy link
Contributor

bflad commented Apr 28, 2020

@christhomas / @ebarault great to hear this might be what you're looking for! I was hesitant to mark the V2 triggers argument as closing this issue without chatting here first. 😄

@bflad bflad added enhancement Requests to existing resources that expand the functionality or scope. and removed needs-triage Waiting for first response or review from a maintainer. service/apigateway Issues and PRs that pertain to the apigateway service. labels Apr 28, 2020
@christhomas
Copy link
Author

Let me get the release, try it out, and then I'll close this afterwards, if everything goes ok :) Thanks!

@christhomas
Copy link
Author

I forgot to ask, do you have any idea what the release date of this feature is? I'm doing a lot of API Gateway stuff right now, so thats why I'm being very proactive in getting things reported and fixed :) Hope I'm being useful

@bflad
Copy link
Contributor

bflad commented Apr 28, 2020

I do not have an estimated time for review and merge, but you can custom build the provider in the meantime to try it out.

@bflad bflad added this to the v2.61.0 milestone May 1, 2020
bflad added a commit that referenced this issue May 1, 2020
Reference: #12961
Reference: #13054 (aws_api_gateway_deployment resource)
Reference: https://www.terraform.io/docs/providers/null/resource.html#triggers
Reference: https://www.terraform.io/docs/providers/random/#resource-quot-keepers-quot-
Reference: https://www.terraform.io/docs/providers/time/#resource-quot-triggers-quot-

Since it does not appear there will be functionality added anytime soon in the Terraform core to support resource configuration that automatically triggers resource recreation when referenced resources are updated, this introduces a `triggers` map argument similar to those utilized by the `null`, `random`, and `time` providers. This can be used by operators to automatically force a new resource (redeployment) using key/value criteria of their choosing. Its usage is fairly advanced, so caveats are added to the documentation.

We do not intend to add this class of argument to all Terraform AWS Provider resources due to its complexity and potentially awkward configuration, however, this is a pragmatic compromise for this particular resource which does not fit well into Terraform's usual provisioning model.

Output from acceptance testing:

```
--- PASS: TestAccAWSAPIGatewayV2Deployment_disappears (23.05s)
--- PASS: TestAccAWSAPIGatewayV2Deployment_basic (41.80s)
--- PASS: TestAccAWSAPIGatewayV2Deployment_Triggers (70.95s)
```
@bflad
Copy link
Contributor

bflad commented May 1, 2020

Hi folks 👋

Since it does not appear there will be functionality added anytime soon in Terraform core to support a form of resource configuration that will automatically triggers resource recreation when referenced resources are updated, the aws_apigatewayv2_deployment resource has been enhanced with a triggers map argument similar to those utilized by the null, random, and time providers. This can be used by operators to automatically force a new resource (redeployment) using key/value criteria of their choosing. Its usage is fairly advanced, so caveats are added to the documentation. This functionality will release with version 2.61.0 of the Terraform AWS Provider, later next week.

If this type of enhancement does not fit your needs, we would encourage you to file a new issue (potentially upstream in Terraform core since there's not much else we can do at the provider level). Please also note that we do not intend to add this class of argument to all Terraform AWS Provider resources due to its complexity and potentially awkward configuration.

@christhomas
Copy link
Author

Wow that was fast, it was only the other day you said you didn't know when it's going to be merged :D Thanks a lot!

@ghost
Copy link

ghost commented May 8, 2020

This has been released in version 2.61.0 of the Terraform AWS provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading.

For further feature requests or bug reports with this functionality, please create a new GitHub issue following the template for triage. Thanks!

@christhomas
Copy link
Author

Sorry for the delay in feedback. It works great, thanks!

@ghost
Copy link

ghost commented Jun 1, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. Thanks!

@ghost ghost locked and limited conversation to collaborators Jun 1, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
enhancement Requests to existing resources that expand the functionality or scope. service/apigatewayv2 Issues and PRs that pertain to the apigatewayv2 service.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants