Failed to change min and max nodes #97

Open
jpswinski opened this issue Aug 7, 2023 · 1 comment
Labels
bug Something isn't working

Comments

@jpswinski
Member

I tried to "Configure" the min and max nodes for the public sliderule cluster. The running configuration was 7-7-30 (min-desired-max), and I changed the min to 40 and the max to 100. This put it into a state where the new configuration was 40-7-100. ProvSys then automatically generated an update to change it to 40-40-100. When that happened, the deploy failed:

**************** cmd submitted: ['terraform', '-chdir=/ps_server/sliderule/terraform', 'apply', '-auto-approve', '-var', 'cluster_version=v3', '-var', 'domain=slideruleearth.io', '-var', 'is_public=True', '-var', 'cluster_name=sliderule', '-var', 'node_asg_min_capacity=40', '-var', 'node_asg_max_capacity=100', '-var', 'node_asg_desired_capacity=40'] at 2023-08-07 12:27:17 UTC
data.aws_ami.sliderule_cluster_ami: Reading...
data.aws_route53_zone.selected: Reading...
aws_s3_bucket_object.cron-job["cronjob.txt"]: Refreshing state... [id=infrastructure/software/sliderule-cronjob.txt]
aws_s3_bucket_object.docker-compose-config["docker-compose-ilb.yml"]: Refreshing state... [id=infrastructure/software/sliderule-docker-compose-ilb.yml]
aws_s3_bucket_object.export-log-script["export_logs.sh"]: Refreshing state... [id=infrastructure/software/sliderule-export_logs.sh]
aws_iam_role.s3-role: Refreshing state... [id=sliderule-iam-role]
aws_vpc.sliderule-vpc: Refreshing state... [id=vpc-09ca2859195464a0e]
aws_iam_policy.s3-policy: Refreshing state... [id=arn:aws:iam::742127912612:policy/sliderule-iams3-policy]
aws_s3_bucket_object.docker-compose-config["docker-compose-sliderule.yml"]: Refreshing state... [id=infrastructure/software/sliderule-docker-compose-sliderule.yml]
aws_s3_bucket_object.docker-compose-config["docker-compose-monitor.yml"]: Refreshing state... [id=infrastructure/software/sliderule-docker-compose-monitor.yml]
data.aws_secretsmanager_secret_version.secrets: Reading...
data.aws_ami.sliderule_cluster_ami: Read complete after 0s [id=ami-0098740cce22bf29d]
aws_iam_policy.ec2-policy: Refreshing state... [id=arn:aws:iam::742127912612:policy/sliderule-iamec2-policy]
data.aws_secretsmanager_secret_version.secrets: Read complete after 0s [id=slideruleearth.io/secrets|AWSCURRENT]
aws_iam_role_policy_attachment.s3-role-policy-local: Refreshing state... [id=sliderule-iam-role-20230726141827905800000001]
aws_iam_role_policy_attachment.ec2-role-policy-local: Refreshing state... [id=sliderule-iam-role-20230726141828021000000004]
aws_iam_role_policy_attachment.ec2-role-policy-aec2crro: Refreshing state... [id=sliderule-iam-role-20230726141828120900000006]
aws_iam_role_policy_attachment.ec2-role-policy-cwaap: Refreshing state... [id=sliderule-iam-role-20230726141828021700000005]
aws_iam_role_policy_attachment.ec2-role-policy-cwasp: Refreshing state... [id=sliderule-iam-role-20230726141827906200000002]
aws_iam_role_policy_attachment.ec2-role-policy-assmmic: Refreshing state... [id=sliderule-iam-role-20230726141827908600000003]
aws_iam_instance_profile.s3-role: Refreshing state... [id=sliderule-iam-profile]
aws_security_group.monitor-sg: Refreshing state... [id=sg-00ac8f5dbfa460be8]
aws_subnet.sliderule-subnet: Refreshing state... [id=subnet-01ef68d0c96b01bca]
aws_internet_gateway.sliderule-gateway: Refreshing state... [id=igw-0df6227bb09a18842]
aws_security_group.sliderule-sg: Refreshing state... [id=sg-07fbd3d2eeddcb97c]
aws_security_group.ilb-sg: Refreshing state... [id=sg-0d13068a5391bfd49]
aws_route_table.sliderule-route: Refreshing state... [id=rtb-0dc0a92d70b3da62d]
aws_route_table_association.sliderule-route-association: Refreshing state... [id=rtbassoc-0d00fe60b8f3570b9]
aws_instance.ilb: Refreshing state... [id=i-03329bd36c19bbbe2]
aws_instance.monitor: Refreshing state... [id=i-08eba42107b54c36b]
aws_launch_configuration.sliderule-instance: Refreshing state... [id=terraform-20230726141840086100000007]
aws_autoscaling_group.sliderule-cluster: Refreshing state... [id=terraform-20230726141849028500000008]
data.aws_route53_zone.selected: Read complete after 1s [id=Z0526045IQLILBFI9THF]
aws_route53_record.org: Refreshing state... [id=Z0526045IQLILBFI9THF_sliderule.slideruleearth.io_A]

Terraform used the selected providers to generate the following execution
plan. Resource actions are indicated with the following symbols:
  ~ update in-place
-/+ destroy and then create replacement

Terraform will perform the following actions:

  # aws_autoscaling_group.sliderule-cluster will be updated in-place
  ~ resource "aws_autoscaling_group" "sliderule-cluster" {
      ~ desired_capacity          = 7 -> 40
        id                        = "terraform-20230726141849028500000008"
      ~ launch_configuration      = "terraform-20230726141840086100000007" -> (known after apply)
      ~ max_size                  = 30 -> 100
      ~ min_size                  = 7 -> 40
        name                      = "terraform-20230726141849028500000008"
        # (21 unchanged attributes hidden)

        # (1 unchanged block hidden)
    }

  # aws_instance.ilb must be replaced
-/+ resource "aws_instance" "ilb" {
      ~ ami                                  = "ami-0bc69fc2ee174d0a9" -> "ami-0098740cce22bf29d" # forces replacement
      ~ arn                                  = "arn:aws:ec2:us-west-2:742127912612:instance/i-03329bd36c19bbbe2" -> (known after apply)
      ~ cpu_core_count                       = 2 -> (known after apply)
      ~ cpu_threads_per_core                 = 1 -> (known after apply)
      ~ disable_api_stop                     = false -> (known after apply)
      ~ disable_api_termination              = false -> (known after apply)
      - hibernation                          = false -> null
      ~ host_id                              = "" -> (known after apply)
      + host_resource_group_arn              = (known after apply)
      ~ id                                   = "i-03329bd36c19bbbe2" -> (known after apply)
      ~ instance_initiated_shutdown_behavior = "stop" -> (known after apply)
      ~ instance_state                       = "running" -> (known after apply)
      ~ ipv6_address_count                   = 0 -> (known after apply)
      ~ ipv6_addresses                       = [] -> (known after apply)
      ~ outpost_arn                          = "" -> (known after apply)
      ~ password_data                        = "" -> (known after apply)
      ~ placement_group                      = "" -> (known after apply)
      ~ placement_partition_number           = 0 -> (known after apply)
      ~ primary_network_interface_id         = "eni-07385c240fd14a226" -> (known after apply)
      ~ private_dns                          = "ip-10-0-1-5.us-west-2.compute.internal" -> (known after apply)
      ~ public_dns                           = "ec2-52-41-92-196.us-west-2.compute.amazonaws.com" -> (known after apply)
      ~ public_ip                            = "52.41.92.196" -> (known after apply)
      ~ secondary_private_ips                = [] -> (known after apply)
      ~ security_groups                      = [] -> (known after apply)
        tags                                 = {
            "Name" = "sliderule-ilb"
        }
      ~ tenancy                              = "default" -> (known after apply)
      + user_data_base64                     = (known after apply)
        # (15 unchanged attributes hidden)

      ~ capacity_reservation_specification {
          ~ capacity_reservation_preference = "open" -> (known after apply)

          + capacity_reservation_target {
              + capacity_reservation_id                 = (known after apply)
              + capacity_reservation_resource_group_arn = (known after apply)
            }
        }

      + ebs_block_device {
          + delete_on_termination = (known after apply)
          + device_name           = (known after apply)
          + encrypted             = (known after apply)
          + iops                  = (known after apply)
          + kms_key_id            = (known after apply)
          + snapshot_id           = (known after apply)
          + tags                  = (known after apply)
          + throughput            = (known after apply)
          + volume_id             = (known after apply)
          + volume_size           = (known after apply)
          + volume_type           = (known after apply)
        }

      ~ enclave_options {
          ~ enabled = false -> (known after apply)
        }

      + ephemeral_block_device {
          + device_name  = (known after apply)
          + no_device    = (known after apply)
          + virtual_name = (known after apply)
        }

      ~ maintenance_options {
          ~ auto_recovery = "default" -> (known after apply)
        }

      ~ metadata_options {
          ~ http_endpoint               = "enabled" -> (known after apply)
          ~ http_put_response_hop_limit = 1 -> (known after apply)
          ~ http_tokens                 = "optional" -> (known after apply)
          ~ instance_metadata_tags      = "disabled" -> (known after apply)
        }

      + network_interface {
          + delete_on_termination = (known after apply)
          + device_index          = (known after apply)
          + network_card_index    = (known after apply)
          + network_interface_id  = (known after apply)
        }

      ~ private_dns_name_options {
          ~ enable_resource_name_dns_a_record    = false -> (known after apply)
          ~ enable_resource_name_dns_aaaa_record = false -> (known after apply)
          ~ hostname_type                        = "ip-name" -> (known after apply)
        }

      ~ root_block_device {
          ~ device_name           = "/dev/sda1" -> (known after apply)
          ~ encrypted             = false -> (known after apply)
          ~ iops                  = 120 -> (known after apply)
          + kms_key_id            = (known after apply)
          - tags                  = {} -> null
          ~ throughput            = 0 -> (known after apply)
          ~ volume_id             = "vol-0eac524fb9834a7d8" -> (known after apply)
            # (3 unchanged attributes hidden)
        }
    }

  # aws_instance.monitor must be replaced
-/+ resource "aws_instance" "monitor" {
      ~ ami                                  = "ami-0bc69fc2ee174d0a9" -> "ami-0098740cce22bf29d" # forces replacement
      ~ arn                                  = "arn:aws:ec2:us-west-2:742127912612:instance/i-08eba42107b54c36b" -> (known after apply)
      ~ cpu_core_count                       = 2 -> (known after apply)
      ~ cpu_threads_per_core                 = 1 -> (known after apply)
      ~ disable_api_stop                     = false -> (known after apply)
      ~ disable_api_termination              = false -> (known after apply)
      - hibernation                          = false -> null
      ~ host_id                              = "" -> (known after apply)
      + host_resource_group_arn              = (known after apply)
      ~ id                                   = "i-08eba42107b54c36b" -> (known after apply)
      ~ instance_initiated_shutdown_behavior = "stop" -> (known after apply)
      ~ instance_state                       = "running" -> (known after apply)
      ~ ipv6_address_count                   = 0 -> (known after apply)
      ~ ipv6_addresses                       = [] -> (known after apply)
      ~ outpost_arn                          = "" -> (known after apply)
      ~ password_data                        = "" -> (known after apply)
      ~ placement_group                      = "" -> (known after apply)
      ~ placement_partition_number           = 0 -> (known after apply)
      ~ primary_network_interface_id         = "eni-04231e20eaf77763b" -> (known after apply)
      ~ private_dns                          = "ip-10-0-1-4.us-west-2.compute.internal" -> (known after apply)
      ~ public_dns                           = "ec2-34-217-85-62.us-west-2.compute.amazonaws.com" -> (known after apply)
      ~ public_ip                            = "34.217.85.62" -> (known after apply)
      ~ secondary_private_ips                = [] -> (known after apply)
      ~ security_groups                      = [] -> (known after apply)
        tags                                 = {
            "Name" = "sliderule-monitor"
        }
      ~ tenancy                              = "default" -> (known after apply)
      + user_data_base64                     = (known after apply)
        # (15 unchanged attributes hidden)

      ~ capacity_reservation_specification {
          ~ capacity_reservation_preference = "open" -> (known after apply)

          + capacity_reservation_target {
              + capacity_reservation_id                 = (known after apply)
              + capacity_reservation_resource_group_arn = (known after apply)
            }
        }

      + ebs_block_device {
          + delete_on_termination = (known after apply)
          + device_name           = (known after apply)
          + encrypted             = (known after apply)
          + iops                  = (known after apply)
          + kms_key_id            = (known after apply)
          + snapshot_id           = (known after apply)
          + tags                  = (known after apply)
          + throughput            = (known after apply)
          + volume_id             = (known after apply)
          + volume_size           = (known after apply)
          + volume_type           = (known after apply)
        }

      ~ enclave_options {
          ~ enabled = false -> (known after apply)
        }

      + ephemeral_block_device {
          + device_name  = (known after apply)
          + no_device    = (known after apply)
          + virtual_name = (known after apply)
        }

      ~ maintenance_options {
          ~ auto_recovery = "default" -> (known after apply)
        }

      ~ metadata_options {
          ~ http_endpoint               = "enabled" -> (known after apply)
          ~ http_put_response_hop_limit = 1 -> (known after apply)
          ~ http_tokens                 = "optional" -> (known after apply)
          ~ instance_metadata_tags      = "disabled" -> (known after apply)
        }

      + network_interface {
          + delete_on_termination = (known after apply)
          + device_index          = (known after apply)
          + network_card_index    = (known after apply)
          + network_interface_id  = (known after apply)
        }

      ~ private_dns_name_options {
          ~ enable_resource_name_dns_a_record    = false -> (known after apply)
          ~ enable_resource_name_dns_aaaa_record = false -> (known after apply)
          ~ hostname_type                        = "ip-name" -> (known after apply)
        }

      ~ root_block_device {
          ~ device_name           = "/dev/sda1" -> (known after apply)
          ~ encrypted             = false -> (known after apply)
          ~ iops                  = 120 -> (known after apply)
          + kms_key_id            = (known after apply)
          - tags                  = {} -> null
          ~ throughput            = 0 -> (known after apply)
          ~ volume_id             = "vol-0530bdd513527cc9c" -> (known after apply)
            # (3 unchanged attributes hidden)
        }
    }

  # aws_launch_configuration.sliderule-instance must be replaced
-/+ resource "aws_launch_configuration" "sliderule-instance" {
      ~ arn                              = "arn:aws:autoscaling:us-west-2:742127912612:launchConfiguration:c85e1f4e-27f8-4e49-bc00-5eadd8d2c78f:launchConfigurationName/terraform-20230726141840086100000007" -> (known after apply)
      ~ ebs_optimized                    = false -> (known after apply)
      ~ id                               = "terraform-20230726141840086100000007" -> (known after apply)
      ~ image_id                         = "ami-0bc69fc2ee174d0a9" -> "ami-0098740cce22bf29d" # forces replacement
      ~ name                             = "terraform-20230726141840086100000007" -> (known after apply)
      ~ name_prefix                      = "terraform-" -> (known after apply)
      ~ user_data                        = "a2fbce5475342f18d5a84eb88a9b7a233dbc4f34" -> "77c7c4dd37c864912d90616e7689dd593ca50599" # forces replacement
      - vpc_classic_link_security_groups = [] -> null
        # (6 unchanged attributes hidden)

      + ebs_block_device {
          + delete_on_termination = (known after apply)
          + device_name           = (known after apply)
          + encrypted             = (known after apply)
          + iops                  = (known after apply)
          + no_device             = (known after apply)
          + snapshot_id           = (known after apply)
          + throughput            = (known after apply)
          + volume_size           = (known after apply)
          + volume_type           = (known after apply)
        }

      + metadata_options {
          + http_endpoint               = (known after apply)
          + http_put_response_hop_limit = (known after apply)
          + http_tokens                 = (known after apply)
        }

      + root_block_device {
          + delete_on_termination = (known after apply)
          + encrypted             = (known after apply)
          + iops                  = (known after apply)
          + throughput            = (known after apply)
          + volume_size           = (known after apply)
          + volume_type           = (known after apply)
        }
    }

  # aws_route53_record.org will be updated in-place
  ~ resource "aws_route53_record" "org" {
        id                               = "Z0526045IQLILBFI9THF_sliderule.slideruleearth.io_A"
        name                             = "sliderule.slideruleearth.io"
      ~ records                          = [
          - "52.41.92.196",
        ] -> (known after apply)
        # (6 unchanged attributes hidden)
    }

Plan: 3 to add, 2 to change, 3 to destroy.

Changes to Outputs:
  ~ ilb_id         = "i-03329bd36c19bbbe2" -> (known after apply)
  ~ ilb_ip_address = "52.41.92.196" -> (known after apply)
  ~ ilb_state      = "running" -> (known after apply)
  ~ monitor_id     = "i-08eba42107b54c36b" -> (known after apply)
  ~ monitor_state  = "running" -> (known after apply)
aws_launch_configuration.sliderule-instance: Destroying... [id=terraform-20230726141840086100000007]
aws_instance.ilb: Destroying... [id=i-03329bd36c19bbbe2]
aws_instance.monitor: Destroying... [id=i-08eba42107b54c36b]
aws_launch_configuration.sliderule-instance: Still destroying... [id=terraform-20230726141840086100000007, 10s elapsed]
aws_instance.ilb: Still destroying... [id=i-03329bd36c19bbbe2, 10s elapsed]
aws_instance.monitor: Still destroying... [id=i-08eba42107b54c36b, 10s elapsed]
aws_instance.ilb: Still destroying... [id=i-03329bd36c19bbbe2, 20s elapsed]
aws_launch_configuration.sliderule-instance: Still destroying... [id=terraform-20230726141840086100000007, 20s elapsed]
aws_instance.monitor: Still destroying... [id=i-08eba42107b54c36b, 20s elapsed]
aws_instance.monitor: Destruction complete after 30s
aws_instance.monitor: Creating...
aws_launch_configuration.sliderule-instance: Still destroying... [id=terraform-20230726141840086100000007, 30s elapsed]
aws_instance.ilb: Still destroying... [id=i-03329bd36c19bbbe2, 30s elapsed]
aws_instance.monitor: Still creating... [10s elapsed]
aws_instance.ilb: Still destroying... [id=i-03329bd36c19bbbe2, 40s elapsed]
aws_launch_configuration.sliderule-instance: Still destroying... [id=terraform-20230726141840086100000007, 40s elapsed]
aws_instance.monitor: Creation complete after 12s [id=i-0f4108fe8e5b4c7be]
aws_instance.ilb: Destruction complete after 50s
aws_instance.ilb: Creating...
aws_launch_configuration.sliderule-instance: Still destroying... [id=terraform-20230726141840086100000007, 50s elapsed]
aws_instance.ilb: Still creating... [10s elapsed]
aws_launch_configuration.sliderule-instance: Still destroying... [id=terraform-20230726141840086100000007, 1m0s elapsed]
aws_instance.ilb: Creation complete after 12s [id=i-017c389cfb7b47955]
aws_route53_record.org: Modifying... [id=Z0526045IQLILBFI9THF_sliderule.slideruleearth.io_A]
aws_launch_configuration.sliderule-instance: Still destroying... [id=terraform-20230726141840086100000007, 1m10s elapsed]
aws_route53_record.org: Still modifying... [id=Z0526045IQLILBFI9THF_sliderule.slideruleearth.io_A, 10s elapsed]
aws_launch_configuration.sliderule-instance: Still destroying... [id=terraform-20230726141840086100000007, 1m20s elapsed]
aws_route53_record.org: Still modifying... [id=Z0526045IQLILBFI9THF_sliderule.slideruleearth.io_A, 20s elapsed]
aws_launch_configuration.sliderule-instance: Still destroying... [id=terraform-20230726141840086100000007, 1m30s elapsed]
aws_route53_record.org: Still modifying... [id=Z0526045IQLILBFI9THF_sliderule.slideruleearth.io_A, 30s elapsed]
aws_launch_configuration.sliderule-instance: Still destroying... [id=terraform-20230726141840086100000007, 1m40s elapsed]
aws_route53_record.org: Still modifying... [id=Z0526045IQLILBFI9THF_sliderule.slideruleearth.io_A, 40s elapsed]
aws_route53_record.org: Modifications complete after 46s [id=Z0526045IQLILBFI9THF_sliderule.slideruleearth.io_A]
aws_launch_configuration.sliderule-instance: Still destroying... [id=terraform-20230726141840086100000007, 1m50s elapsed]
aws_launch_configuration.sliderule-instance: Still destroying... [id=terraform-20230726141840086100000007, 2m0s elapsed]
╷
│ Warning: Argument is deprecated
│ 
│   with aws_s3_bucket_object.docker-compose-config,
│   on config-files.tf line 4, in resource "aws_s3_bucket_object" "docker-compose-config":
│    4:   bucket = "sliderule"
│ 
│ Use the aws_s3_object resource instead
│ 
│ (and 13 more similar warnings elsewhere)
╵
╷
│ Error: deleting Auto Scaling Launch Configuration (terraform-20230726141840086100000007): ResourceInUse: Cannot delete launch configuration terraform-20230726141840086100000007 because it is attached to AutoScalingGroup terraform-20230726141849028500000008
│ 	status code: 400, request id: bb80075b-e180-4ce3-ae17-9195300e721d
│ 
│ 
╵

sliderule cmd-11: Update iter:<4> caught ProvisionCmdError exception: ProvisionCmdError('ps-server returned this error: sliderule cmd-11: Update iter:<4> FAILED with error: Processing Update sliderule cluster caught this exception: PS_InternalError("FAILED! Command \'[\'terraform\', \'-chdir=/ps_server/sliderule/terraform\', \'apply\', \'-auto-approve\', \'-var\', \'cluster_version=v3\', \'-var\', \'domain=slideruleearth.io\', \'-var\', \'is_public=True\', \'-var\', \'cluster_name=sliderule\', \'-var\', \'node_asg_min_capacity=40\', \'-var\', \'node_asg_max_capacity=100\', \'-var\', \'node_asg_desired_capacity=40\']\' returned non-zero exit status 1. for Update sliderule")')
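
The `ResourceInUse` error above is what AWS returns when Terraform tries to delete a launch configuration while the Auto Scaling group still references it. A minimal sketch of the commonly used workaround follows, assuming the resource, data source, and variable names that appear in the log. `var.node_instance_type` is a hypothetical variable, and the real configuration will have more attributes than shown here:

```hcl
# Sketch only: resource names are taken from the log above; everything else is assumed.
resource "aws_launch_configuration" "sliderule-instance" {
  # Leave `name` unset so the provider generates a unique name for the replacement.
  image_id      = data.aws_ami.sliderule_cluster_ami.id
  instance_type = var.node_instance_type # hypothetical variable

  lifecycle {
    # Create the new launch configuration first, repoint the ASG at it,
    # and only then delete the old one.
    create_before_destroy = true
  }
}

resource "aws_autoscaling_group" "sliderule-cluster" {
  launch_configuration = aws_launch_configuration.sliderule-instance.name
  min_size             = var.node_asg_min_capacity
  max_size             = var.node_asg_max_capacity
  desired_capacity     = var.node_asg_desired_capacity
  vpc_zone_identifier  = [aws_subnet.sliderule-subnet.id]
}
```

With `create_before_destroy`, the old launch configuration should no longer be attached to the group by the time Terraform deletes it. Whether this matches how the ProvSys Terraform files are actually laid out is not confirmed by the log.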
@cugarteblair
Collaborator

It looks like Terraform did not know how to handle the change in the AMI?
Also, "launch configurations" have been deprecated: https://docs.aws.amazon.com/autoscaling/ec2/userguide/launch-configurations.html
There is a migration path away from "launch configurations" to "launch templates".
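
A rough sketch of what the launch template migration could look like, assuming the resource and variable names from the log. The `name_prefix` value and `var.node_instance_type` are hypothetical, and user data, IAM profile, and security groups are omitted:

```hcl
# Sketch only: not the project's actual configuration.
resource "aws_launch_template" "sliderule-instance" {
  name_prefix   = "sliderule-"            # hypothetical prefix
  image_id      = data.aws_ami.sliderule_cluster_ami.id
  instance_type = var.node_instance_type  # hypothetical variable
}

resource "aws_autoscaling_group" "sliderule-cluster" {
  min_size            = var.node_asg_min_capacity
  max_size            = var.node_asg_max_capacity
  desired_capacity    = var.node_asg_desired_capacity
  vpc_zone_identifier = [aws_subnet.sliderule-subnet.id]

  launch_template {
    id = aws_launch_template.sliderule-instance.id
    # Track the newest template version so an AMI change updates the group.
    version = aws_launch_template.sliderule-instance.latest_version
  }
}
```

Launch templates are versioned, so a new AMI produces a new template version rather than a destroy-and-recreate of the resource the group is attached to. That should sidestep the deletion ordering problem seen in the log.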

@cugarteblair cugarteblair self-assigned this Sep 20, 2023
@cugarteblair cugarteblair added the bug Something isn't working label Sep 20, 2023