
Plugin panics while destroying instance group manager #14516

Closed

@nicolaferraro

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request.
  • Please do not leave +1 or me too comments, they generate extra noise for issue followers and do not help prioritize the request.
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment.
  • If an issue is assigned to the modular-magician user, it is either in the process of being autogenerated, or is planned to be autogenerated soon. If an issue is assigned to a user, that user is claiming responsibility for the issue. If an issue is assigned to hashibot, a community member has claimed the issue already.

Terraform Version

v1.2.9

Affected Resource(s)

  • google_compute_instance_group_manager

Terraform Configuration Files

Panic happens on terraform destroy -refresh=false when the resource is not found.

variable "project_id" {
  type = string
}

variable "region" {
  type = string
}

variable "availability_zone" {
  type = string
}

provider "google" {
  project = var.project_id
  region  = var.region
}

locals {
  vm_user_data = {
    users = [
      "default",
      {
        name          = "myself",
        gecos         = "myself",
        primary_group = "myself",
        sudo          = "ALL=(ALL) NOPASSWD:ALL",
        shell         = "/bin/bash",
        groups        = "users,adm",
        lock_passwd   = false,
      }
    ],
    package_upgrade = true,
    runcmd          = [
      ["echo", "--- booted ---"],
    ]
  }
  vm_user_data_with_cloud_config_directive = "#cloud-config\n${jsonencode(local.vm_user_data)}"
}

resource "google_project_service" "compute_api" {
  service                    = "compute.googleapis.com"
  disable_dependent_services = false
  disable_on_destroy         = false
}

resource "google_service_account" "myservice_agent" {
  account_id = "myserviceaccount"
}

resource "google_compute_network" "myservice" {
  name                            = "myvpc"
  auto_create_subnetworks         = true
  delete_default_routes_on_create = false
  routing_mode                    = "GLOBAL"

  depends_on = [
    google_project_service.compute_api,
  ]
}

resource "google_compute_instance_template" "myservice_agent" {
  name_prefix  = "agent-"
  machine_type = "e2-medium"
  metadata     = {
    user-data = local.vm_user_data_with_cloud_config_directive
  }

  disk {
    disk_size_gb = 16
    disk_type    = "pd-balanced"
    auto_delete  = "true"
    boot         = "true"
    source_image = "ubuntu-os-cloud/ubuntu-2204-lts"
  }

  service_account {
    email  = google_service_account.myservice_agent.email
    scopes = ["cloud-platform"]
  }

  network_interface {
    network = google_compute_network.myservice.name
    access_config {
      network_tier = "STANDARD"
    }
  }

  scheduling {
    automatic_restart   = "true"
    on_host_maintenance = "MIGRATE"
  }

  shielded_instance_config {
    enable_secure_boot          = "true"
    enable_vtpm                 = "true"
    enable_integrity_monitoring = "true"
  }

  lifecycle {
    create_before_destroy = true
  }
}

resource "google_compute_instance_group_manager" "myservice_agent" {
  name               = "mygroup"
  base_instance_name = "myinstance"
  zone               = var.availability_zone

  version {
    name              = "myversion"
    instance_template = google_compute_instance_template.myservice_agent.id
  }

  target_size        = 1
  wait_for_instances = "true"

  update_policy {
    max_surge_fixed       = 1
    max_unavailable_fixed = 1
    minimal_action        = "REPLACE"
    replacement_method    = "SUBSTITUTE"
    type                  = "PROACTIVE"
  }
}

Debug Output

https://gist.github.com/nicolaferraro/5fa20c6fe3058ea5ec37234c734a1a04

Panic Output

╷
│ Error: Plugin did not respond
│ 
│ The plugin encountered an error, and failed to respond to the plugin.(*GRPCProvider).ApplyResourceChange call. The plugin logs may contain more details.
╵

Stack trace from the terraform-provider-google_v4.63.1_x5 plugin:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x100 pc=0x2270b29]

goroutine 219 [running]:
github.com/hashicorp/terraform-provider-google/google.waitForInstancesRefreshFunc.func1()
        github.com/hashicorp/terraform-provider-google/google/resource_compute_region_instance_group_manager.go:515 +0xa9
github.com/hashicorp/terraform-plugin-sdk/v2/helper/resource.(*StateChangeConf).WaitForStateContext.func1()
        github.com/hashicorp/terraform-plugin-sdk/[email protected]/helper/resource/state.go:110 +0x207
created by github.com/hashicorp/terraform-plugin-sdk/v2/helper/resource.(*StateChangeConf).WaitForStateContext
        github.com/hashicorp/terraform-plugin-sdk/[email protected]/helper/resource/state.go:83 +0x1d8

Error: The terraform-provider-google_v4.63.1_x5 plugin crashed!

This is always indicative of a bug within the plugin. It would be immensely
helpful if you could report the crash with the plugin's maintainers so that it
can be fixed. The output above should help diagnose the issue.

2023-05-08T10:04:34.167+0200 [DEBUG] provider: plugin exited

Expected Behavior

The destroy should neither panic nor fail, since the resource has already been deleted.

Actual Behavior

Panic.

Steps to Reproduce

  1. terraform apply
  2. Delete the instance group manually from the Google Cloud console (it may also have been deleted as part of a previous run, leaving the Terraform state out of sync)
  3. terraform destroy -refresh=false

megan07 commented May 8, 2023

Hi @nicolaferraro, so sorry you're running into this issue! I think I see where the problem is, and I can put in a fix for this, but for documentation purposes I want to walk through my own stack trace here.

The panic happens in waitForInstancesRefreshFunc (resource_compute_region_instance_group_manager.go:515 in the stack trace above) because m, the InstanceManager, is nil.

To get to that point, we start in Delete, and computeRIGMWaitForInstanceStatus is called when wait_for_instances is set.

This brings us into that function, where the function getRegionalManager is passed to waitForInstancesRefreshFunc.

You'll see in waitForInstancesRefreshFunc that f is called; f is the function that was passed in, i.e. getRegionalManager.

So we go there and see the call to Get the InstanceManager. This returns a 404 error because the resource no longer exists, so handleNotFoundError is called. When a read discovers that a resource has been deleted outside of Terraform, we typically just remove it from state (d.SetId("")) and return a nil error. Since this getRegionalManager function is used both for reading and for deleting, when we return back to the delete path the error being returned is nil, but so is the InstanceManager.
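
To make that concrete, here is a minimal, self-contained sketch of the shape of the problem. The types and names below (instanceGroupManager, handleNotFound, getManager) are simplified placeholders rather than the provider's actual code; the point is just to show how swallowing the 404 inside the getter hands the caller a nil manager together with a nil error.

package main

import (
	"errors"
	"fmt"
)

// Simplified stand-in for the provider's InstanceGroupManager type.
type instanceGroupManager struct {
	TargetSize int64
}

// errNotFound stands in for a googleapi 404 error.
var errNotFound = errors.New("googleapi: Error 404: resource not found")

// handleNotFound mimics the handleNotFoundError behaviour described above:
// on a 404 it treats the resource as gone (the real code also clears the
// resource ID with d.SetId("")) and returns a nil error.
func handleNotFound(err error) error {
	if errors.Is(err, errNotFound) {
		return nil
	}
	return err
}

// getManager mimics getRegionalManager: the API lookup fails with a 404,
// the helper swallows the error, and the caller receives (nil, nil).
func getManager() (*instanceGroupManager, error) {
	var m *instanceGroupManager // lookup failed, nothing to return
	return m, handleNotFound(errNotFound)
}

func main() {
	m, err := getManager()
	fmt.Printf("manager=%v err=%v\n", m, err) // manager=<nil> err=<nil>
	if m == nil {
		// The refresh function reads fields off m without this check,
		// which is the nil pointer dereference in the stack trace above.
		fmt.Println("dereferencing m here would panic")
	}
}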

I'll fix this so we check whether m == nil to account for this case; if it is, we'll assume the resource no longer exists and just move on.
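
Roughly, that guard would look something like this (again using the simplified placeholder types from the sketch above, not the actual patch):

// Rough shape of the proposed fix: treat a nil manager the same as an
// already-deleted resource instead of dereferencing it. Placeholder types
// and names, not the provider's real signature.
func waitForStableManager(get func() (*instanceGroupManager, error)) error {
	m, err := get()
	if err != nil {
		return err
	}
	if m == nil {
		// The group was deleted outside Terraform and the 404 was already
		// swallowed upstream, so there is nothing left to wait for.
		return nil
	}
	// ... otherwise keep polling m until the group reaches the desired state ...
	_ = m.TargetSize
	return nil
}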

It's not exactly the same problem as this issue, but it's similar in the sense that we need to be careful about how we reuse the handleNotFoundError function.

(and sorry, I realized after the fact that my stack trace above is for the regional instance group manager, but it applies to the instance group manager as well)


github-actions bot commented Jun 9, 2023

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

github-actions bot locked as resolved and limited conversation to collaborators Jun 9, 2023