
skip guest accelerators if count is 0. #866

Merged — 5 commits, merged Jan 23, 2018

Conversation

jacobstr (Contributor)

Instances in instance groups on Google will fail to provision despite requesting 0 GPUs. This came up for me when trying to provision a similar instance group in all available regions, but only asking for GPUs in those that support them by parameterizing the count and setting it to 0.

This might violate some Terraform principles. For example, testing locally with this change, Terraform did not recognize that my infrastructure needed to be re-deployed. Additionally, there may be valid reasons for creating an instance template with 0 GPUs that can later be tuned upwards.

I'm putting this out there as an RFC to (hopefully) demonstrate what I mean, but I have not yet run the acceptance tests locally.

jacobstr (Contributor, Author) commented Dec 15, 2017

Some more flavor: we're deploying worker pools to join a Kubernetes cluster. Each pool is deployed with a worker module that creates a MIG in each compute zone for a given region. The goal was to deploy a flavor of these worker pools that supports GPUs, e.g.

  • Regular worker pool: "us-east1-b", "us-east1-c", "us-east1-d"
  • GPU worker pool: "us-east1-b"

The regular worker pool and the GPU worker pool are provisioned using the same module. The worker module was intended to take a list of `gcp_zones` and a `gpu_count` to toggle GPU support.

Reading various sources, the count attribute is sometimes exploited for this kind of conditional resource creation.

That said, this is not a resource itself but a block within a resource. I don't see count available for other configuration blocks, though I could probably contrive a similar story for them.
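For illustration, the module usage being described might look roughly like this (the module path and variable names beyond `gcp_zones` and `gpu_count` are assumptions, not the actual module):

```hcl
# Hypothetical usage of the shared worker module: both pools use the same
# module, and GPU support is toggled purely by gpu_count.
module "workers" {
  source    = "./modules/worker"
  gcp_zones = ["us-east1-b", "us-east1-c", "us-east1-d"]
  gpu_count = 0
}

module "gpu_workers" {
  source    = "./modules/worker"
  gcp_zones = ["us-east1-b"]
  gpu_count = 1
}
```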

A sample error message from the cloud console when an instance in a zone with 0 GPUs attempts to spin up:

Instance 'koobz-wrk-xxx-1jkx' creation failed: The resource 'projects/derp/zones/us-east1-d/acceleratorTypes/nvidia-tesla-p100' was not found (when acting as '[email protected]')

@rosbo rosbo requested review from danawillow and removed request for danawillow December 20, 2017 19:33
@rosbo rosbo self-assigned this Dec 20, 2017
rosbo (Contributor) commented Dec 20, 2017

Hi Jacob,

Your use case is valid and your solution is sensible. Do you mind adding a test for the google_compute_instance too?

jacobstr (Contributor, Author) commented Jan 5, 2018

After updating the guestAccelerator test in a manner similar to the instance template test, I see the following errors (with redactions). The error is produced in this block of code.

This might be one of those Terraform-isms I suspected I might be violating, where the continue hack isn't good enough. It looks like it sees that:

{guest_accelerators: []} != {guest_accelerators: [{count: 0, type: "nvidia-tesla-k80"}]}

--- FAIL: TestAccComputeInstance_guestAcceleratorSkip (41.35s)
	testing.go:434: Step 0 error: After applying this step, the plan was not empty:

		DIFF:

		DESTROY/CREATE: google_compute_instance.foobar
		  boot_disk.#:                            "1" => "1"
		  boot_disk.0.auto_delete:                "true" => "true"
		  boot_disk.0.device_name:                "persistent-disk-0" => "<computed>"
		  boot_disk.0.disk_encryption_key_sha256: "" => "<computed>"
		  boot_disk.0.initialize_params.#:        "1" => "1"
		  boot_disk.0.initialize_params.0.image:  "debian-8-jessie-v20160803" => "debian-8-jessie-v20160803"
		  can_ip_forward:                         "false" => "false"
		  cpu_platform:                           "Intel Haswell" => "<computed>"
		  create_timeout:                         "4" => "4"
		  guest_accelerator.#:                    "0" => "1" (forces new resource)
		  guest_accelerator.0.count:              "" => "0" (forces new resource)
		  guest_accelerator.0.type:               "" => "nvidia-tesla-k80" (forces new resource)
		  instance_id:                            "xxx" => "<computed>"
		  label_fingerprint:                      "xxx" => "<computed>"
		  machine_type:                           "n1-standard-1" => "n1-standard-1"
		  metadata_fingerprint:                   "xxx" => "<computed>"
		  name:                                   "terraform-test-zihxsacz7q" => "terraform-test-zihxsacz7q"
		  network_interface.#:                    "1" => "1"
		  network_interface.0.address:            "10.142.0.3" => "<computed>"
		  network_interface.0.name:               "nic0" => "<computed>"
		  network_interface.0.network:            "xxx" => "default"
		  network_interface.0.network_ip:         "10.142.0.3" => "<computed>"
		  network_interface.0.subnetwork_project: "xxx" => "<computed>"
		  project:                                "xxx" => "<computed>"
		  scheduling.#:                           "1" => "1"
		  scheduling.0.automatic_restart:         "true" => "true"
		  scheduling.0.on_host_maintenance:       "TERMINATE" => "TERMINATE"
		  scheduling.0.preemptible:               "false" => "false"
		  self_link:                              "xxx" => "<computed>"
		  tags_fingerprint:                       "xxx=" => "<computed>"
		  zone:                                   "us-east1-d" => "us-east1-d"

		STATE:

		google_compute_instance.foobar:
		  ID = terraform-test-zihxsacz7q
		  attached_disk.# = 0
		  boot_disk.# = 1
		  boot_disk.0.auto_delete = true
		  boot_disk.0.device_name = persistent-disk-0
		  boot_disk.0.disk_encryption_key_raw =
		  boot_disk.0.disk_encryption_key_sha256 =
		  boot_disk.0.initialize_params.# = 1
		  boot_disk.0.initialize_params.0.image = debian-8-jessie-v20160803
		  boot_disk.0.initialize_params.0.size = 0
		  boot_disk.0.initialize_params.0.type =
		  boot_disk.0.source = xxx
		  can_ip_forward = false
		  cpu_platform = Intel Haswell
		  create_timeout = 4
		  guest_accelerator.# = 0
		  instance_id = xxx
		  label_fingerprint = xxx
		  machine_type = n1-standard-1
		  metadata.% = 0
		  metadata_fingerprint = xxx
		  min_cpu_platform =
		  name = terraform-test-zihxsacz7q
		  network_interface.# = 1
		  network_interface.0.access_config.# = 0
		  network_interface.0.address = 10.142.0.3
		  network_interface.0.alias_ip_range.# = 0
		  network_interface.0.name = nic0
		  network_interface.0.network = xxx
		  network_interface.0.network_ip = 10.142.0.3
		  network_interface.0.subnetwork = xxx
		  network_interface.0.subnetwork_project = xxx
		  project = xxx
		  scheduling.# = 1
		  scheduling.0.automatic_restart = true
		  scheduling.0.on_host_maintenance = TERMINATE
		  scheduling.0.preemptible = false
		  scratch_disk.# = 0
		  self_link = xxx
		  service_account.# = 0
		  tags_fingerprint = xxx
		  zone = us-east1-d

@@ -1198,6 +1198,9 @@ func expandInstanceGuestAccelerators(d TerraformResourceData, config *Config) ([
guestAccelerators := make([]*computeBeta.AcceleratorConfig, len(accels))
Review comment (Contributor):

The issue is that you create empty entries here. Even if you use `continue` below, the empty entries are still added to the list.

Instead, change this line to: `guestAccelerators := make([]*computeBeta.AcceleratorConfig, 0, len(accels))`

And change the line below starting with `guestAccelerators[i] = ...` to `guestAccelerators = append(guestAccelerators, ...)`
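The pre-allocation issue being pointed out can be sketched in plain Go (the types and function here are simplified stand-ins, not the provider's actual `computeBeta` types): `make([]T, len(xs))` pre-fills the slice with zero values, so skipping an entry with `continue` still leaves a zero-value hole behind, whereas allocating capacity only and appending keeps just the entries you want.

```go
package main

import "fmt"

// acceleratorConfig is an illustrative stand-in for the provider's
// accelerator config type.
type acceleratorConfig struct {
	Type  string
	Count int64
}

// expandAccelerators drops entries with Count == 0. Because the slice is
// created with length 0 and capacity len(raw), skipped entries are simply
// never appended -- no zero-value placeholders remain.
func expandAccelerators(raw []acceleratorConfig) []acceleratorConfig {
	accels := make([]acceleratorConfig, 0, len(raw))
	for _, a := range raw {
		if a.Count == 0 {
			continue // skip without leaving an empty entry
		}
		accels = append(accels, a)
	}
	return accels
}

func main() {
	in := []acceleratorConfig{
		{Type: "nvidia-tesla-k80", Count: 0},
		{Type: "nvidia-tesla-p100", Count: 2},
	}
	fmt.Println(len(expandAccelerators(in))) // prints 1
}
```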

Review reply (Contributor, Author):

I just attempted this and the test still fails. My theory now is that the resourceComputeInstanceRead at the end of resourceComputeInstanceCreate is what is persisted to Terraform's state.

When the plan is refreshed it sees {guest_accelerators: []}, but the current context is requesting {guest_accelerators: [{count: 0, type: "nvidia-tesla-k80"}]}.

The right way to do this might be to drop/modify the guest_accelerator on the schema.ResourceData instance as it's being read, or immediately afterwards when the count is 0. I'll have to poke around for an appropriate lifecycle hook (e.g. afterSchemaResourceDataRead) where this could be implemented.

I'm still puzzled why similar behavior wasn't observed with the instance template.

rosbo (Contributor) commented Jan 11, 2018:

The current state depends on whether the -refresh flag is true or false.

When you see a diff like:

guest_accelerator.#:       "0" => "1" (forces new resource)
guest_accelerator.0.count: "" => "0" (forces new resource)
guest_accelerator.0.type:  "" => "nvidia-tesla-k80" (forces new resource)

The left hand side is the current state. By default, when you run terraform plan or terraform apply, the flag -refresh=true. This means it calls the Read function to refresh the current state. If you set -refresh=false, then, the current state will be equal to whatever is stored in your state file.

The right hand side (after =>) is always equal to what you have in your Terraform config file (.tf file).

In your case, the config has one guest_accelerator entry with count = 0 and type = nvidia-tesla-k80. However, the current state is empty causing a diff.
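The special case being described can be sketched as a small predicate in plain Go (the function name and types are illustrative assumptions, not the actual CustomizeDiff API): suppress the diff only when the refreshed state has no guest_accelerator entries and the config asks for exactly one entry with count = 0.

```go
package main

import "fmt"

// guestAccelerator is an illustrative stand-in for the schema's
// guest_accelerator block.
type guestAccelerator struct {
	Type  string
	Count int64
}

// emptyVsZeroCount reports whether the "empty state vs single zero-count
// config entry" case applies, i.e. the diff should be treated as no change.
func emptyVsZeroCount(oldState, newConfig []guestAccelerator) bool {
	return len(oldState) == 0 &&
		len(newConfig) == 1 &&
		newConfig[0].Count == 0
}

func main() {
	state := []guestAccelerator{}
	config := []guestAccelerator{{Type: "nvidia-tesla-k80", Count: 0}}
	fmt.Println(emptyVsZeroCount(state, config)) // prints true
}
```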

You can use the new customdiff feature to suppress the diff in that case. I added this new helper to our codebase yesterday and the PR hasn't been merged yet: #945.

Let me know if you need help with customdiff or if you want me to takeover from here.

Thanks

Reply (Contributor, Author):

Thanks @rosbo. Taking a stab with CustomizeDiffFunc.

@jacobstr jacobstr force-pushed the master branch 3 times, most recently from 264cfbd to 9080ac0 Compare January 12, 2018 00:11
jacobstr (Contributor, Author):

Took a stab at it with 9080ac0. There's an error I'm currently swallowing in that commit:

Clear only operates on computed keys - guest_accelerator is not one

Clear seemed like the obvious function to use to ignore a diff. But indeed, the docs state the limitation reported in the error message.

jacobstr (Contributor, Author):

@rosbo I amended the previous commit by adding `Computed: true` to the schema, which allowed the CustomizeDiff to do its job. It's unclear to me what effect changing it to a computed field will have.

@@ -551,6 +553,9 @@ func resourceComputeInstance() *schema.Resource {
Deprecated: "Use timeouts block instead.",
},
},
CustomizeDiff: customdiff.All(
suppressEmptyGuestAcceleratorDiff,
rosbo (Contributor) commented:

Use https://godoc.org/github.com/hashicorp/terraform/helper/customdiff#IfValueChange here so we can chain other CustomizeDiff functions in the future.

rosbo (Contributor) left a review comment:

Getting closer to merging. One small suggestion and please rebase the branch and we should be good to go.

Thanks for your great work!

jacobstr and others added 5 commits January 22, 2018
jacobstr (Contributor, Author) commented Jan 22, 2018

So I wrapped the suppressEmptyGuestAcceleratorDiff method in a customdiff.If and apply the custom diff if there's any change to guest_accelerator. That condition is quite non-specific, but duplicating the logic from suppressEmptyGuestAcceleratorDiff felt repetitive.

I also wanted to point out that it's wrapped in customdiff.All, and the suppressEmptyGuestAcceleratorDiff method only affects the portion of the diff related to the guest_accelerator key, i.e. I believe it would still compose well without the conditional check. The exception might be if there's another diff customizer for the guest_accelerator key.

rosbo (Contributor) commented Jan 23, 2018

All tests are passing on the CI server. Merging this change. Thank you for your contribution @jacobstr!

@rosbo rosbo merged commit 939ba6d into hashicorp:master Jan 23, 2018
modular-magician added a commit to modular-magician/terraform-provider-google that referenced this pull request Sep 27, 2019
ghost commented Mar 29, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error 🤖 🙉 , please reach out to my human friends 👉 [email protected]. Thanks!

@ghost ghost locked and limited conversation to collaborators Mar 29, 2020