-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
skip guest accelerators if count is 0. #866
Conversation
Some more flavor - we're deploying workers pools to join a kubernetes cluster. Each pool is deployed with a
The regular worker pool and GPU worker pool are provisioned using the same module. The Reading various sources, in some cases the That being said, this is not a resource itself, but a block within a resource. I don't see A sample error message from the cloud console when an instance in a zone with
|
Hi Jacob, Your use case is valid and your solution is sensible. Do you mind adding a test for the |
After updating the guestAccelerator test in a manner similar to the instance template test, I see the following errors, with redactions. The error is produced in this block of code. This might be one of those terraform-isms I suspected I might be violating, where the
|
google/resource_compute_instance.go
Outdated
@@ -1198,6 +1198,9 @@ func expandInstanceGuestAccelerators(d TerraformResourceData, config *Config) ([ | |||
guestAccelerators := make([]*computeBeta.AcceleratorConfig, len(accels)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue is that you create empty entries here. Even if you use continue
below, the empty entries are still added to the list.
Instead, change this line for: guestAccelerators := make([]*computeBeta.AcceleratorConfig, 0, len(accels))
And change the line below starting with guestAccelerators[i] = ...
for guestAccelerators = append(guestAccelerators, ...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just attempted this and the test still fails. My theory now is that the resourceComputeInstanceRead at the end of resourceComputeInstanceCreate is what is persisted to terraform's state.
When the Plan is refreshed it sees {guest_accelerators: []}
but the current context is requesting {guest_accelerators: [{count:0, type: "nvidia-tesla-p80"}]}
.
The right way to do this might be to drop/modify the guest_accelerator
from the schema.ResourceData
instance as it's being read or immediately afterwards when the count is 0. I'll have to poke if there's an appropriate lifecycle hook (e.g. afterSchemaResourceDataRead) where this could be implemented.
I'm still puzzled why a similar behavior wasn't observed with the instance template.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current state depends whether the -refresh
flag is true or false.
When you see a diff like:
guest_accelerator.#: "0" => "1" (forces new resource)
guest_accelerator.0.count: "" => "0" (forces new resource)
guest_accelerator.0.type: "" => "nvidia-tesla-k80" (forces new resource)
The left hand side is the current state. By default, when you run terraform plan
or terraform apply
, the flag -refresh=true
. This means it calls the Read function to refresh the current state. If you set -refresh=false
, then, the current state will be equal to whatever is stored in your state file.
The right hand side (after =>) is always equal to what you have in your Terraform config file (.tf file).
In your case, the config has one guest_accelerator
entry with count = 0
and type = nvidia-tesla-k80
. However, the current state is empty causing a diff.
You can use the new customdiff feature to suppress the diff in that case. I added this new helper to our codebase yesterday and the PR hasn't been merged yet: #945.
Let me know if you need help with customdiff or if you want me to takeover from here.
Thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @rosbo. Taking a stab with CustomizeDiffFunc.
264cfbd
to
9080ac0
Compare
Took a stab at it with 9080ac0. There's an error I'm currently swallowing in that commit:
|
@rosbo I amended the previous commit by adding |
google/resource_compute_instance.go
Outdated
@@ -551,6 +553,9 @@ func resourceComputeInstance() *schema.Resource { | |||
Deprecated: "Use timeouts block instead.", | |||
}, | |||
}, | |||
CustomizeDiff: customdiff.All( | |||
suppressEmptyGuestAcceleratorDiff, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use https://godoc.org/github.com/hashicorp/terraform/helper/customdiff#IfValueChange here so we can chain other customize diff in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Getting closer to merging. One small suggestion and please rebase the branch and we should be good to go.
Thanks for your great work!
Instances in instance groups in google will fail to provision, despite requesting 0 GPUs. This came up for me when trying to provision a similar instance group in all available regions, but only asking for GPU's in those that support them by parameterizing the `count` and setting it to 0. This might be a violation of some terraform principles. For example, testing locally with this change `terraform` did not recognize that indeed my infra needed to be re-deployed (from it's pov, I assume it believes this because inputs hadn't changed). Additionally, there may be valid reasons for creating an instance template with 0 gpu's that can be tuned upwards.
So I wrapped the Wanted to point out that it's wrapped in |
Alll tests are passing on the CI server. Merging this change. Thank you for your contribution @jacobstr |
Signed-off-by: Modular Magician <[email protected]>
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error 🤖 🙉 , please reach out to my human friends 👉 [email protected]. Thanks! |
Instances in instance groups in google will fail to provision, despite
requesting 0 GPUs. This came up for me when trying to provision
a similar instance group in all available regions, but only asking for
GPU's in those that support them by parameterizing the
count
andsetting it to 0.
This might be a violation of some terraform principles. For example,
testing locally with this change
terraform
did not recognize thatindeed my infra needed to be re-deployed. Additionally, there may be
valid reasons for creating an instance template with 0 gpu's that can be
tuned upwards.
I'm putting this out there as an RFC to (hopefully) demonstrate what I
mean but I have not yet run the acceptance tests locally.