Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

v0.15.4 replaces resources on "Optional+Computed" attributes #29028

Closed
yanndegat opened this issue Jun 24, 2021 · 12 comments
Closed

v0.15.4 replaces resources on "Optional+Computed" attributes #29028

yanndegat opened this issue Jun 24, 2021 · 12 comments
Labels
bug new new issue not yet triaged

Comments

@yanndegat
Copy link
Contributor

yanndegat commented Jun 24, 2021

Terraform Version

Terraform v0.15.4
on linux_amd64
+ provider registry.terraform.io/ovh/ovh v0.13.0

Terraform Configuration Files

resource "ovh_cloud_project_kube" "cluster" {
  name         = "test"
  region       = "GRA7"
}

output "kube_version" {
  value = ovh_cloud_project_kube.cluster.version
}

Debug Output

Expected Behavior

The First apply should have output the "version" attribute in the outputs.
The Second apply shouldn't detect any change

Actual Behavior

First apply:

                                       
...
Apply complete! Resources: 1 added, 0 changed, 0 destroyed.  
$ OVH_CLOUD_PROJECT_SERVICE=...  terraform0.15.4 output
╷                                                                    
│ Warning: No outputs found                                                      
│                                                                                  
│ The state file either has no outputs defined, or all the defined outputs are empty. Please define an output in your configuration with the `output` keyword and run `terraform refresh` for it to become available. If you are using
│ interpolation, please verify the interpolated value is not empty. You can use the `terraform console` command to assist.
╵                               

Second plan:

terraform0.15.4 plan
ovh_cloud_project_kube.cluster: Refreshing state... [id=6ac33f97-4745-4245-815a-8d10306a2e58]
                                                  
Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
-/+ destroy and then create replacement                                                              
                                                                                             
Terraform will perform the following actions:               
                                                                                                                                                                                                                                            
  # ovh_cloud_project_kube.cluster must be replaced                              
-/+ resource "ovh_cloud_project_kube" "cluster" {
      ~ control_plane_is_up_to_date = true -> (known after apply)
      ~ id                          = "..." -> (known after apply)                 
      ~ is_up_to_date               = true -> (known after apply)                        
      ~ kubeconfig                  = (sensitive value)
        name                        = "test"
      ~ next_upgrade_versions       = [] -> (known after apply)
      ~ nodes_url                   = "....gra7.k8s.ovh.net" -> (known after apply)
      ~ status                      = "READY" -> (known after apply)
      ~ update_policy               = "ALWAYS_UPDATE" -> (known after apply)
      ~ url                         = "....gra7.k8s.ovh.net" -> (known after apply)
      - version                     = "1.20" -> null # forces replacement
        # (2 unchanged attributes hidden)
    }

Plan: 1 to add, 0 to change, 1 to destroy.

Steps to Reproduce

  1. terraform init
  2. terraform apply
  3. terraform plan

Additional Context

This behavior is introduced by terraform 0.15.4. The same plan has been tested with terraform 0.15.3 and works as expected.
This has been reproduced up to terraform 1.0.0 version.

The version attribute is defined as :
https://github.com/ovh/terraform-provider-ovh/blob/master/ovh/resource_cloud_project_kube.go#L38

			"version": {
				Type:     schema.TypeString,
				Optional: true,
				Computed: true,
				ForceNew: true,
			},

References

ovh/terraform-provider-ovh#204

@jbardin
Copy link
Member

jbardin commented Jun 24, 2021

Hi @yanndegat,

The terraform changes you've mentioned in the linked issue are not related to any plan changes, and only serve to provide more information for the user to see how resource change outside of terraform. They cause no difference in the plan and apply operations of terraform.

The behavior here would be explained by the resource returning null for version on the first apply, and changing that value on a later apply. Terraform does not make the decision to replace a resource, it only creates a proposed new state, and the provider is what declares if a change will cause replacement or not.

If the resource is ensuring that a computed value is set and remains unchanged, then you may have an issue with the plugin SDK. Since you are not setting an initial value (which appears to be the case as the output is null), this could be a bug caused by the legacy SDK's inability to differentiate between zero values an unset values, which may be changing the null to the empty string "" and triggering the replacement. I think the workaround there would be to always ensure that the field has a known value.

@yanndegat
Copy link
Contributor Author

yanndegat commented Jun 24, 2021

hi @jbardin

I'm not sure i get your point as it's quite complex to express.
For a little bit of context:

so something must have been introduced into the core that changes that behavior?
but apart from that guess, i cant see anything in the v0.15.4 release note which could trigger that change.

do you see something wrong on the ovh provider's side ? which could have worked up to 0.15.3 but not after ?

thanks a lot

@yanndegat
Copy link
Contributor Author

btw:

The behavior here would be explained by the resource returning null for version on the first apply, and changing that value on a later apply. Terraform does not make the decision to replace a resource, it only creates a proposed new state, and the provider is what declares if a change will cause replacement or not.

on terraform v0.15.3, the resource doesn"t return null on the first apply. it only does since terraform v0.15.4.

this is what annoys me.
the first apply has an output value in terraform v0.15.3 but there's no output in terraform 0.15.4

@jbardin
Copy link
Member

jbardin commented Jun 24, 2021

Thanks @yanndegat, I'll look through that set of commits and see if there is anything which looks related. Unusual behavior like this is often accompanied by warnings in the core logs. Can you see if there is any info there about the values returned by this resource?

@yanndegat
Copy link
Contributor Author

Ok, so my bad: it doesn't work in terraform v0.15.3 either. but in terraform v0.13.6

  1. when i run against terraform v0.15.4, i have these warnings, no output, and it tries to recreate the resource in the next run.
 TF_LOG=WARN terraform0.15.4 apply
...
ovh_cloud_project_kube.cluster: Still creating... [2m40s elapsed]
2021-06-24T14:40:32.934Z [WARN]  Provider "provider[\"registry.terraform.io/ovh/ovh\"]" produced an unexpected new value for ovh_cloud_project_kube.cluster, but we are tolerating it because it is using the legacy plugin SDK.
    The following problems may be the cause of any confusing errors from downstream operations:                   
      - .version: was null, but now cty.StringVal("1.20")                                                                                                    
      - .private_network_id: was null, but now cty.StringVal("")                                                                                                                                          
ovh_cloud_project_kube.cluster: Creation complete after 2m42s [id=80283eaa-3d6e-42e8-9616-cdfe1db01bc5]                                                                                          
                                                                                                                                                                                                                                      
Apply complete! Resources: 1 added, 0 changed, 0 destroyed.              
  1. when i run against terraform v0.15.3, i have these warnings, no output, and it tries to recreate the resource in the next run.
 TF_LOG=WARN terraform0.15.3 apply
...
ovh_cloud_project_kube.cluster: Still creating... [2m40s elapsed]
2021-06-24T14:45:00.200Z [WARN]  Provider "provider[\"registry.terraform.io/ovh/ovh\"]" produced an unexpected new value for ovh_cloud_project_kube.cluster, but we are tolerating it because it is using the legacy plugin SDK.
    The following problems may be the cause of any confusing errors from downstream operations:
      - .private_network_id: was null, but now cty.StringVal("")
      - .version: was null, but now cty.StringVal("1.20")
ovh_cloud_project_kube.cluster: Creation complete after 2m40s [id=d9bdf6c3-92c0-4b5f-9481-bc433da18b69]

Apply complete! Resources: 1 added, 0 changed, 0 destroyed.
  1. when i run against terraform v0.13.6, i have these warnings, but it works.
 TF_LOG=WARN terraform0.13.6 apply
...
ovh_cloud_project_kube.cluster: Still creating... [2m40s elapsed]
2021/06/24 14:50:50 [WARN] Provider "registry.terraform.io/ovh/ovh" produced an unexpected new value for ovh_cloud_project_kube.cluster, but we are tolerating it because it is using the legacy plugin SDK.
    The following problems may be the cause of any confusing errors from downstream operations:
      - .private_network_id: was null, but now cty.StringVal("")
      - .version: was null, but now cty.StringVal("1.20")
ovh_cloud_project_kube.cluster: Creation complete after 2m47s [id=81f7bb07-ac6a-43e6-902f-343fa287bb83]
2021-06-24T14:50:50.300Z [WARN]  plugin.stdio: received EOF, stopping recv loop: err="rpc error: code = Unavailable desc = transport is closing"

Apply complete! Resources: 1 added, 0 changed, 0 destroyed.

Outputs:

kube_version = 1.20

BTW, i don't get why it tells "the provider uses the legacy sdk", while it uses the sdk/v2 ?

@jbardin
Copy link
Member

jbardin commented Jun 24, 2021

From Terraform's perspective, the plugin sdk/v2 is still the same legacy SDK, it just contains some backwards incmpatible changes from the original v1 SDK. It cannot make use of the full type system used in Terraform, and will produce certain inconsistencies that cannot be avoided, which is why we must allow these results and can only log them as warnings.

This is likely a result of the more precise planning that is available in 0.14, which is being tripped up by the unexpected change, but I'm not sure how exactly that is happening. We can see that although terraform does not expect the version value to change, we do accept it and it should be stored in the state. This means that the resource could be removing the version value when it is refreshed, causing the plan to show another change, or it could be updating the version altogether, which would also force replacement. There may be more information in the logs from the following plan, which is where the resource shows it must be replaced.

@yanndegat
Copy link
Contributor Author

From Terraform's perspective, the plugin sdk/v2 is still the same legacy SDK, it just contains some backwards incmpatible changes from the original v1 SDK. It cannot make use of the full type system used in Terraform, and will produce certain inconsistencies that cannot be avoided, which is why we must allow these results and can only log them as warnings.

ok. so what's the new non legacy sdk ?

This is likely a result of the more precise planning that is available in 0.14, which is being tripped up by the unexpected change, but I'm not sure how exactly that is happening. We can see that although terraform does not expect the version value to change, we do accept it and it should be stored in the state. This means that the resource could be removing the version value when it is refreshed, causing the plan to show another change, or it could be updating the version altogether, which would also force replacement. There may be more information in the logs from the following plan, which is where the resource shows it must be replaced.

ok. I'm not sure to get it properly. Said another way, how can we make an attribute optional, and if unset, let the api return the default value ? but if the user later change the attribute, then the resource is recreated ?

To me, the tuple : Optional: true, Computed: True, ForcesNew: true: should do the the trick ?

@jbardin
Copy link
Member

jbardin commented Jun 24, 2021

ok. so what's the new non legacy sdk ?

The latest SDK is being developed here: https://github.com/hashicorp/terraform-plugin-go, in conjunction with the existing SDK project here https://github.com/hashicorp/terraform-plugin-sdk.

ok. I'm not sure to get it properly. Said another way, how can we make an attribute optional, and if unset, let the api return the default value ? but if the user later change the attribute, then the resource is recreated ?

To me, the tuple : Optional: true, Computed: True, ForcesNew: true: should do the the trick ?

This is more a question for the SDK, since the handling of these is outside of Terraform. I think this combination should work as you have described in many cases, though one major problem here is that it will not let a user revert to a default with replacement. Because the value is also "computed", when the user removes the value from the configuration terraform is going to treat that as computed by the provider and not cause any changes. The new plugin protocol does now include the information for providers to manually detect this, but the existing SDK cannot access it.

Due to the complications around detecting if the user set or unset the value it's often easier to separate the configuration attributes ("inputs") from computed values ("outputs"). This way you have better control over the lifecycle of the desired version and the default.

Aside from the noted complications above however, I don't think that is the root of your problem, and this looks like it should still be working in the simple case shown as long as the provider is consistently setting the version value.

@yanndegat
Copy link
Contributor Author

yanndegat commented Jun 24, 2021

ok. i'll try to upgrade to provider to the new sdk then.

still it may require some time to integrate it properly.

Meanwhile, i don't see what i have to do to "consistently" set the version attribute.

IMHO, the https://github.com/ovh/terraform-provider-ovh/blob/master/ovh/resource_cloud_project_kube.go#L143
is pretty dumb.

there's only one create method, which always returns the "read" method.
the read method always returns the version.

i'll try to run it in debug to see if the api is always returning the value.

@apparentlymart
Copy link
Contributor

apparentlymart commented Jun 25, 2021

This does seem like a rather baffling behavior, indeed. Unfortunately it seems to be sitting right at the integration point between Terraform Core and the SDK, so it's kinda hard to think through what's going on here, but after some thought experiment I think this behavior must be something odd happening inside the SDK rather than in Terraform Core.

To try to illustrate why I'll summarize my understanding of what Terraform Core would be doing for this second terraform plan, based on the resource instance change lifecycle:

  • Terraform will first evaluate the configuration and see that no value is assigned to version, and thus leave it set to null in the object that represents the configuration alone. (That's the object that the resource instance change docs call "Configuration".)
  • Terraform will then retrieve the data saved for this resource instance from the prior state, obtaining what the docs call "Prior State". Terraform invites the provider to "upgrade" an object to the latest schema version if needed, but this resource type doesn't seem to have any upgrade rules so we'll ignore that here.
  • Terraform will then prepare the "proposed new state" by merging the previous two artifacts. The merge rule is that if a Computed: true attribute is null in configuration then the proposed new state takes the value from the prior state, so that the default proposal is to retain that previous value.
  • Finally, Terraform sends all three of those artifacts over to the provider PlanResourceChange function, and it's the provider's job to digest them in whatever way it sees fit and return the "planned new state", which is what Terraform will ultimately report onscreen as part of the plan diff.

Your particular resource type doesn't implement CustomizeDiff, so in your case PlanResourceChange would be handled entirely by the default logic in the SDK, and therefore I can't see any obvious way that any code you wrote could've caused this odd behavior.

Unfortunately this is where things get pretty murky, because the main logic in the SDK is schemaMap.Diff which is implemented in terms of some SDK-specific types that are derived from the three artifacts I mentioned above but have been transformed in various ways to meet the SDK's assumptions, which we can see in the SDK's implementation of PlanResourceChange. There's too much going on there for me to walk through it as a thought experiment. 😖

One thing that could be interesting to try to gather some more information is to temporarily add a CustomizeDiff implementation to your resource type that doesn't actually make any modifications but does inspect and log what version is set to at the point of calling CustomizeDiff, which is after the SDK runs its own built-in diff rules. Something like this, perhaps:

    CustomizeDiff: func (ctx context.Context, d *schema.ResourceDiff, meta interface{}) error {
        old, new := d.GetChange("version")
        log.Printf("[DEBUG] version in diff is %q -> %q", old, new)
        return nil
    }

Given what you initially reported, I would expect this to log the following:

[DEBUG] version in diff is "1.20" -> ""

If you see something different, that could indicate that I've been wrong in my assumptions so far and the problem may lurk somewhere other than where I think it does. Otherwise though, I think we'll probably end up referring this to the SDK team, who can hopefully offer a more complete explanation of the SDK side of the behavior here, from which we may be able to identify an alternative approach that avoids the problem.

@yanndegat
Copy link
Contributor Author

hi @apparentlymart / @jbardin

i finally found the root cause of my issue. explaining why i've been misleaded several times.
there's been an issue during the release of the v0.13.0 version of the provider.

it has been released 2 times:

once without the patch fixing the version, once with it. But the second release failed to update the registry and overwrite the first one.
so the v0.13.0 tag on the repository is not reflecting the produced binary on the registry.

thank you very much for the time you spent trying to help us.
and sorry for all that noise.

regards

@github-actions
Copy link

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jul 29, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug new new issue not yet triaged
Projects
None yet
Development

No branches or pull requests

3 participants