bug: rounding large int values from state #655

slevenick · 2020-12-08T22:10:06Z

Terraform Version

Terraform v0.14.0

Terraform Configuration Files

Nothing here is particularly important, only that google_compute_url_map stores a large value in a TypeInt. This value is Computed and is not changing between calls of terraform plan

resource "google_compute_url_map" "default" {
  name        = "url-map"
  description = "Description"

  default_service = google_compute_backend_bucket.static.id
  default_route_action {
    url_rewrite {
      host_rewrite = "REDACTED"
    }
  }
}

resource "google_compute_backend_bucket" "static" {
  name        = "static-asset-backend-bucket"
  bucket_name = google_storage_bucket.static.name
  enable_cdn  = true
}

resource "google_storage_bucket" "static" {
  name     = "static-bucket${random_integer.random.result}"
  location = "US"
}

resource "random_integer" "random" {
  min = 10000
  max = 99999
}

Debug Output

Key line:

2020/12/08 14:05:12 [WARN] Provider "registry.terraform.io/hashicorp/google" produced an unexpected new value for google_compute_url_map.default during refresh.
      - .map_id: was cty.NumberIntVal(7.227701560655104e+18), but now cty.NumberIntVal(7.227701560655103598e+18)

This turns into an error at apply-time

Crash Output

Expected Behavior

TypeInt value is read from state without rounding

Actual Behavior

TypeInt value is read from state with rounding

Value in state:
"map_id": 7227701560655103598,

Value returned from d.Get("map_id"):
7227701560655104000

Steps to Reproduce

terraform apply
terraform plan

Shows the error in debug logs. Changing the description of the URL map causing an update and then running terraform apply causes Terraform to error with:

Error: Provider produced inconsistent final plan

When expanding the plan for google_compute_url_map.default to include new
values learned so far during apply, provider
"registry.terraform.io/hashicorp/google" produced an invalid new value for
.map_id: was cty.NumberIntVal(7.227701560655103598e+18), but now
cty.NumberIntVal(7.227701560655104e+18).

Additional Context

I have tracked this down to a change in state between two runs of terraform apply. The value is set correctly in state and is written to the state file correctly, but seems to round when the value is read by the next invocation of terraform plan/apply

This behavior was introduced in 0.14.0, as these resources did not have a problem in the past.

References

The text was updated successfully, but these errors were encountered:

apparentlymart · 2020-12-09T01:36:39Z

Hi @slevenick! Thanks for reporting this.

Internally within Terraform itself, numbers are big.Float with a precision that would be more than enough for these values, but it does seem like these would be out of the range of a float64 because it requires a significand of more than 52 bits.

For that reason, my initial guess here would be that something in the Terraform SDK is round-tripping these values through a float64. Most of the SDK logic uses int64 for TypeInt, so this is still pretty surprising, but one place I know that the SDK does end up dealing with the raw JSON data from the state is in the schema upgrade codepath, because the provider itself is responsible for interpreting the stored values against the old schema -- Terraform Core only has access to the current schema.

A little digging led me to a direct json.Unmarshal to interface{} in the upgrade codepath, so this would be my initial suspect for the cause: without any special settings, Go's JSON decoder will decode all numbers into float64 values.

I've not proven that yet, but if that is the cause then this is something we'll need to tackle in the SDK repository rather than in this repository. I think to be robust there the SDK logic would need to decode into a json.Number instead -- which is really just a string containing the raw decimal representation from the input -- and then conditionally parse that either to an int64 or a float64 depending on whether the schema specifies schema.TypeInt or schema.TypeFloat, so that the supported ranges will match what the SDK normally supports for those schema types.

apparentlymart · 2020-12-09T01:39:37Z

A further thing I intended to mention in the previous comment: if this is indeed an SDK bug then the Terraform Core version shouldn't make any difference here. Instead, upgrading the provider would be the most likely trigger, because the provider might've introduced a new schema version for google_compute_url_map and thus the UpgradeResourceState operation would end up visiting the codepath I indicated in the previous comment.

apparentlymart · 2020-12-09T01:45:01Z

Looking in the implementation of google_compute_url_map I see that it doesn't seem to have any upgrade rules defined, so I guess a good question to answer when we have time to dig into this more is: does the codepath I indicated run even if there isn't an upgrade to do? If not, that would be a good way to eliminate this theory from consideration.

paddycarver · 2020-12-09T12:46:07Z

Wild guess here, but is it possible that 0.14 is running UpgradeResourceState more frequently than 0.13 did? I've heard some reports that I've not been able to substantiate that providers that do not implement UpgradeResourceState and don't have any state upgrades in place run into problems with plugin-go, and I suppose one (again, reiterating that I'm guessing) plausible explanation is that 0.14 calls UpgradeResourceState even when the state does not need upgrading sometimes, which I think would explain how this showed up in 0.14 but not 0.13.

terraform-plugin-go uses json.Number and big.Float everywhere, I believe. terraform-plugin-sdk is a mess of different number types and that has bitten us before, so it would not surprise me that we're using a float64 where it's not optimal to be. The engineering lift to fix it wholesale is, unfortunately, pretty heavy because of compatibility concerns, but we may be able to get away with it.

We can transfer this issue to the SDK repo, we can leave it here until we ascertain why it suddenly showed up in 0.14, or we can transfer to the SDK and open an issue here to get to the bottom of what changed in 0.14 to prompt it, I'm fine with any of these approaches.

jbardin · 2020-12-09T13:33:24Z

UpgradeResourceState is always called as part of preparing the state value for use. This must be done because the legacy providers still use the call to fixup state without bumping the schema version. It makes sense that the upgrade path using the json.Unmarshal defaults would evaluate the number as a double.

paultyng · 2020-12-09T13:38:27Z

Going to move this to the SDK, I believe I've replicated this in terraform-provider-random on 0.12, 0.13, and 0.14. Still could be something in core but will run down at least all the SDK avenues here first.

paultyng · 2020-12-09T13:57:33Z

@slevenick as a workaround for now, if you have any numbers that go over 64 bits (or approach 64 bits) I'd recommend strings in the meantime.

cagataygurturk · 2020-12-11T10:38:41Z

I don't know if it is the same issue but with 0.14 I am getting these errors for a GKE cluster:

2020/12/11 11:28:32 [WARN] Provider "registry.terraform.io/hashicorp/google-beta" produced an unexpected new value for module.gke_cluster_europe_west3.google_container_cluster.primary during refresh.
      - .master_version: was cty.StringVal("1.17.13-gke.1401"), but now cty.StringVal("1.17.13-gke.2001")
      - .node_version: was cty.StringVal("1.17.13-gke.1400"), but now cty.StringVal("1.17.13-gke.2001")
      - .node_pool[0].node_count: was cty.NumberIntVal(0), but now cty.NumberIntVal(1)
      - .node_pool[0].version: was cty.StringVal("1.17.13-gke.1400"), but now cty.StringVal("1.17.13-gke.2001")
      - .node_pool[1].version: was cty.StringVal("1.17.13-gke.1400"), but now cty.StringVal("1.17.13-gke.2001")

and plan/apply fails. 13.5 works fine.

slevenick · 2020-12-11T17:15:10Z

I don't know if it is the same issue but with 0.14 I am getting these errors for a GKE cluster:

2020/12/11 11:28:32 [WARN] Provider "registry.terraform.io/hashicorp/google-beta" produced an unexpected new value for module.gke_cluster_europe_west3.google_container_cluster.primary during refresh.
      - .master_version: was cty.StringVal("1.17.13-gke.1401"), but now cty.StringVal("1.17.13-gke.2001")
      - .node_version: was cty.StringVal("1.17.13-gke.1400"), but now cty.StringVal("1.17.13-gke.2001")
      - .node_pool[0].node_count: was cty.NumberIntVal(0), but now cty.NumberIntVal(1)
      - .node_pool[0].version: was cty.StringVal("1.17.13-gke.1400"), but now cty.StringVal("1.17.13-gke.2001")
      - .node_pool[1].version: was cty.StringVal("1.17.13-gke.1400"), but now cty.StringVal("1.17.13-gke.2001")

and plan/apply fails. 13.5 works fine.

That would be surprising, and a different but potentially related issue. Those values are strings and shouldn't have the rounding problem that we are seeing with large ints.

This looks like the GKE version has changed, but I'm not sure why that would be different between 0.14 and 0.13x. When you plan/apply on 0.13 does it show any changes?

Can you include debug logs for the failing plan? And potentially values in state/config for the master_version and other failing fields

paultyng · 2020-12-11T20:24:24Z

@cagataygurturk if you are using ignore_changes, there was a regression in 0.14.0 that has been fixed in more recent 0.14 releases.

Sytten · 2020-12-14T22:23:43Z

@paultyng @slevenick What is the plan to fix this? Can this be reverted to 0.13 behaviour?

paddycarver · 2020-12-15T00:40:23Z

I believe our current understanding is this is the 0.13 behavior, though I have no compelling answer as to why it suddenly became an issue.

I'm hoping to get a fix out in the next few days, if possible. Providers would then need to update their SDK versions and ship new releases.

paultyng · 2020-12-15T01:35:05Z

I was able to verify this in 0.12.29 (the oldest I tested):

PS C:\Users\paul\Sandbox\bigints> terraform version
Terraform v0.12.29
+ provider.random v3.0.0

Your version of Terraform is out of date! The latest version
is 0.14.2. You can update by downloading from https://www.terraform.io/downloads.html
PS C:\Users\paul\Sandbox\bigints> cat main.tf
resource "random_integer" "bigint" {
    min = 7227701560655103597
    max = 7227701560655103598
}
PS C:\Users\paul\Sandbox\bigints> terraform apply

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # random_integer.bigint will be created
  + resource "random_integer" "bigint" {
      + id     = (known after apply)
      + max    = 7227701560655103598
      + min    = 7227701560655103597
      + result = (known after apply)
    }

Plan: 1 to add, 0 to change, 0 to destroy.

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

random_integer.bigint: Creating...
random_integer.bigint: Creation complete after 0s [id=7227701560655103598]

Apply complete! Resources: 1 added, 0 changed, 0 destroyed.
PS C:\Users\paul\Sandbox\bigints> terraform apply
random_integer.bigint: Refreshing state... [id=7227701560655103598]

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
-/+ destroy and then create replacement

Terraform will perform the following actions:

  # random_integer.bigint must be replaced
-/+ resource "random_integer" "bigint" {
      ~ id     = "7227701560655103598" -> (known after apply)
      ~ max    = 7227701560655104000 -> 7227701560655103598 # forces replacement
      ~ min    = 7227701560655104000 -> 7227701560655103597 # forces replacement
      ~ result = 7227701560655104000 -> (known after apply)
    }

Plan: 1 to add, 0 to change, 1 to destroy.

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

random_integer.bigint: Destroying... [id=7227701560655103598]
random_integer.bigint: Destruction complete after 0s
random_integer.bigint: Creating...
random_integer.bigint: Creation complete after 0s [id=7227701560655103598]

Apply complete! Resources: 1 added, 0 changed, 1 destroyed.
PS C:\Users\paul\Sandbox\bigints>

See hashicorp/terraform-plugin-sdk#655. Essentially, test that integers that do not fit cleanly in float64s still retain their precision. These tests currently fail.

paddycarver · 2020-12-16T08:40:33Z

OK so I've done a lot of digging into this.

So fundamentally, the cause of the issue here was the liberal use of JSON encoding and decoding that the SDK does. Why we do this is a longer story and requires more context, but fundamentally JSON is used as sort of a babel-representation of different systems and so when converting between, e.g., 0.11 and 0.12's types, sometimes we just encode to JSON and decode back out.

Unfortunately, it is not always the case that our decoding uses the UseNumber() method, more frequently just reaching for the default json.Unmarshal... which decodes numbers into float64 when unmarshaling into an interface{}, which we do quite a bit of the time.

This is, usually, fine. However, for sufficiently large numbers (I believe the ceiling is 17 significant figures, but could be wrong about that) float64 can't actually represent them with precision, so it... silently rounds them. Whoops.

This is all basically what Martin said a week ago.

The fix here is straightforward technically, but messy when people get involved. We can just switch to using a JSON decoder with UseNumber() called and this specific problem goes away.

There are a couple of compatibility concerns, though. First, the use of ResourceData.Get and then casting the result to a specific type is an unavoidable pattern that every provider utilises. This means us changing the underlying type breaks compatibility, because any code using the old type will panic. Oops. Now, given that you can use ResourceData.Get and cast to an integer or a float depending on whether you're using schema.TypeInt or schema.TypeFloat gives me hope that we could potentially smooth this over a bit, again as Martin suggested a week ago (this comment is largely just "Martin was right a week ago, but now I have the code and tests to prove it"). Inordinately large TypeFloats would continue having a bad time, but large TypeInts would have less of a bad time, until they hit 64 bits and start having a real bad time again. But I still am researching and investigating the TypeFloat/TypeInt conversion and casting logic to see if we can even do this. If we can't, a boolean may need to be set on the schema.Resource, opting into the new behavior, which would be sad but would still offer a workaround.

Second, the helper/resource test harness also has a lot of exposure to JSON, and also doesn't always use UseNumber. It, similarly, needs to be upgraded, or things will work but the tests will insist there's a problem. (I lost several hours to investigating why my fix wasn't working, only to realise I hadn't fixed the test harness. Whoops.) This also needs to be done in such a way that users don't have to change their code, and I'm still investigating the impact of the needed changes (we will also need to change terraform-json and terraform-exec) to determine whether they're backwards compatible.

All of this to say: the problem is known (I still don't particularly understand why 0.14 exacerbated this; or if it didn't, as seems plausible given we have multiple reproductions of it in 0.12 and 0.13, why it hasn't been an issue before now) and we're making progress on solutions, fixing this just involves moving some pieces that our ecosystem relies directly on, and so just like replacing a load-bearing pillar, we're going to need to move carefully and that takes a little bit of time. But we are working on it.

marekaf · 2020-12-16T12:54:29Z

Hi,
is there any known workaround? We upgraded our state to terraform 0.14 and not we are stuck with this problem. Can't downgrade to 0.13 either.
Thanks!

Go's JSON decoder, by default, uses a lossy conversion of JSON integers to float64s. For sufficiently large integers, this yields a loss of precision, and that causes problems with diffs and plans not matching. See #655 for more details. This PR proposes a solution: keeping the lossless json.Number representation of integers in state files. This will ensure that no precision is lost, but it comes at a cost. json.Number is a string type, not a float64 type. That means that existing state upgraders that (correctly) cast a value to float64 will break, as the value will now be surfaced to them as a json.Number, which cannot be cast to a float64. To handle this, the schema.Resource type gains a `UseJSONNumber` property. When set to new, users are opted into the new json.Number values. When left false, the default, users get the existing float64 behavior. If a resource with state upgraders wants to use the new json.Number behavior, it must update all the state upgraders for that resource to use json.Number instead of float64 or users with old state files will panic during upgrades. Note: the backwards compatibility properties of this commit are unknown at this time, and there may be a way for us to avoid using schema.Resource.UseJSONNumber, instead preserving the choice until we get to the point where TypeInt casts the number to an int. Further investigation needed.

paultyng · 2020-12-17T14:52:16Z

@marekaf as mentioned above, this doesn't seem to be limited to 0.14 and seems to be affecting 0.13 and 0.12 since it is provider side. You could try to downgrade your provider (not the CLI), and see if older versions can handle this, they would need to be on the v1 SDK, as opposed to the v2 SDK I think, and I'm not sure where that switched in all providers.

15-17 integer digits is about the most you can safely carry in a float64 without any potential loss and this ID has crossed that threshold so it will occasionally have losses until the SDK fix is merged in to the provider or the typing is changed.

marekaf · 2020-12-17T14:56:32Z

@marekaf as mentioned above, this doesn't seem to be limited to 0.14 and seems to be affecting 0.13 and 0.12 since it is provider side. You could try to downgrade your provider (not the CLI), and see if older versions can handle this, they would need to be on the v1 SDK, as opposed to the v2 SDK I think, and I'm not sure where that switched in all providers.

15-17 integer digits is about the most you can safely carry in a float64 without any potential loss and this ID has crossed that threshold so it will occasionally have losses until the SDK fix is merged in to the provider or the typing is changed.

thanks!

I managed to work around it by just making sure the apply will not touch url_map (not possible via -targeting so I had to simply change manually in GCP web console everything to match the config in terraform hcl so that not drift would be detected). Everything else works for plan & apply.

Go's JSON decoder, by default, uses a lossy conversion of JSON integers to float64s. For sufficiently large integers, this yields a loss of precision, and that causes problems with diffs and plans not matching. See #655 for more details. This PR proposes a solution: keeping the lossless json.Number representation of integers in state files. This will ensure that no precision is lost, but it comes at a cost. json.Number is a string type, not a float64 type. That means that existing state upgraders that (correctly) cast a value to float64 will break, as the value will now be surfaced to them as a json.Number, which cannot be cast to a float64. To handle this, the schema.Resource type gains a `UseJSONNumber` property. When set to new, users are opted into the new json.Number values. When left false, the default, users get the existing float64 behavior. If a resource with state upgraders wants to use the new json.Number behavior, it must update all the state upgraders for that resource to use json.Number instead of float64 or users with old state files will panic during upgrades. Note: the backwards compatibility properties of this commit are unknown at this time, and there may be a way for us to avoid using schema.Resource.UseJSONNumber, instead preserving the choice until we get to the point where TypeInt casts the number to an int. Further investigation needed.

paddycarver · 2020-12-19T00:41:13Z

This should be resolved in 2.4.0 that is rolling out now. Resources impacted will need to set UseJSONNumber: true on the schema.Resource and any StateUpgraders on the resource will need to use json.Number instead of float64 for all numbers. Other than that, it should be a transparent upgrade process with no code changes needed.

Abdelwaheb-Hnaien · 2020-12-21T11:04:07Z

I'm having the same issue with google_compute_url_map resource since I upgraded to terraform v0.14.0
I created the url_map in my project using terraform v0.13.5
what I'm trying to do now is to make the url_map point to a new backend service (previously an instance group, now a bucket) and then I get this error:

Error: Provider produced inconsistent final plan


When expanding the plan for
module.google_compute_url_map_uat-mcm-url-map.google_compute_url_map.urlmap to
include new values learned so far during apply, provider
"registry.terraform.io/hashicorp/google" produced an invalid new value for
.map_id: was cty.NumberIntVal(6.438816093867301036e+18), but now
cty.NumberIntVal(6.438816093867301e+18).

This is a bug in the provider, which should be reported in the provider's own
issue tracker.

I also tried with terraform v0.14.3, it's not working also.

this action (changing the backend) was working perfectly with terraform version 0.13.5

Prior to this change, integers in state that do not fit cleanly in float64s lose their precision, leading to permadiffs. As of SDK v2.4.0, setting UseJSONNumber on the resource schema changes number representation to json.Number throughout, preserving precision. Please see hashicorp/terraform-plugin-sdk#655 for more information.

kmoe · 2021-01-05T15:44:53Z

I've updated the title of this issue to prevent confusion when referring to it in the future, as this is not a 0.14 bug.

ghost · 2021-01-18T01:50:01Z

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

paultyng transferred this issue from hashicorp/terraform Dec 9, 2020

paddycarver added the bug Something isn't working label Dec 9, 2020

paultyng linked a pull request Dec 9, 2020 that will close this issue

Fix large number issues in upgrade state #656

Closed

Sytten mentioned this issue Dec 10, 2020

Changing SSL certificates in google_compute_target_https_proxy results in inconsistent final plan from .proxy_id hashicorp/terraform-provider-google#7966

Closed

paddycarver mentioned this issue Dec 16, 2020

Add test cases for ints too large for float64. hashicorp/terraform-provider-corner#16

Merged

jbardin mentioned this issue Dec 17, 2020

Float number are used in diff hashicorp/terraform#27305

Closed

paddycarver mentioned this issue Dec 18, 2020

Add a flag for using json.Number from state #662

Merged

paddycarver closed this as completed Dec 19, 2020

thilohaas mentioned this issue Dec 21, 2020

Weird state error with TF 0.14.0 on changes hashicorp/terraform-provider-google#7945

Closed

This was referenced Dec 28, 2020

Update SDK to 2.4.0 hashicorp/terraform-provider-google#8088

Merged

Update SDK to 2.4.0 hashicorp/terraform-provider-google-beta#2817

Merged

rileykarson mentioned this issue Dec 28, 2020

Terraform can't update Host and path rules in LoadBalancer (NEG) hashicorp/terraform-provider-google#8074

Closed

kmoe mentioned this issue Jan 5, 2021

Fix bigint bug hashicorp/terraform-provider-random#132

Merged

kmoe changed the title ~~0.14.0 rounding large int values from state~~ bug: rounding large int values from state Jan 5, 2021

ghost locked as resolved and limited conversation to collaborators Jan 18, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

bug: rounding large int values from state #655

bug: rounding large int values from state #655

slevenick commented Dec 8, 2020

apparentlymart commented Dec 9, 2020

apparentlymart commented Dec 9, 2020

apparentlymart commented Dec 9, 2020

paddycarver commented Dec 9, 2020

jbardin commented Dec 9, 2020

paultyng commented Dec 9, 2020

paultyng commented Dec 9, 2020

cagataygurturk commented Dec 11, 2020

slevenick commented Dec 11, 2020 •

edited

Loading

paultyng commented Dec 11, 2020 •

edited

Loading

Sytten commented Dec 14, 2020

paddycarver commented Dec 15, 2020

paultyng commented Dec 15, 2020

paddycarver commented Dec 16, 2020

marekaf commented Dec 16, 2020

paultyng commented Dec 17, 2020 •

edited

Loading

marekaf commented Dec 17, 2020

paddycarver commented Dec 19, 2020

Abdelwaheb-Hnaien commented Dec 21, 2020 •

edited

Loading

kmoe commented Jan 5, 2021

ghost commented Jan 18, 2021

bug: rounding large int values from state #655

bug: rounding large int values from state #655

Comments

slevenick commented Dec 8, 2020

Terraform Version

Terraform Configuration Files

Debug Output

Crash Output

Expected Behavior

Actual Behavior

Steps to Reproduce

Additional Context

References

apparentlymart commented Dec 9, 2020

apparentlymart commented Dec 9, 2020

apparentlymart commented Dec 9, 2020

paddycarver commented Dec 9, 2020

jbardin commented Dec 9, 2020

paultyng commented Dec 9, 2020

paultyng commented Dec 9, 2020

cagataygurturk commented Dec 11, 2020

slevenick commented Dec 11, 2020 • edited Loading

paultyng commented Dec 11, 2020 • edited Loading

Sytten commented Dec 14, 2020

paddycarver commented Dec 15, 2020

paultyng commented Dec 15, 2020

paddycarver commented Dec 16, 2020

marekaf commented Dec 16, 2020

paultyng commented Dec 17, 2020 • edited Loading

marekaf commented Dec 17, 2020

paddycarver commented Dec 19, 2020

Abdelwaheb-Hnaien commented Dec 21, 2020 • edited Loading

kmoe commented Jan 5, 2021

ghost commented Jan 18, 2021

slevenick commented Dec 11, 2020 •

edited

Loading

paultyng commented Dec 11, 2020 •

edited

Loading

paultyng commented Dec 17, 2020 •

edited

Loading

Abdelwaheb-Hnaien commented Dec 21, 2020 •

edited

Loading