Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TestJobQueueUpgrade panics under PlanResourceChange #4015

Closed
t0yv0 opened this issue Jun 3, 2024 · 7 comments · Fixed by #4019
Closed

TestJobQueueUpgrade panics under PlanResourceChange #4015

t0yv0 opened this issue Jun 3, 2024 · 7 comments · Fixed by #4019
Assignees
Labels
kind/engineering Work that is not visible to an external user resolution/fixed This issue was fixed

Comments

@t0yv0
Copy link
Member

t0yv0 commented Jun 3, 2024

The crux of the issue:


  "method": "/pulumirpc.ResourceProvider/Diff",
  "request": {
    "id": "terraform-20231205221555324200000003",
    "urn": "urn:pulumi:test::job-queue::aws:batch/computeEnvironment:ComputeEnvironment::sampleComputeEnvironment",
    "olds": {
      "arn": "arn:aws:batch:us-west-2:616138583583:compute-environment/terraform-20231205221555324200000003",
      "computeResources": {
        "ec2Configuration": {
          "imageIdOverride": "",
          "imageType": "ECS_AL2"
        },

Note that there is a problem with aws:batch/computeEnvironment:ComputeEnvironment resource where the 5.42.0 provider
wrote an object for ec2Configuration when a list is expected by the new version, as is also reflected in the Pulumi
schema.

So when upgrading a stack with ComputeEnvironment from 5.42.0 to the latest provider there is an implicit type mismatch.

There appears to be a problem where PlanResourceChange makes it a louder failure than the current silent failure.

Both PlanResourceChange=true and PlanResourceChange=false run state upgraders (although the current and desired version
is 0), and ironically the resource does not define any state upgraders at all.

Note that the state cty.Value passed to the state upgraders differs:

PlanResourceChange=true

panic: UPGRADING GRPC RAW=cty.TupleVal([]cty.Value{
   cty.ObjectVal(map[string]cty.Value{"allocation_strategy":cty.StringVal(""), "bid_percentage":cty.NumberIntVal(0), "desired_vcpus":cty.NumberIntVal(0),
      "ec2_configuration":cty.ObjectVal(map[string]cty.Value{"image_id_override":cty.StringVal("[]"), "image_type":cty.StringVal("ECS_AL2")}), "ec2_key_pair":cty.StringVal(""), "image_id":cty.StringVal(""), "instance_role":cty.StringVal("arn:aws:iam::616138583583:instance-profile/ecsInstanceRoleInstanceProfile-912858a"), "instance_type":cty.TupleVal([]cty.Value{cty.StringVal("c4.large")}), "max_vcpus":cty.NumberIntVal(16), "min_vcpus":cty.NumberIntVal(0), "security_group_ids":cty.TupleVal([]cty.Value{cty.StringVal("sg-0bb51dcade107992e")}), "spot_iam_fleet_role":cty.StringVal(""), "subnets":cty.TupleVal([]cty.Value{cty.StringVal("subnet-07f3a862bd9c5f1a3")}), "tags":cty.EmptyObjectVal, "type":cty.StringVal("EC2")})}) [recovered]

PlanResourceChange=false

panic: UPGRADING GRPC RAW=cty.ListVal([]cty.Value{
   cty.ObjectVal(map[string]cty.Value{"allocation_strategy":cty.StringVal(""), "bid_percentage":cty.NumberIntVal(0), "desired_vcpus":cty.NumberIntVal(0),
       "ec2_configuration":cty.ListValEmpty(cty.Object(map[string]cty.Type{"image_id_override":cty.String, "image_type":cty.String})), "ec2_key_pair":cty.StringVal(""), "image_id":cty.StringVal(""), "instance_role":cty.StringVal("arn:aws:iam::616138583583:instance-profile/ecsInstanceRoleInstanceProfile-912858a"), "instance_type":cty.SetVal([]cty.Value{cty.StringVal("c4.large")}), "launch_template":cty.ListValEmpty(cty.Object(map[string]cty.Type{"launch_template_id":cty.String, "launch_template_name":cty.String, "version":cty.String})), "max_vcpus":cty.NumberIntVal(16), "min_vcpus":cty.NumberIntVal(0), "placement_group":cty.StringVal(""), "security_group_ids":cty.SetVal([]cty.Value{cty.StringVal("sg-0bb51dcade107992e")}), "spot_iam_fleet_role":cty.StringVal(""), "subnets":cty.SetVal([]cty.Value{cty.StringVal("subnet-07f3a862bd9c5f1a3")}), "tags":cty.MapValEmpty(cty.String), "type":cty.StringVal("EC2")})})

Note that PlanResourceChange=false elegantly bypassed the problem by simply ignoring the data it can't read in
ec2_configuration.

However for PlanResoruceChange=true the TupleVal is hiding an underlying failure of recoverCtyValue:

recoverCtyValue fails:
panic: FAILED: recoverCtyValue failed: Cannot reconcile map map[image_id_override:[] image_type:ECS_AL2] to cty.List(cty.Object(map[string]cty.Type{"image_id_override":cty.String, "image_type":cty.String})) [recovered]

Which makes sense, it failed to read the object into a slot of type list. Then it decided to use approximate methods to
recover the state, which yielded the TupleVal combined with an ec2_configuration that is now an ObjectVal which is not
expected.

Threading this data through resource upgraders trips up JSONMapToStateValue which responds with a very unhelpful error
message: "missing expected [".

There is another issue where the bridge fails to handle the error diagnostics from the state upgraders, and therefore
panics instead of showing this error.

	github.com/pulumi/pulumi-terraform-bridge/v3 v3.83.1-0.20240603141117-4503342dc4c8
@t0yv0 t0yv0 added the kind/engineering Work that is not visible to an external user label Jun 3, 2024
@t0yv0
Copy link
Member Author

t0yv0 commented Jun 3, 2024

@VenelinMartinov @iwahbe

@t0yv0
Copy link
Member Author

t0yv0 commented Jun 3, 2024

I think this is an implicit schema change for aws:batch/computeEnvironment:ComputeEnvironment ec2Configuration at Pulumi provider level. This schema change was not present in TF proper. Perhaps there is a bridge mechanism to compensate and normalize the state for this resource?

@iwahbe
Copy link
Member

iwahbe commented Jun 3, 2024

You can override TransformFromState to fix the issue for just this resource.

@t0yv0
Copy link
Member Author

t0yv0 commented Jun 3, 2024

That's right. I'll have a look at that indeed. For the bridge, submitting a quick fix to handle diagnostics in upgrade state.

We can also consider soft-reporting errors from state upgrades, either always, or when desired_version=current_version.

@t0yv0
Copy link
Member Author

t0yv0 commented Jun 3, 2024

@VenelinMartinov
Copy link
Contributor

I wonder why the new type checking does not catch this?

t0yv0 added a commit to pulumi/pulumi-terraform-bridge that referenced this issue Jun 4, 2024
Isolating this from an AWS test failure; handling diags turns a panic
into a regular error. Looks like the error message is still unfortunate.
The test is isolated from
pulumi/pulumi-aws#4015
@t0yv0
Copy link
Member Author

t0yv0 commented Jun 4, 2024

It's not intended to check state. It would be wrong to do so in fact since the state may have been written at a different provider version. This is like pulumi/pulumi-terraform-bridge#1667 but with an additional plot twist that TF schema did not change (1667 does not apply) but Pulumi schema did.

t0yv0 added a commit that referenced this issue Jun 4, 2024
t0yv0 added a commit that referenced this issue Jun 5, 2024
t0yv0 added a commit that referenced this issue Jun 5, 2024
…4019)

Although the upstream schema of batch.ComputeEnvironment did not change
between v5 and v6 of pulumi-aws, the Pulumi schema did, and
computeResources.ec2Configuration is now an array. Prior to this change
pulumi-aws v6 was not able to read the states written by the v5 of the
provider correctly, if those states did involve a non-nil
computeResources.ec2Configuration.

In some of the recent bridge work this also interacted poorly with the
new PlanResourceChange machinery causing panics.

The change makes the provider automatically apply `x -> [x]` conversions
to computeResources.ec2Configuration state.

Fixes #4015
@pulumi-bot pulumi-bot added the resolution/fixed This issue was fixed label Jun 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/engineering Work that is not visible to an external user resolution/fixed This issue was fixed
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants