Reproduce aws-3880 in a test #1917

t0yv0 · 2024-04-30T19:39:27Z

Toward a min-repro for pulumi/pulumi-aws#3880

The WebACL Rule attribute has a complicated schema, and it looks like something gets distorted in the Pulumi translation even with no user changes, so that Pulumi computes a different set element identity during the first and the second pulumi up.

This resource is already enabled under the PlanResourceChange flag.

Narrowing it down from here.

t0yv0 · 2024-04-30T19:40:28Z

pkg/tests/cross-tests/adapt.go

@@ -102,17 +102,23 @@ func (ta *typeAdapter) NewValue(value any) tftypes.Value {
 		switch v := value.(type) {
 		case map[string]any:
 			values := map[string]tftypes.Value{}
-			for k, el := range v {
-				values[k] = fromType(aT[k]).NewValue(el)
+			for key, expectedType := range aT {


Noted that the tftypes.Value representation insists (via panics) that every attribute has an entry, even if it is a nil entry. It also insists on no optional attributes. The adapter now takes care of both, which frees the test writer from writing explicit nulls for attributes that do not matter.
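
To illustrate the idea of iterating over the expected attributes rather than the supplied ones, here is a minimal sketch. It uses plain Go maps as stand-ins and does not call tftypes; the fillMissing helper and its signature are hypothetical.

```go
package main

import "fmt"

// fillMissing returns a copy of given that has an entry for every attribute
// named in expected. Attributes absent from given map to nil, standing in for
// an explicit null. Plain Go stand-ins only; this is not the tftypes API.
func fillMissing(expected map[string]string, given map[string]any) map[string]any {
	out := make(map[string]any, len(expected))
	// Range over the expected attributes, not the supplied ones, so that
	// nothing the type declares is left without an entry.
	for name := range expected {
		if v, ok := given[name]; ok {
			out[name] = v
		} else {
			out[name] = nil
		}
	}
	return out
}

func main() {
	expected := map[string]string{"name": "string", "scope": "string"}
	filled := fillMissing(expected, map[string]any{"name": "acl"})
	fmt.Println(len(filled), filled["name"], filled["scope"])
}
```

The test author only writes the attributes that matter; everything else gets an explicit nil entry.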

t0yv0 · 2024-04-30T19:41:39Z

pkg/tests/internal/webaclschema/webacl.go

-						}
-						return n
-					},
+					// Set: func(v interface{}) int {


This was really here for debugging only. The original in AWS does not specify a custom Set.

t0yv0 · 2024-04-30T19:42:11Z

pkg/tests/cross-tests/tfwrite.go

-				for _, v := range value.AsValueSet().Values() {
-					newBlock := body.AppendNewBlock(key, nil)
-					writeBlock(newBlock.Body(), elem.Schema, v.AsValueMap())
+				if !value.IsNull() {


Tolerate missing values encoded by nulls - as required by tftypes.Value (and translated to cty.Value).

t0yv0 · 2024-04-30T19:42:39Z

pkg/tests/cross-tests/diff_check.go

@@ -58,8 +58,8 @@ func runDiffCheck(t T, tc diffTestCase) {
 	tfwd := t.TempDir()

 	tfd := newTfDriver(t, tfwd, providerShortName, rtype, tc.Resource)
-	_ = tfd.writePlanApply(t, tc.Resource.Schema, rtype, "example", tc.Config1)
-	tfDiffPlan := tfd.writePlanApply(t, tc.Resource.Schema, rtype, "example", tc.Config2)
+	_ = tfd.writePlanApply(t, tc.Resource.SchemaMap(), rtype, "example", tc.Config1)


A Resource may specify either Schema or SchemaFunc; SchemaMap() normalizes access to whichever one is set, so this is the way to go.
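
A rough sketch of that normalization, using simplified stand-in types rather than the real SDK ones (the Schema and Resource structs below are illustrative assumptions, not the terraform-plugin-sdk definitions):

```go
package main

import "fmt"

// Schema and Resource are simplified stand-ins for the SDK types.
type Schema struct{ Type string }

type Resource struct {
	Schema     map[string]*Schema
	SchemaFunc func() map[string]*Schema
}

// SchemaMap prefers SchemaFunc when it is set and falls back to Schema,
// so callers do not need to know which of the two the resource uses.
func (r *Resource) SchemaMap() map[string]*Schema {
	if r.SchemaFunc != nil {
		return r.SchemaFunc()
	}
	return r.Schema
}

func main() {
	static := &Resource{Schema: map[string]*Schema{"name": {Type: "string"}}}
	lazy := &Resource{SchemaFunc: func() map[string]*Schema {
		return map[string]*Schema{"rule": {Type: "set"}}
	}}
	fmt.Println(len(static.SchemaMap()), len(lazy.SchemaMap()))
}
```

Reading r.Schema directly would silently yield nil for resources that only define SchemaFunc, which is why the test driver goes through the accessor.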

t0yv0 · 2024-04-30T20:59:46Z

Alright, there is something fairly unexpected going on with what the hash function receives under TF proper for this example during a normal TF lifecycle.

I have instrumented the set function:

	// Here i may receive maps or slices over base types and *schema.Set which is not friendly to diffing.
	resource.Schema["rule"].Set = func(i interface{}) int {

These are the invocations under TF itself.

    120:#### Computing hash set for rule <<expected2>> (action=[]interface {}{map[string]interface {}{"custom_response":[]interface {}{}}}, ruleLabel=*Set(map[string]interface {}(nil)))==> 4129856294
    124:#### Computing hash set for rule <<expected2>> (action=[]interface {}{map[string]interface {}{"custom_response":[]interface {}{}}}, ruleLabel=*Set(map[string]interface {}(nil)))==> 4129856294
    128:#### Computing hash set for rule <<expected>> (action=[]interface {}{interface {}(nil)}, ruleLabel=*Set(map[string]interface {}(nil)))==> 643481770
     43:#### Computing hash set for rule <<expected>> (action=[]interface {}{interface {}(nil)}, ruleLabel=*Set(map[string]interface {}(nil)))==> 643481770
     49:#### Computing hash set for rule <<expected>> (action=[]interface {}{interface {}(nil)}, ruleLabel=*Set(map[string]interface {}(nil)))==> 643481770
     53:#### Computing hash set for rule <<expected>> (action=[]interface {}{interface {}(nil)}, ruleLabel=*Set(map[string]interface {}(nil)))==> 643481770
     57:#### Computing hash set for rule <<expected2>> (action=[]interface {}{map[string]interface {}{"custom_response":[]interface {}{}}}, ruleLabel=*Set(map[string]interface {}(nil)))==> 4129856294
     61:#### Computing hash set for rule <<expected2>> (action=[]interface {}{map[string]interface {}{"custom_response":[]interface {}{}}}, ruleLabel=<nil>)==> 3409981255
     65:#### Computing hash set for rule <<expected2>> (action=[]interface {}{map[string]interface {}{"custom_response":[]interface {}{}}}, ruleLabel=<nil>)==> 3409981255
     69:#### Computing hash set for rule <<expected2>> (action=[]interface {}{map[string]interface {}{"custom_response":[]interface {}{}}}, ruleLabel=<nil>)==> 3409981255
     73:#### Computing hash set for rule <<expected2>> (action=[]interface {}{map[string]interface {}{"custom_response":[]interface {}{}}}, ruleLabel=<nil>)==> 3409981255
     77:#### Computing hash set for rule <<expected2>> (action=[]interface {}{map[string]interface {}{"custom_response":[]interface {}{}}}, ruleLabel=<nil>)==> 3409981255
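
For reference, this kind of instrumentation can be sketched as a wrapper around any set hash function. The instrument helper and the log format below are illustrative assumptions, not the exact code used on this branch.

```go
package main

import "fmt"

// instrument wraps a set hash function so that every invocation is recorded
// together with its input and the hash it produced.
func instrument(name string, hash func(interface{}) int, log *[]string) func(interface{}) int {
	return func(v interface{}) int {
		h := hash(v)
		*log = append(*log, fmt.Sprintf("#### Computing hash set for %s (%#v) ==> %d", name, v, h))
		return h
	}
}

func main() {
	var calls []string
	// The hash below is a placeholder; any func(interface{}) int works.
	hash := instrument("rule", func(v interface{}) int { return len(fmt.Sprint(v)) }, &calls)
	hash([]interface{}{nil})
	hash(map[string]interface{}{"custom_response": []interface{}{}})
	for _, c := range calls {
		fmt.Println(c)
	}
}
```

Printing the input with %#v is what makes nil, empty slices, and missing keys distinguishable in the log.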

t0yv0 · 2024-04-30T21:06:10Z

Under Pulumi there is a new and exciting combination I have not seen under TF:

    254:#### Computing hash set for rule <<expected>> (action=[]interface {}{interface {}(nil)}, ruleLabel=<nil>)==> 835885598
    262:#### Computing hash set for rule <<expected>> (action=[]interface {}{interface {}(nil)}, ruleLabel=<nil>)==> 835885598
    270:#### Computing hash set for rule <<expected>> (action=[]interface {}{interface {}(nil)}, ruleLabel=<nil>)==> 835885598

t0yv0 · 2024-04-30T21:22:51Z

I think this new log line comes from
https://github.com/pulumi/pulumi-terraform-bridge/blob/t0yv0%2Faws-3880/pkg/tfbridge/diff.go#L105

     Error Trace:    /Users/t0yv0/code/pulumi-terraform-bridge/pkg/tests/cross-tests/cross_test.go:508
                                                        /Users/t0yv0/code/pulumi-terraform-bridge/pkg/tfshim/sdk-v2/schema.go:180
                                                        /Users/t0yv0/code/pulumi-terraform-bridge/pkg/tfbridge/diff.go:120
                                                        /Users/t0yv0/code/pulumi-terraform-bridge/pkg/tfbridge/diff.go:270
                                                        /Users/t0yv0/code/pulumi-terraform-bridge/pkg/tfbridge/diff.go:365
                                                        /Users/t0yv0/code/pulumi-terraform-bridge/pkg/tfbridge/provider.go:953
                                                        /Users/t0yv0/go/pkg/mod/github.com/pulumi/pulumi/sdk/[email protected]/proto/go/provider_grpc.pb.go:568
                                                        /Users/t0yv0/go/pkg/mod/google.golang.org/[email protected]/server.go:1386
                                                        /Users/t0yv0/go/pkg/mod/google.golang.org/[email protected]/server.go:1797
                                                        /Users/t0yv0/go/pkg/mod/google.golang.org/[email protected]/server.go:1027
                                                        /nix/store/brv7d6mlrclkzywf1vaf35wqhq4c0c82-go-1.21.5/share/go/src/runtime/asm_amd64.s:1650

This sort of makes sense. The makeDetailedDiff/visitPropertyValue call chain assumes that it can compute TF representations directly from the olds and news resource.PropertyMap values by re-projecting these original Pulumi values into the TF domain through a conversionContext. However, this assumption is not valid when the TF provider has made its own changes during PlanResourceChange.

t0yv0 · 2024-04-30T21:23:32Z

Let me play a bit more to be absolutely sure that fixing this would resolve the bug.

t0yv0 · 2024-04-30T21:52:23Z

Turns out to be a red herring, not the root cause: disabling detailed diff does not affect the test result here.

rule.643481770.action.0.captcha.# { 0 false false <nil> false false 0}
rule.4129856294.action.0.allow.# {0 0 false false <nil> false false 0}

So the root cause is closer to the other avenue: Pulumi is confusing the 643481770 and 4129856294 hashes.

So this is about whether custom_response is populated or not under rule[0].action.block.

"custom_response":[]interface {}{}}

t0yv0 · 2024-04-30T21:53:58Z

To be continued here. I am curious about the earliest point at which Pulumi starts to diverge from TF on custom_response.

In the meantime we can try to teach the AWS provider to disregard the custom_response difference for the purposes of hashing. This might work around the problem for this resource in particular.
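
A minimal sketch of what such a workaround could look like: canonicalize the element before hashing so that absent, nil, and empty blocks serialize identically. The canonical, serialize, and hashRule helpers are hypothetical, and CRC32 is used here only as a convenient stand-in hash.

```go
package main

import (
	"fmt"
	"hash/crc32"
)

// canonical drops nil values and empty lists or maps, recursively, so that
// an absent block and an empty block end up with the same representation.
// Note that this toy version also drops nil list elements.
func canonical(v any) any {
	switch t := v.(type) {
	case map[string]any:
		out := map[string]any{}
		for k, el := range t {
			if c := canonical(el); c != nil {
				out[k] = c
			}
		}
		if len(out) == 0 {
			return nil
		}
		return out
	case []any:
		var out []any
		for _, el := range t {
			if c := canonical(el); c != nil {
				out = append(out, c)
			}
		}
		if len(out) == 0 {
			return nil
		}
		return out
	default:
		return v
	}
}

// serialize renders the canonical form; fmt prints map keys in sorted order.
func serialize(rule map[string]any) string {
	return fmt.Sprintf("%#v", canonical(rule))
}

// hashRule hashes the canonical serialization of a set element.
func hashRule(rule map[string]any) uint32 {
	return crc32.ChecksumIEEE([]byte(serialize(rule)))
}

func main() {
	withNil := map[string]any{"action": []any{nil}}
	withEmpty := map[string]any{"action": []any{map[string]any{"custom_response": []any{}}}}
	fmt.Println(hashRule(withNil) == hashRule(withEmpty))
}
```

This only masks the divergence for one resource; it does not explain where the null and the empty list come from.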

VenelinMartinov · 2024-05-01T01:11:04Z

This kind of looks like #1915

t0yv0 · 2024-05-02T20:45:17Z

pkg/tfbridge/tests/provider_test.go

+		{
+		  "action": {
+		    "block": {
+		      "customResponse": null


This is the smallest repro so far. Very interesting. You can omit "customResponse" here and it still thinks this is a change.

t0yv0 · 2024-05-02T21:28:28Z

Spelunking further. The custom_response null entry got injected here by provider2.InstanceState:

InstanceState received map[id:newid rule:[map[action:[map[block:[map[]]]]]]]

cty.ObjectVal(map[string]cty.Value{
  "id":cty.StringVal("newid"),
  "rule":cty.SetVal([]cty.Value{
      cty.ObjectVal(map[string]cty.Value{
          "action":cty.ListVal([]cty.Value{
              cty.ObjectVal(map[string]cty.Value{
                  "block":cty.ListVal([]cty.Value{
                      cty.ObjectVal(map[string]cty.Value{
                          "custom_response":cty.NullVal(...)
                      })
                  })
...

t0yv0 · 2024-05-02T21:49:21Z

https://github.com/pulumi/pulumi-terraform-bridge/blob/t0yv0%2Faws-3880/pkg/tfshim/sdk-v2/upgrade_state.go#L71

Hmm, so I'm pretty sure this is the place we hit. NormalizeObjectFromLegacySDK is where we go from custom_response=null to custom_response=[]. Subsequently there is a difference between state and plan ([] vs null), and this is what triggers incorrect hashing and cycling. This was the Pulumi side. Now let me debug the TF side and see what is received into SimpleDiff under TF.
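
As a toy model of that null-to-empty step (not the SDK function itself), the following walks a decoded object and replaces nil values of known block attributes with empty lists. The normalizeBlocks helper and the blockAttrs lookup are assumptions made for illustration.

```go
package main

import "fmt"

// normalizeBlocks returns a copy of v in which a nil value under any key
// listed in blockAttrs is replaced with an empty list. The input is left
// untouched.
func normalizeBlocks(v any, blockAttrs map[string]bool) any {
	switch t := v.(type) {
	case map[string]any:
		out := make(map[string]any, len(t))
		for k, el := range t {
			if el == nil && blockAttrs[k] {
				out[k] = []any{}
				continue
			}
			out[k] = normalizeBlocks(el, blockAttrs)
		}
		return out
	case []any:
		out := make([]any, len(t))
		for i, el := range t {
			out[i] = normalizeBlocks(el, blockAttrs)
		}
		return out
	default:
		return v
	}
}

func main() {
	state := map[string]any{
		"block": []any{map[string]any{"custom_response": nil}},
	}
	blocks := map[string]bool{"custom_response": true}
	fmt.Printf("before: %#v\n", state)
	fmt.Printf("after:  %#v\n", normalizeBlocks(state, blocks))
}
```

If only one of state and plan passes through a step like this, the two sides disagree on [] versus null even though the user changed nothing.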

VenelinMartinov · 2024-05-02T22:59:32Z

pkg/tests/cross-tests/cross_test.go

+	})
+}
+
+func TestAws3880Minimal(t *testing.T) {


That's quite a few levels of nesting - if required then we need to update the property based tests to go deeper - this could not have been generated there.

t0yv0 · 2024-05-02T23:26:47Z

This gets more interesting. I've instrumented resource.SimpleDiff. This is the method at which resource planning bottoms out in both scenarios.

TF SimpleDIff ##############################################################################
InstanceState.Attributes ###

id => newid
rule.# => 1
rule.2262893456.action.# => 1
rule.2262893456.action.0.block.# => 1
rule.2262893456.action.0.block.0.custom_response.# => 0

InstanceState.RawState cty.ObjectVal(map[string]cty.Value{"id":cty.StringVal("newid"), "rule":cty.SetVal([]cty.Value{cty.ObjectVal(map[string]cty.Value{"action":cty.ListVal([]cty.Value{cty.ObjectVal(map[string]cty.Value{"block":cty.ListVal([]cty.Value{cty.ObjectVal(map[string]cty.Value{"custom_response":cty.ListValEmpty(cty.Object(map[string]cty.Type{"custom_response_body_key":cty.String}))})})})})})})})
InstanceState.RawConfig cty.ObjectVal(map[string]cty.Value{"id":cty.NullVal(cty.String), "rule":cty.SetVal([]cty.Value{cty.ObjectVal(map[string]cty.Value{"action":cty.ListVal([]cty.Value{cty.ObjectVal(map[string]cty.Value{"block":cty.ListVal([]cty.Value{cty.ObjectVal(map[string]cty.Value{"custom_response":cty.ListValEmpty(cty.Object(map[string]cty.Type{"custom_response_body_key":cty.String}))})})})})})})})
InstanceState.RawPlan cty.ObjectVal(map[string]cty.Value{"id":cty.StringVal("newid"), "rule":cty.SetVal([]cty.Value{cty.ObjectVal(map[string]cty.Value{"action":cty.ListVal([]cty.Value{cty.ObjectVal(map[string]cty.Value{"block":cty.ListVal([]cty.Value{cty.ObjectVal(map[string]cty.Value{"custom_response":cty.ListValEmpty(cty.Object(map[string]cty.Type{"custom_response_body_key":cty.String}))})})})})})})})
Resource Config ###
Raw: map[id:newid rule:[map[action:[map[block:[map[]]]]]]]
Config: map[id:newid rule:[map[action:[map[block:[map[]]]]]]]
DONE       ################################################################################


Pulumi SimpleDiff ################################################################################
InstanceState.Attributes ###
id => newid
rule.# => 1
rule.2262893456.action.# => 1
rule.2262893456.action.0.block.# => 1
rule.2262893456.action.0.block.0.custom_response.# => 0
InstanceState.RawState cty.ObjectVal(map[string]cty.Value{"id":cty.StringVal("newid"), "rule":cty.SetVal([]cty.Value{cty.ObjectVal(map[string]cty.Value{"action":cty.ListVal([]cty.Value{cty.ObjectVal(map[string]cty.Value{"block":cty.ListVal([]cty.Value{cty.ObjectVal(map[string]cty.Value{"custom_response":cty.ListValEmpty(cty.Object(map[string]cty.Type{"custom_response_body_key":cty.String}))})})})})})})})
InstanceState.RawConfig cty.ObjectVal(map[string]cty.Value{"id":cty.NullVal(cty.String), "rule":cty.SetVal([]cty.Value{cty.ObjectVal(map[string]cty.Value{"action":cty.ListVal([]cty.Value{cty.ObjectVal(map[string]cty.Value{"block":cty.ListVal([]cty.Value{cty.ObjectVal(map[string]cty.Value{"custom_response":cty.NullVal(cty.List(cty.Object(map[string]cty.Type{"custom_response_body_key":cty.String})))})})})})})})})
InstanceState.RawPlan cty.ObjectVal(map[string]cty.Value{"id":cty.StringVal("newid"), "rule":cty.SetVal([]cty.Value{cty.ObjectVal(map[string]cty.Value{"action":cty.ListVal([]cty.Value{cty.ObjectVal(map[string]cty.Value{"block":cty.ListVal([]cty.Value{cty.ObjectVal(map[string]cty.Value{"custom_response":cty.NullVal(cty.List(cty.Object(map[string]cty.Type{"custom_response_body_key":cty.String})))})})})})})})})
Resource Config ###
Raw: map[id:newid rule:[map[action:[map[block:[map[]]]]]]]
Config: map[id:newid rule:[map[action:[map[block:[map[]]]]]]]
DONE       ################################################################################

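The only difference is around RawConfig (RawPlan shows the same discrepancy in the dump above): custom_response is an empty list under TF but null under Pulumi. I can fix that with a small change; however, this does not fix the test.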

t0yv0 · 2024-05-02T23:29:12Z

The change is this:

func recoverAndCoerceCtyValue(resource *schema.Resource, value any) (cty.Value, error) {
	v, err := recoverAndCoerceCtyValueWithSchema(resource.CoreConfigSchema(), value)
	if err != nil {
		return cty.NilVal, err
	}
	v = schema.NormalizeObjectFromLegacySDK(v, resource.CoreConfigSchema()) // ADDED
	return v, nil
}

t0yv0 · 2024-05-02T23:30:21Z

This is super fascinating: even if we pass the exact same data into res.SimpleDiff and get the exact same InstanceDiff back, we get different results under Pulumi and TF. The InstanceDiff printout is the same in both cases:

Tf

Finally ################################################################################
rule.1277166719.action.# => &{ 1 false false <nil> false false 0}
rule.1277166719.action.0.block.# => &{ 1 false false <nil> false false 0}
rule.1277166719.action.0.block.0.custom_response.# => &{ 0 false false <nil> false false 0}
rule.2262893456.action.# => &{1 0 false false <nil> false false 0}
rule.2262893456.action.0.block.# => &{1 0 false false <nil> false false 0}
rule.2262893456.action.0.block.0.custom_response.# => &{0 0 false false <nil> false false 0}
DONE1       ################################################################################



PULUMI

Finally ################################################################################
rule.1277166719.action.# => &{ 1 false false <nil> false false 0}
rule.1277166719.action.0.block.# => &{ 1 false false <nil> false false 0}
rule.1277166719.action.0.block.0.custom_response.# => &{ 0 false false <nil> false false 0}
rule.2262893456.action.# => &{1 0 false false <nil> false false 0}
rule.2262893456.action.0.block.# => &{1 0 false false <nil> false false 0}
rule.2262893456.action.0.block.0.custom_response.# => &{0 0 false false <nil> false false 0}
DONE1       ################################################################################

t0yv0 · 2024-05-02T23:36:35Z

I need to take a break from this, but my last hypothesis is that TF just ignores the InstanceDiff; the only reason we use it is legacy. TF might be computing a cty.Value PlannedState and then diffing the existing state against PlannedState to show a detailed diff. This is kind of implied by the gRPC protocol of PlanResourceChange, which does not expose InstanceDiff on the wire. I had to patch terraform-plugin-sdk to get hold of this object, which is clearly playing against the rules. The reasoning was that our detailedDiff computation was very difficult to edit for historical reasons, and it is written against the InstanceDiff type.

If this hunch is right, I think we need to finally rewrite detailed diff. It can be done by comparing two cty.Value instances, one from the prior state and one from PlannedState, and translating the diff back to the Pulumi domain (this assumes there are no Pulumi properties without matching TF properties). This can clean up a lot of things.
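
A very rough sketch of the value-versus-value approach, using plain Go maps in place of cty.Value. The diffPaths and joinPath helpers are hypothetical; lists are compared as opaque values here, and set-aware comparison is deliberately left out.

```go
package main

import (
	"fmt"
	"reflect"
	"sort"
)

// joinPath appends key to a dotted path.
func joinPath(path, key string) string {
	if path == "" {
		return key
	}
	return path + "." + key
}

// diffPaths returns the sorted paths at which prior and planned differ.
// Nested maps are walked key by key; anything else is compared as a whole.
// A missing key and an explicit nil are treated as equal.
func diffPaths(path string, prior, planned any) []string {
	pm, pok := prior.(map[string]any)
	qm, qok := planned.(map[string]any)
	if pok && qok {
		keys := map[string]bool{}
		for k := range pm {
			keys[k] = true
		}
		for k := range qm {
			keys[k] = true
		}
		var out []string
		for k := range keys {
			out = append(out, diffPaths(joinPath(path, k), pm[k], qm[k])...)
		}
		sort.Strings(out)
		return out
	}
	if reflect.DeepEqual(prior, planned) {
		return nil
	}
	return []string{path}
}

func main() {
	prior := map[string]any{"id": "newid", "rule": map[string]any{"name": "a"}}
	planned := map[string]any{"id": "newid", "rule": map[string]any{"name": "b"}}
	fmt.Println(diffPaths("", prior, planned))
}
```

When prior and planned are equal the result is empty, regardless of what any intermediate attribute-level diff reported.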

t0yv0 · 2024-05-02T23:37:01Z

#1895

t0yv0 · 2024-05-02T23:37:59Z

Or we can indeed diff Pulumi representations as outlined in #1895, with the caveat that we likely will need to special-case set diffing and the set typing information is only available in TF world.

t0yv0 · 2024-05-06T15:36:07Z

Interesting. I can confirm that in this case in PlanResourceChange we get:

plannedStateVal.Equals(priorStateVal)

And yet we have non-zero DiffAttributes

resp.InstanceDiff.Attributes

Code: https://github.com/pulumi/terraform-plugin-sdk/blob/master/helper/schema/grpc_provider.go#L742

Consequently, this heuristic is the root cause of the wrong decision here, as plannedStateVal.Equals(priorStateVal) should be more authoritative about this being a "no change" diff:
https://github.com/pulumi/pulumi-terraform-bridge/blob/master/pkg/tfbridge/provider.go#L963

And this is the root-cause fix:
#1895

With AWS 3880 there is some evidence (derivation in #1917) that sometimes TF has entries in InstanceDiff.Attributes while still planning to take the resource to an end state that is identical to the original state. In these cases, TF does not display a diff but Pulumi does.

The root cause here remains unfixed (#1895): the Pulumi bridge edits terraform-plugin-sdk to expose the InstanceDiff structure and connect it to the makeDetailedDiff machinery. Pulumi should, like TF, stick to the gRPC protocol and rely only on the PlannedState value. We can incrementally approach the desired behavior with this change, though, which detects the PlannedState=PriorState case and suppresses any diffs in this case.

Fixes:

- pulumi/pulumi-aws#3880
- pulumi/pulumi-aws#3306
- pulumi/pulumi-aws#3190
- pulumi/pulumi-aws#3454

---------

Co-authored-by: Venelin <[email protected]>
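
The incremental fix can be pictured roughly like this. The planResult struct and hasNoChanges function are simplified, hypothetical stand-ins and do not reflect the actual bridge types.

```go
package main

import (
	"fmt"
	"reflect"
)

// planResult is a simplified view of what a plan step returns.
type planResult struct {
	PriorState   map[string]any
	PlannedState map[string]any
	Attributes   map[string]string // stand-in for InstanceDiff.Attributes
}

// hasNoChanges reports whether the plan should be treated as a no-op.
// Equality of planned and prior state takes precedence over any entries
// left in the attribute-level diff.
func hasNoChanges(p planResult) bool {
	if reflect.DeepEqual(p.PlannedState, p.PriorState) {
		return true
	}
	return len(p.Attributes) == 0
}

func main() {
	p := planResult{
		PriorState:   map[string]any{"id": "newid"},
		PlannedState: map[string]any{"id": "newid"},
		Attributes:   map[string]string{"rule.#": "1"},
	}
	fmt.Println(hasNoChanges(p))
}
```

The attribute-level diff is still consulted when the states differ, so this narrows the behavior change to the PlannedState=PriorState case.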

t0yv0 · 2024-09-27T20:05:43Z

The AWS issue got fixed! Closing this.

Reproduce aws-3880 in a test

3a20298

t0yv0 requested a review from VenelinMartinov April 30, 2024 19:39

WIP

8706961

t0yv0 added 3 commits May 2, 2024 14:36

Minimize the repro

a255fa9

Minimal diff test

668e2eb

Mini repro fast to debug

40952e3

This was referenced May 6, 2024

wafv2.WebAcl: tweak Set identity to suppress resource cycling pulumi/pulumi-aws#3897

Closed

Permanent diff on aws.wafv2.RuleGroup pulumi/pulumi-aws#3190

Closed

Received diff update for created waf resources pulumi/pulumi-aws#3306

Closed

t0yv0 added 2 commits May 6, 2024 11:59

Introduce HasNoChanges instead of Attributes

06d4038

Fix by considering PlannedState=PriorState

38fa425

t0yv0 mentioned this pull request May 6, 2024

Fix overeager diffs #1927

Merged

t0yv0 closed this Sep 27, 2024
