Reproduce aws-3880 in a test #1917

Closed
wants to merge 7 commits into from

Member

t0yv0 commented Apr 30, 2024

Toward a min-repro for pulumi/pulumi-aws#3880

The WebACL Rule attribute has a complicated schema, and it looks like something is getting distorted in the Pulumi translation: with no user changes, a different set element identity is computed under Pulumi during the first and the second pulumi up.

This resource is already under the PlanResourceChange flag.

Narrowing it down from here.

@@ -102,17 +102,23 @@ func (ta *typeAdapter) NewValue(value any) tftypes.Value {
switch v := value.(type) {
case map[string]any:
values := map[string]tftypes.Value{}
for k, el := range v {
values[k] = fromType(aT[k]).NewValue(el)
for key, expectedType := range aT {
Member Author

Note that the tftypes.Value representation insists (via panics) that every attribute has an entry, even if it is a nil entry. It also insists on no optional attributes. The adapter now satisfies both requirements, which frees the test writer from writing explicit nulls for attributes that do not matter.
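To illustrate the idea in the comment above, here is a minimal standalone sketch of filling in missing attributes. It does not use tftypes; `fillMissing` and the string-typed `attrTypes` map are hypothetical names invented for this example, and `nil` stands in for an explicit null.

```go
package main

import (
	"fmt"
	"sort"
)

// fillMissing returns a copy of values in which every attribute named in
// attrTypes has an entry. Attributes absent from values are set to nil,
// standing in for an explicit null. Keys that the schema does not know
// about are rejected. (Hypothetical helper, not the adapter's real code.)
func fillMissing(attrTypes map[string]string, values map[string]any) (map[string]any, error) {
	for k := range values {
		if _, ok := attrTypes[k]; !ok {
			return nil, fmt.Errorf("unexpected attribute %q", k)
		}
	}
	out := make(map[string]any, len(attrTypes))
	for k := range attrTypes {
		out[k] = values[k] // nil when the test author omitted k
	}
	return out, nil
}

func main() {
	attrTypes := map[string]string{"name": "string", "rule": "set", "tags": "map"}
	filled, err := fillMissing(attrTypes, map[string]any{"name": "acl"})
	if err != nil {
		panic(err)
	}
	keys := make([]string, 0, len(filled))
	for k := range filled {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s = %v\n", k, filled[k])
	}
}
```

Iterating over the expected types rather than over the supplied values is what guarantees an entry per attribute.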

}
return n
},
// Set: func(v interface{}) int {
Member Author

This was really here for debugging only. The original in AWS does not specify a custom Set.

for _, v := range value.AsValueSet().Values() {
newBlock := body.AppendNewBlock(key, nil)
writeBlock(newBlock.Body(), elem.Schema, v.AsValueMap())
if !value.IsNull() {
Member Author

Tolerate missing values encoded by nulls - as required by tftypes.Value (and translated to cty.Value).

@@ -58,8 +58,8 @@ func runDiffCheck(t T, tc diffTestCase) {
tfwd := t.TempDir()

tfd := newTfDriver(t, tfwd, providerShortName, rtype, tc.Resource)
_ = tfd.writePlanApply(t, tc.Resource.Schema, rtype, "example", tc.Config1)
tfDiffPlan := tfd.writePlanApply(t, tc.Resource.Schema, rtype, "example", tc.Config2)
_ = tfd.writePlanApply(t, tc.Resource.SchemaMap(), rtype, "example", tc.Config1)
Member Author

A Resource may specify either Schema or SchemaFunc; SchemaMap() normalizes access to both, so it is the right accessor to use here.
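As a rough illustration of the accessor pattern mentioned above, the sketch below uses a hypothetical `resource` type with string-valued schemas. It is not the terraform-plugin-sdk definition; it only shows why callers should go through one accessor instead of reading the field directly.

```go
package main

import "fmt"

// resource is a hypothetical stand-in for a provider resource definition
// that can declare its schema either eagerly or lazily.
type resource struct {
	Schema     map[string]string
	SchemaFunc func() map[string]string
}

// SchemaMap returns the schema regardless of which field was populated.
// In this sketch the lazy form takes precedence when both are set.
func (r *resource) SchemaMap() map[string]string {
	if r.SchemaFunc != nil {
		return r.SchemaFunc()
	}
	return r.Schema
}

func main() {
	eager := &resource{Schema: map[string]string{"rule": "set"}}
	lazy := &resource{SchemaFunc: func() map[string]string {
		return map[string]string{"rule": "set"}
	}}
	// Reading lazy.Schema directly would yield a nil map; SchemaMap does not.
	fmt.Println(len(eager.SchemaMap()), len(lazy.SchemaMap()), len(lazy.Schema))
}
```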

Member Author

t0yv0 commented Apr 30, 2024

Alright, there is something fairly unexpected going on with what the hash function receives under TF proper for this example during a normal TF lifecycle.

I have instrumented the set function:

	// Here i may receive maps or slices over base types and *schema.Set which is not friendly to diffing.
	resource.Schema["rule"].Set = func(i interface{}) int {

These are the invocations under TF itself.

    120:#### Computing hash set for rule <<expected2>> (action=[]interface {}{map[string]interface {}{"custom_response":[]interface {}{}}}, ruleLabel=*Set(map[string]interface {}(nil)))==> 4129856294
    124:#### Computing hash set for rule <<expected2>> (action=[]interface {}{map[string]interface {}{"custom_response":[]interface {}{}}}, ruleLabel=*Set(map[string]interface {}(nil)))==> 4129856294
    128:#### Computing hash set for rule <<expected>> (action=[]interface {}{interface {}(nil)}, ruleLabel=*Set(map[string]interface {}(nil)))==> 643481770
     43:#### Computing hash set for rule <<expected>> (action=[]interface {}{interface {}(nil)}, ruleLabel=*Set(map[string]interface {}(nil)))==> 643481770
     49:#### Computing hash set for rule <<expected>> (action=[]interface {}{interface {}(nil)}, ruleLabel=*Set(map[string]interface {}(nil)))==> 643481770
     53:#### Computing hash set for rule <<expected>> (action=[]interface {}{interface {}(nil)}, ruleLabel=*Set(map[string]interface {}(nil)))==> 643481770
     57:#### Computing hash set for rule <<expected2>> (action=[]interface {}{map[string]interface {}{"custom_response":[]interface {}{}}}, ruleLabel=*Set(map[string]interface {}(nil)))==> 4129856294
     61:#### Computing hash set for rule <<expected2>> (action=[]interface {}{map[string]interface {}{"custom_response":[]interface {}{}}}, ruleLabel=<nil>)==> 3409981255
     65:#### Computing hash set for rule <<expected2>> (action=[]interface {}{map[string]interface {}{"custom_response":[]interface {}{}}}, ruleLabel=<nil>)==> 3409981255
     69:#### Computing hash set for rule <<expected2>> (action=[]interface {}{map[string]interface {}{"custom_response":[]interface {}{}}}, ruleLabel=<nil>)==> 3409981255
     73:#### Computing hash set for rule <<expected2>> (action=[]interface {}{map[string]interface {}{"custom_response":[]interface {}{}}}, ruleLabel=<nil>)==> 3409981255
     77:#### Computing hash set for rule <<expected2>> (action=[]interface {}{map[string]interface {}{"custom_response":[]interface {}{}}}, ruleLabel=<nil>)==> 3409981255
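The kind of instrumentation described above can be sketched without the SDK. In this sketch `SetFunc`, `instrument`, and `toyHash` are hypothetical names; `toyHash` is only a stand-in for whatever hash the provider really uses, so the numbers it produces will not match the logs above.

```go
package main

import (
	"fmt"
	"hash/crc32"
)

// SetFunc mirrors the general shape of a set hash function:
// a set element goes in, an integer identity comes out.
type SetFunc func(v any) int

// instrument wraps inner so that every invocation is reported to record,
// together with the element it received and the hash it returned.
func instrument(label string, inner SetFunc, record func(string)) SetFunc {
	return func(v any) int {
		h := inner(v)
		record(fmt.Sprintf("#### Computing hash set for %s (%#v) ==> %d", label, v, h))
		return h
	}
}

// toyHash is a hypothetical stand-in for the real set function: it
// checksums the Go-syntax rendering of the element.
func toyHash(v any) int {
	return int(crc32.ChecksumIEEE([]byte(fmt.Sprintf("%#v", v))))
}

func main() {
	var lines []string
	hash := instrument("rule", toyHash, func(s string) { lines = append(lines, s) })

	hash(map[string]any{"action": []any{nil}})
	hash(map[string]any{"action": []any{map[string]any{"custom_response": []any{}}}})

	for _, l := range lines {
		fmt.Println(l)
	}
}
```

Because the wrapper returns the inner result unchanged, it can be swapped in for debugging without altering set identities.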

Member Author

t0yv0 commented Apr 30, 2024

Under Pulumi there is a new and exciting combination I have not seen under TF:

    254:#### Computing hash set for rule <<expected>> (action=[]interface {}{interface {}(nil)}, ruleLabel=<nil>)==> 835885598
    262:#### Computing hash set for rule <<expected>> (action=[]interface {}{interface {}(nil)}, ruleLabel=<nil>)==> 835885598
    270:#### Computing hash set for rule <<expected>> (action=[]interface {}{interface {}(nil)}, ruleLabel=<nil>)==> 835885598

Member Author

t0yv0 commented Apr 30, 2024

I think this new log line comes from
https://github.com/pulumi/pulumi-terraform-bridge/blob/t0yv0%2Faws-3880/pkg/tfbridge/diff.go#L105

     Error Trace:    /Users/t0yv0/code/pulumi-terraform-bridge/pkg/tests/cross-tests/cross_test.go:508
                                                        /Users/t0yv0/code/pulumi-terraform-bridge/pkg/tfshim/sdk-v2/schema.go:180
                                                        /Users/t0yv0/code/pulumi-terraform-bridge/pkg/tfbridge/diff.go:120
                                                        /Users/t0yv0/code/pulumi-terraform-bridge/pkg/tfbridge/diff.go:270
                                                        /Users/t0yv0/code/pulumi-terraform-bridge/pkg/tfbridge/diff.go:365
                                                        /Users/t0yv0/code/pulumi-terraform-bridge/pkg/tfbridge/provider.go:953
                                                        /Users/t0yv0/go/pkg/mod/github.com/pulumi/pulumi/sdk/[email protected]/proto/go/provider_grpc.pb.go:568
                                                        /Users/t0yv0/go/pkg/mod/google.golang.org/[email protected]/server.go:1386
                                                        /Users/t0yv0/go/pkg/mod/google.golang.org/[email protected]/server.go:1797
                                                        /Users/t0yv0/go/pkg/mod/google.golang.org/[email protected]/server.go:1027
                                                        /nix/store/brv7d6mlrclkzywf1vaf35wqhq4c0c82-go-1.21.5/share/go/src/runtime/asm_amd64.s:1650

This sort of makes sense: the makeDetailedDiff/visitPropertyValue call chain assumes that it can compute TF representations directly from the olds and news resource.PropertyMap values by re-projecting these original Pulumi values into the TF domain through a conversionContext. However, this assumption is not valid when the TF provider has made its own changes under PlanResourceChange.

Member Author

t0yv0 commented Apr 30, 2024

Let me play a bit more to be absolutely sure that fixing this would resolve the bug.

Member Author

t0yv0 commented Apr 30, 2024

Turns out to be a red herring - not the root cause; disabling detailed diff is not affecting the test result here.

rule.643481770.action.0.captcha.# { 0 false false <nil> false false 0}
rule.4129856294.action.0.allow.# {0 0 false false <nil> false false 0}

So the root cause is closer to the other avenue: Pulumi is confusing the 643481770 and 4129856294 hashes.

This comes down to whether custom_response is populated under rule[0].action.block.

"custom_response":[]interface {}{}}

Member Author

t0yv0 commented Apr 30, 2024

To be continued here. I am curious about the earliest point at which Pulumi starts to diverge from TF on custom_response.

In the meantime, we can try to teach the AWS provider to disregard the custom_response difference for the purposes of hashing. This might work around the problem for this resource in particular.

@VenelinMartinov
Contributor

This kind of looks like #1915

{
"action": {
"block": {
"customResponse": null
Member Author

This is the smallest repro so far. Very interesting. You can omit "customResponse" here and it still thinks this is a change.

Member Author

t0yv0 commented May 2, 2024

Spelunking further. The custom_response null entry got injected here by provider2.InstanceState:

InstanceState received map[id:newid rule:[map[action:[map[block:[map[]]]]]]]

cty.ObjectVal(map[string]cty.Value{
  "id":cty.StringVal("newid"),
  "rule":cty.SetVal([]cty.Value{
      cty.ObjectVal(map[string]cty.Value{
          "action":cty.ListVal([]cty.Value{
              cty.ObjectVal(map[string]cty.Value{
                  "block":cty.ListVal([]cty.Value{
                      cty.ObjectVal(map[string]cty.Value{
                          "custom_response":cty.NullVal(...)
                      })
                  })
...              


Member Author

t0yv0 commented May 2, 2024

https://github.com/pulumi/pulumi-terraform-bridge/blob/t0yv0%2Faws-3880/pkg/tfshim/sdk-v2/upgrade_state.go#L71

Hmm, so I'm pretty sure this is the place we hit. NormalizeObjectFromLegacySDK is where we go from custom_response=null to custom_response=[]. Subsequently there is a difference between state and plan ([] vs null), and this is what triggers incorrect hashing and cycling. That was the Pulumi side. Now let me debug the TF side and see what is received into SimpleDiff under TF.
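The effect of a null-versus-empty mismatch on set identity can be shown with a small standalone sketch. Everything here is hypothetical: `render`, `identity`, and `normalize` are invented for illustration and are not the SDK's hashing or NormalizeObjectFromLegacySDK; the only assumption is that identity is derived from a serialization that distinguishes nil from an empty list.

```go
package main

import (
	"fmt"
	"hash/crc32"
)

// render serializes an element; nil and an empty list render differently.
func render(elem map[string]any) string {
	return fmt.Sprintf("%#v", elem)
}

// identity is a hypothetical set identity: a checksum of the rendering.
func identity(elem map[string]any) uint32 {
	return crc32.ChecksumIEEE([]byte(render(elem)))
}

// normalize returns a copy of elem in which a missing or nil value under
// any of blockKeys is replaced by an empty list. The input is not mutated.
func normalize(elem map[string]any, blockKeys ...string) map[string]any {
	out := make(map[string]any, len(elem))
	for k, v := range elem {
		out[k] = v
	}
	for _, k := range blockKeys {
		if out[k] == nil {
			out[k] = []any{}
		}
	}
	return out
}

func main() {
	state := map[string]any{"custom_response": []any{}} // normalized side
	plan := map[string]any{"custom_response": nil}      // non-normalized side

	fmt.Println("state:", render(state))
	fmt.Println("plan: ", render(plan))
	fmt.Println("same rendering before normalizing:", render(state) == render(plan))

	fixed := normalize(plan, "custom_response")
	fmt.Println("same identity after normalizing:", identity(state) == identity(fixed))
}
```

The point of the sketch is that identity only stays stable if both sides of the comparison go through the same normalization.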

})
}

func TestAws3880Minimal(t *testing.T) {
Contributor

That's quite a few levels of nesting. If this depth is required, we need to update the property-based tests to go deeper; this case could not have been generated there.

Member Author

t0yv0 commented May 2, 2024

This gets more interesting. I've instrumented resource.SimpleDiff. This is the method at which resource planning bottoms out in both scenarios.

TF SimpleDIff ##############################################################################
InstanceState.Attributes ###

id => newid
rule.# => 1
rule.2262893456.action.# => 1
rule.2262893456.action.0.block.# => 1
rule.2262893456.action.0.block.0.custom_response.# => 0

InstanceState.RawState cty.ObjectVal(map[string]cty.Value{"id":cty.StringVal("newid"), "rule":cty.SetVal([]cty.Value{cty.ObjectVal(map[string]cty.Value{"action":cty.ListVal([]cty.Value{cty.ObjectVal(map[string]cty.Value{"block":cty.ListVal([]cty.Value{cty.ObjectVal(map[string]cty.Value{"custom_response":cty.ListValEmpty(cty.Object(map[string]cty.Type{"custom_response_body_key":cty.String}))})})})})})})})
InstanceState.RawConfig cty.ObjectVal(map[string]cty.Value{"id":cty.NullVal(cty.String), "rule":cty.SetVal([]cty.Value{cty.ObjectVal(map[string]cty.Value{"action":cty.ListVal([]cty.Value{cty.ObjectVal(map[string]cty.Value{"block":cty.ListVal([]cty.Value{cty.ObjectVal(map[string]cty.Value{"custom_response":cty.ListValEmpty(cty.Object(map[string]cty.Type{"custom_response_body_key":cty.String}))})})})})})})})
InstanceState.RawPlan cty.ObjectVal(map[string]cty.Value{"id":cty.StringVal("newid"), "rule":cty.SetVal([]cty.Value{cty.ObjectVal(map[string]cty.Value{"action":cty.ListVal([]cty.Value{cty.ObjectVal(map[string]cty.Value{"block":cty.ListVal([]cty.Value{cty.ObjectVal(map[string]cty.Value{"custom_response":cty.ListValEmpty(cty.Object(map[string]cty.Type{"custom_response_body_key":cty.String}))})})})})})})})
Resource Config ###
Raw: map[id:newid rule:[map[action:[map[block:[map[]]]]]]]
Config: map[id:newid rule:[map[action:[map[block:[map[]]]]]]]
DONE       ################################################################################


Pulumi SimpleDiff ################################################################################
InstanceState.Attributes ###
id => newid
rule.# => 1
rule.2262893456.action.# => 1
rule.2262893456.action.0.block.# => 1
rule.2262893456.action.0.block.0.custom_response.# => 0
InstanceState.RawState cty.ObjectVal(map[string]cty.Value{"id":cty.StringVal("newid"), "rule":cty.SetVal([]cty.Value{cty.ObjectVal(map[string]cty.Value{"action":cty.ListVal([]cty.Value{cty.ObjectVal(map[string]cty.Value{"block":cty.ListVal([]cty.Value{cty.ObjectVal(map[string]cty.Value{"custom_response":cty.ListValEmpty(cty.Object(map[string]cty.Type{"custom_response_body_key":cty.String}))})})})})})})})
InstanceState.RawConfig cty.ObjectVal(map[string]cty.Value{"id":cty.NullVal(cty.String), "rule":cty.SetVal([]cty.Value{cty.ObjectVal(map[string]cty.Value{"action":cty.ListVal([]cty.Value{cty.ObjectVal(map[string]cty.Value{"block":cty.ListVal([]cty.Value{cty.ObjectVal(map[string]cty.Value{"custom_response":cty.NullVal(cty.List(cty.Object(map[string]cty.Type{"custom_response_body_key":cty.String})))})})})})})})})
InstanceState.RawPlan cty.ObjectVal(map[string]cty.Value{"id":cty.StringVal("newid"), "rule":cty.SetVal([]cty.Value{cty.ObjectVal(map[string]cty.Value{"action":cty.ListVal([]cty.Value{cty.ObjectVal(map[string]cty.Value{"block":cty.ListVal([]cty.Value{cty.ObjectVal(map[string]cty.Value{"custom_response":cty.NullVal(cty.List(cty.Object(map[string]cty.Type{"custom_response_body_key":cty.String})))})})})})})})})
Resource Config ###
Raw: map[id:newid rule:[map[action:[map[block:[map[]]]]]]]
Config: map[id:newid rule:[map[action:[map[block:[map[]]]]]]]
DONE       ################################################################################

The only difference is around RawConfig. I can fix that with a small change; however, this does not fix the test.

Member Author

t0yv0 commented May 2, 2024

The change is this:

func recoverAndCoerceCtyValue(resource *schema.Resource, value any) (cty.Value, error) {
	v, err := recoverAndCoerceCtyValueWithSchema(resource.CoreConfigSchema(), value)
	if err != nil {
		return cty.NilVal, err
	}
	v = schema.NormalizeObjectFromLegacySDK(v, resource.CoreConfigSchema()) // ADDED
	return v, nil
}

Member Author

t0yv0 commented May 2, 2024

This is super fascinating: under both Pulumi and TF, even when we pass the exact same data into res.SimpleDiff and get the exact same InstanceDiff back, we end up with different results. The InstanceDiff printout is the same in both cases:

Tf

Finally ################################################################################
rule.1277166719.action.# => &{ 1 false false <nil> false false 0}
rule.1277166719.action.0.block.# => &{ 1 false false <nil> false false 0}
rule.1277166719.action.0.block.0.custom_response.# => &{ 0 false false <nil> false false 0}
rule.2262893456.action.# => &{1 0 false false <nil> false false 0}
rule.2262893456.action.0.block.# => &{1 0 false false <nil> false false 0}
rule.2262893456.action.0.block.0.custom_response.# => &{0 0 false false <nil> false false 0}
DONE1       ################################################################################



PULUMI

Finally ################################################################################
rule.1277166719.action.# => &{ 1 false false <nil> false false 0}
rule.1277166719.action.0.block.# => &{ 1 false false <nil> false false 0}
rule.1277166719.action.0.block.0.custom_response.# => &{ 0 false false <nil> false false 0}
rule.2262893456.action.# => &{1 0 false false <nil> false false 0}
rule.2262893456.action.0.block.# => &{1 0 false false <nil> false false 0}
rule.2262893456.action.0.block.0.custom_response.# => &{0 0 false false <nil> false false 0}
DONE1       ################################################################################

Member Author

t0yv0 commented May 2, 2024

I need to take a break from this, but my last hypothesis is that TF simply ignores the InstanceDiff; we use it only for legacy reasons. TF might be computing a cty.Value PlannedState and then diffing the existing state against PlannedState to show the detailed diff. This is more or less implied by the gRPC protocol for PlanResourceChange, which does not expose InstanceDiff on the wire. I had to patch terraform-plugin-sdk to get hold of this object, which is clearly playing against the rules. The reasoning was that our detailedDiff computation was very difficult to edit for historical reasons, and it is written against the InstanceDiff type.

I think if this hunch is right here, we need to finally rewrite detailed diff. It can be done by comparing two cty.Value instances - one from prior state and one from PlannedState and translating the diff back to Pulumi domain (this assumes there are no Pulumi properties without matching TF properties). This can clean out a lot of things.
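A value-to-value diff of the kind proposed above could look roughly like the following sketch. It works on plain nested Go values (`map[string]any`, `[]any`, scalars) rather than cty.Value, and `diffPaths` is a hypothetical name. Lists are compared positionally, so it deliberately does not address set semantics.

```go
package main

import (
	"fmt"
	"reflect"
	"sort"
)

// diffPaths returns the sorted paths at which prior and planned differ.
// A missing map key is treated the same as an explicit nil.
func diffPaths(prior, planned any) []string {
	var out []string
	walk("", prior, planned, &out)
	sort.Strings(out)
	return out
}

func join(path, key string) string {
	if path == "" {
		return key
	}
	return path + "." + key
}

func walk(path string, a, b any, out *[]string) {
	am, aok := a.(map[string]any)
	bm, bok := b.(map[string]any)
	if aok && bok {
		keys := map[string]struct{}{}
		for k := range am {
			keys[k] = struct{}{}
		}
		for k := range bm {
			keys[k] = struct{}{}
		}
		for k := range keys {
			walk(join(path, k), am[k], bm[k], out)
		}
		return
	}
	al, aok := a.([]any)
	bl, bok := b.([]any)
	if aok && bok && len(al) == len(bl) {
		for i := range al {
			walk(fmt.Sprintf("%s[%d]", path, i), al[i], bl[i], out)
		}
		return
	}
	// Scalars, mismatched shapes, and lists of different length land here.
	if !reflect.DeepEqual(a, b) {
		*out = append(*out, path)
	}
}

func nested(customResponse any) map[string]any {
	block := map[string]any{"custom_response": customResponse}
	action := map[string]any{"block": []any{block}}
	rule := map[string]any{"action": []any{action}}
	return map[string]any{"id": "newid", "rule": []any{rule}}
}

func main() {
	fmt.Println(diffPaths(nested([]any{}), nested(nil)))
	fmt.Println(len(diffPaths(nested([]any{}), nested([]any{}))))
}
```

Note that this naive version reports an empty list versus null as a change, so a real implementation would have to decide how to normalize those cases first.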

Member Author

t0yv0 commented May 2, 2024

#1895

Member Author

t0yv0 commented May 2, 2024

Or we can indeed diff Pulumi representations as outlined in #1895, with the caveat that we likely will need to special-case set diffing and the set typing information is only available in TF world.

Member Author

t0yv0 commented May 6, 2024

Interesting. I can confirm that in this case in PlanResourceChange we get:

plannedStateVal.Equals(priorStateVal)

And yet we have non-zero DiffAttributes

resp.InstanceDiff.Attributes

Code: https://github.com/pulumi/terraform-plugin-sdk/blob/master/helper/schema/grpc_provider.go#L742

Consequently, this heuristic is the root cause of the wrong decision here, as plannedStateVal.Equals(priorStateVal) should be more authoritative about this being a "no change" diff:
https://github.com/pulumi/pulumi-terraform-bridge/blob/master/pkg/tfbridge/provider.go#L963

And this is the root-cause fix:
#1895
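The decision rule argued for above can be sketched as follows. The `plan` struct and `hasChanges` are hypothetical simplifications using plain Go maps; the real code works with cty.Value and the bridge's own types. The sketch only shows the ordering of the checks: state equality wins over the attribute-count heuristic.

```go
package main

import (
	"fmt"
	"reflect"
)

// plan is a hypothetical, simplified stand-in for a planning result.
type plan struct {
	PriorState     map[string]any
	PlannedState   map[string]any
	DiffAttributes map[string]string // legacy per-attribute diff entries
}

// hasChanges treats "planned state equals prior state" as authoritative.
// Only when the states differ does it fall back to the legacy heuristic
// of checking for non-empty diff attributes.
func hasChanges(p plan) bool {
	if reflect.DeepEqual(p.PriorState, p.PlannedState) {
		return false
	}
	return len(p.DiffAttributes) > 0
}

func main() {
	p := plan{
		PriorState:     map[string]any{"id": "newid"},
		PlannedState:   map[string]any{"id": "newid"},
		DiffAttributes: map[string]string{"rule.#": "1"},
	}
	fmt.Println("changes with identical states:", hasChanges(p))

	p.PlannedState = map[string]any{"id": "other"}
	fmt.Println("changes with different states:", hasChanges(p))
}
```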

@t0yv0 t0yv0 mentioned this pull request May 6, 2024
t0yv0 added a commit that referenced this pull request May 6, 2024
With AWS 3880 there is some evidence (derivation in
#1917) that
sometimes TF has entries in the InstanceDiff.Attributes while still
planning to take the resource to the end-state that is identical to the
original state. In these cases, TF does not display a diff but Pulumi
does.

The root cause here remains unfixed
(#1895) - Pulumi
bridge is editing terraform-plugin-sdk to expose the InstanceDiff
structure to connect it to the makeDetailedDiff machinery. Pulumi
should, like TF, stick to the gRPC protocol and rely only on the
PlannedState value.

We can incrementally approach the desired behavior with this change
though which detects PlannedState=PriorState case and suppresses any
diffs in this case.

Fixes:

- pulumi/pulumi-aws#3880
- pulumi/pulumi-aws#3306
- pulumi/pulumi-aws#3190
- pulumi/pulumi-aws#3454

---------

Co-authored-by: Venelin <[email protected]>
Member Author

t0yv0 commented Sep 27, 2024

The AWS issue got fixed! Closing this.

@t0yv0 t0yv0 closed this Sep 27, 2024