Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

provider/aws: Update Security Group Rules to Version 2 #3019

Merged
merged 4 commits into from
Oct 12, 2015

Conversation

catsby
Copy link
Contributor

@catsby catsby commented Aug 18, 2015

This patch aims to resolve a lingering issue with Security Group Rules,
where different resources generate rules that are consolidated on Amazon's
side, as a result resulting in loss of local resources in the state file
and duplication errors on AWS's side.

Example:

resource "aws_security_group_rule" "first" {
    type        = "ingress"
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["10.0.2.0/24", "10.0.3.0/24"]

   security_group_id = "${aws_security_group.nat.id}"
}

After applying this rule, our security group nat looks like this:

{
  Description: "Managed by Terraform",
  GroupId: "sg-87eb46e3",
  GroupName: "secg-nat",
  IpPermissions: [{
      FromPort: 80,
      IpProtocol: "tcp",
      IpRanges: [{
          CidrIp: "10.0.2.0/24"
        },{
          CidrIp: "10.0.3.0/24"
        }],
      ToPort: 80
    }],
  OwnerId: "470663696735",
  VpcId: "vpc-0806716d"
}

This rule, with 2 cidr_block entires, is a single IPPermission rule according to the API.

At present, the Security Group Rule resource hashes it's values (ports, cidrs, security groups)
to generate a key. This breaks down however, when another rule is added with a new cidr_block.

Now, adding a second rule that also uses a cidr_block like so, will reveal the bug:

resource "aws_security_group_rule" "second" {
    type        = "ingress"
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["10.0.5.0/24"]

   security_group_id = "${aws_security_group.nat.id}"
}

The IPPermissions now look like this:

Full Security Group: {
  Description: "Managed by Terraform",
  GroupId: "sg-87eb46e3",
  GroupName: "secg-nat",
  IpPermissions: [{
      FromPort: 80,
      IpProtocol: "tcp",
      IpRanges: [{
          CidrIp: "10.0.2.0/24"
        },{
          CidrIp: "10.0.3.0/24"
        },{
          CidrIp: "10.0.5.0/24"
        }],
      ToPort: 80
    }],
  OwnerId: "470663696735",
  VpcId: "vpc-0806716d"
}

The API has consolidated this new cidr_block into an existing rule, based on the port and
protocol it applies to. Because of the way Terraform hashes the rules locally, it can no longer find
either resource, as it hashes the entire IPPermission looking for a match.

This pull request resolves this by iterating through IpPermissions and looking for matching criteria,
ensuring that each cidr_block or security_group_id is found in a set.

This patch also fixes another issue, where similar rules applied to different security groups caused the
same kind of conflict, because we were not hashing the security group id itself into the id of the rule.
We now consider the id in the hash for the rule to avoid this.

Refs:

@danabr
Copy link

danabr commented Aug 26, 2015

I think this is the very same issue we've been running into recently. I'll try your branch and see if it resolves our issues.

Update: It seems to have solved the issue when testing on a fresh stack. It did not help recovering from a situation where the state file had already been "corrupted", though (which it might not have intended to do anyway).

@bronislav
Copy link

This patch solves my issue as well. Waiting to see it in the released version.

continue
}

remaining = len(p.UserIdGroupPairs)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to add a comment here, mentioning that it's for cases of EC2-Classic/default-VPC.

@radeksimko
Copy link
Member

If you're already changing the ID/hash, I'm thinking it would make sense to make one more tiny change to make things a little bit more obvious, when looking into the state:

https://github.com/hashicorp/terraform/pull/3019/files#diff-7af0362dbadfdc3cfd6233049cbe05b2R376

return fmt.Sprintf("sg-%d", hashcode.String(buf.String()))
return fmt.Sprintf("sg-rule-%d", hashcode.String(buf.String()))

one would almost think that it's an actual security group ID, while it's a hash.

@@ -256,6 +256,92 @@ func TestAccAWSSecurityGroupRule_SelfReference(t *testing.T) {
})
}

// testing partial match implementation
func TestAccAWSSecurityGroupRule_PartialMatching_Basic(t *testing.T) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the convention (at least based on quick grep) is _basic (small caps).

@sairez
Copy link

sairez commented Sep 29, 2015

This definitely seems to fix the problem we are having with rules not being written to state.

To fix rules in an existing infrastructure we had to delete/clean up the rules on the amazon side, and then re-apply with this terraform branch.

I'd like to see this merged as soon as possible!

@radeksimko
Copy link
Member

@sairez

To fix rules in an existing infrastructure we haa to delete/clean up the rules on the amazon side, and then re-apply with this terraform branch.

The "cleanest" solution is to taint these security groups. Also I was trying to find out how to make the transition smooth, but I don't think there's any easy way, since there was no state previously -> we can't find out which rule comes from where and we cannot guess it either.

@radeksimko radeksimko added the waiting-response An issue/pull request is waiting for a response from the community label Oct 3, 2015
Computed: true,
Type: schema.TypeString,
Optional: true,
Computed: true,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we do this whitespace diff separately? Sorry to nit - I'm a big git blame spelunker 😀

@phinze
Copy link
Contributor

phinze commented Oct 12, 2015

LGTM!

catsby added a commit that referenced this pull request Oct 12, 2015
provider/aws: Update Security Group Rules to Version 2
@catsby catsby merged commit ad42313 into master Oct 12, 2015
@catsby
Copy link
Contributor Author

catsby commented Oct 12, 2015

At long last, we have merged this PR. I sincerely apologize to everyone who suffered through this and waited on the merge. Please give it a spin and let us know if there's anything else hanging here.

Thanks!

@catsby catsby mentioned this pull request Oct 12, 2015
@scalp42
Copy link
Contributor

scalp42 commented Oct 17, 2015

@catsby we're still seeing the same issues within the SGs where rules order is changing, using latest released version.

We tried to taint as advised by @radeksimko but unfortunately while doing an apply we can see it live within AWS console, the rules order is changing non stop.

Initially:

Couple seconds later:

It repeats multiple times.

Diff in state file:

-                            "ingress.2325821899.cidr_blocks.1": "10.42.42.93/32",
-                            "ingress.2325821899.cidr_blocks.2": "10.52.0.0/16",
+                            "ingress.2325821899.cidr_blocks.1": "10.52.0.0/16",
+                            "ingress.2325821899.cidr_blocks.2": "10.42.42.93/32",

During that shuffling time across all our SGs, we also lose traffic due to SG changes (some rules I'm assuming are being removed and re-added couple secs later).

@radeksimko
Copy link
Member

@scalp42 Just to double check - have you tried tainting security group rules or security groups themselves? You should be tainting security groups.

@radeksimko radeksimko deleted the b-aws-sg-rules-v2 branch October 17, 2015 22:34
@scalp42
Copy link
Contributor

scalp42 commented Oct 18, 2015

@radeksimko I tried to taint the SGs (not the rules).

Is this PR supposed to help with #2843 ?

@catsby
Copy link
Contributor Author

catsby commented Oct 19, 2015

This PR was designed to patch an issue where Rules themselves were getting "lost", causing Terraform to want to re-create them, and thus ending up with a duplication error from AWS (since the rule did actually exist).

@scalp42 I think #2843 may be something else, I'll have to dive into that

@BogdanSorlea
Copy link

BogdanSorlea commented Aug 15, 2016

Is this actually fixed? We run v0.6.15 and v0.6.16 and still see this problem. What voodoo should we do to reliably avoid getting these kind of errors going forward?
Also, I can't find in the changelog any of the issues referenced on this PR, wtf?
@catsby

@ghost
Copy link

ghost commented Apr 23, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@ghost ghost locked and limited conversation to collaborators Apr 23, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug provider/aws waiting-response An issue/pull request is waiting for a response from the community
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants