Container image builds now randomly fail when triggered through CloudFormation (due to race condition in Java SDK?) #4

Closed
OrEisenberg opened this issue Sep 28, 2021 · 9 comments

OrEisenberg commented Sep 28, 2021

TL;DR: Over the last week or so, I have been hitting a non-deterministic bug when triggering container image builds through CloudFormation. As a result, it sadly no longer seems possible to reliably build container images using CloudFormation's EC2 Image Builder functionality. I reached this conclusion by following the procedure described below.

Below are two CloudFormation scripts which were generated by CDK. I apologize for their length, but I am far from fluent in CloudFormation (beneath them you will also find the two Python CDK scripts which were used to generate them). The first script builds a container recipe and an infrastructure configuration, along with all the resources on which these depend, including an image component which does nothing, a VPC, a subnet, an ECR repository, an IAM role, an instance profile, and an S3 bucket for logging. If, after this stack launches successfully, one manually triggers any number of container image builds from the AWS CLI using the command

aws imagebuilder create-image \
--container-recipe-arn <container-recipe-from-launched-stack> \
--infrastructure-configuration-arn <infrastructure-configuration-from-launched-stack>

one finds that they always build successfully.
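To repeat the manual check more than a handful of times, the CLI call above can be wrapped in a small script. This is only a sketch: the function names, the `count` default, and the `dry_run` flag are my own assumptions, and the two ARNs are placeholders for the values from the launched stack.

```python
import shlex
import subprocess


def create_image_command(recipe_arn: str, infra_arn: str) -> list[str]:
    """Return the argv for one `aws imagebuilder create-image` call."""
    return [
        "aws", "imagebuilder", "create-image",
        "--container-recipe-arn", recipe_arn,
        "--infrastructure-configuration-arn", infra_arn,
    ]


def trigger_builds(recipe_arn: str, infra_arn: str,
                   count: int = 10, dry_run: bool = True) -> list[str]:
    """Trigger `count` builds one after another.

    With dry_run=True (an assumption of this sketch) nothing is executed;
    the commands are only returned as shell-quoted strings.
    """
    cmd = create_image_command(recipe_arn, infra_arn)
    issued = []
    for _ in range(count):
        issued.append(shlex.join(cmd))
        if not dry_run:
            # Requires the AWS CLI on PATH and valid credentials.
            subprocess.run(cmd, check=True)
    return issued


if __name__ == "__main__":
    # Placeholder ARNs; substitute the ones from the launched stack.
    for line in trigger_builds("<container-recipe-arn>", "<infra-config-arn>", count=3):
        print(line)
```

Running it with `dry_run=False` issues the same command as above `count` times in sequence.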

If, however, one then uploads the second CloudFormation script as a change set to the original stack, the resulting diff only changes some CDK metadata and launches 10 identical copies of the image, each with the same container recipe and infrastructure configuration as those just used from the command line.
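For readers who would rather not diff the two templates by hand, the substantive addition in the second script can be sketched as follows. The logical IDs and properties mirror the `testimage0` through `testimage9` resources in that script; the helper name `image_resources` is hypothetical and this is not the CDK code that generated the templates.

```python
import json


def image_resources(count: int = 10) -> dict:
    """Build the AWS::ImageBuilder::Image entries that script 2 adds.

    Every entry points at the same infrastructure configuration and
    container recipe, so CloudFormation has no ordering constraint
    between them.
    """
    resources = {}
    for i in range(count):
        resources[f"testimage{i}"] = {
            "Type": "AWS::ImageBuilder::Image",
            "Properties": {
                "InfrastructureConfigurationArn": {
                    "Fn::GetAtt": ["infraconfig", "Arn"]
                },
                "ContainerRecipeArn": {
                    "Fn::GetAtt": ["testcontainerrecipe", "Arn"]
                },
            },
            "Metadata": {"aws:cdk:path": f"TestStack/test-image-{i}"},
        }
    return resources


if __name__ == "__main__":
    # Print the fragment as JSON, which CloudFormation also accepts.
    print(json.dumps(image_resources(), indent=2))
```

Since none of the ten resources depends on another, CloudFormation is free to create them concurrently, unlike the one-at-a-time CLI calls.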

The catch is that among those 10 image builds just triggered by this change set, some number will fail (usually between two and four in my experience). Checking the logs which are generated in the S3 logging bucket built in the stack yields one of the following two related error messages for those failed builds:

failed to download the EC2 Image Builder Component '<component arn>'. Error - operation error 
imagebuilder: GetComponent, failed to sign request: failed to retrieve credentials: failed to decode 
<imagebuilder role> EC2 IMDS role credentials, context canceled

or

failed to upload file <local path to component file> to <remote destination> with error 'operation error 
S3: PutObject, https response error StatusCode: 400, RequestID: <request id>, HostID: <host id>, 
api error AuthorizationHeaderMalformed: The authorization header is malformed; a non-empty Access 
Key (AKID) must be provided in the credential.
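When many builds are triggered, it helps to tally which of the two messages each failed build produced. The following is a rough sketch that assumes the log text has already been downloaded from the S3 logging bucket; the category names and the `classify` helper are made up for illustration, and the matching relies only on the fixed fragments of the two messages quoted above.

```python
from collections import Counter

# Fixed fragments taken from the two error messages quoted above.
IMDS_FRAGMENT = "EC2 IMDS role credentials, context canceled"
AKID_FRAGMENT = "a non-empty Access Key (AKID) must be provided"


def classify(log_text: str) -> str:
    """Label a log excerpt with one of the two observed failure signatures."""
    # Collapse whitespace so a message wrapped across lines still matches.
    flat = " ".join(log_text.split())
    if IMDS_FRAGMENT in flat:
        return "imds_credentials_canceled"
    if AKID_FRAGMENT in flat:
        return "empty_access_key"
    return "other"


def tally(logs: list[str]) -> Counter:
    """Count how many log excerpts fall into each category."""
    return Counter(classify(text) for text in logs)
```

For example, `tally([log_a, log_b])` returns a `Counter` keyed by category, which makes it easy to see how the failures split between the two signatures across runs.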

Searching the web for the former error message, one finds this relevant thread from an issue in the Go SDK. My suspicion is that this codebase relies on a Java SDK which suffers from a race condition similar to the one described in that thread, but I don't know Java and therefore can neither confirm nor deny this suspicion.

I hope this account of the bug is detailed enough for you to reproduce it. If, however, there is any more information I can provide to help you reproduce or investigate it, please do not hesitate to ask.

CloudFormation Script 1:

Resources:
  TestVpcE77CE678:
    Type: AWS::EC2::VPC
    Properties:
      CidrBlock: 10.0.0.0/16
      EnableDnsHostnames: true
      EnableDnsSupport: true
      InstanceTenancy: default
      Tags:
        - Key: Name
          Value: TestStack/TestVpc
    Metadata:
      aws:cdk:path: TestStack/TestVpc/Resource
  TestVpcPublicSubnet1SubnetA7DB1EDF:
    Type: AWS::EC2::Subnet
    Properties:
      CidrBlock: 10.0.0.0/18
      VpcId:
        Ref: TestVpcE77CE678
      AvailabilityZone:
        Fn::Select:
          - 0
          - Fn::GetAZs: ""
      MapPublicIpOnLaunch: true
      Tags:
        - Key: aws-cdk:subnet-name
          Value: Public
        - Key: aws-cdk:subnet-type
          Value: Public
        - Key: Name
          Value: TestStack/TestVpc/PublicSubnet1
    Metadata:
      aws:cdk:path: TestStack/TestVpc/PublicSubnet1/Subnet
  TestVpcPublicSubnet1RouteTable4CBFF871:
    Type: AWS::EC2::RouteTable
    Properties:
      VpcId:
        Ref: TestVpcE77CE678
      Tags:
        - Key: Name
          Value: TestStack/TestVpc/PublicSubnet1
    Metadata:
      aws:cdk:path: TestStack/TestVpc/PublicSubnet1/RouteTable
  TestVpcPublicSubnet1RouteTableAssociation7D1DECD9:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      RouteTableId:
        Ref: TestVpcPublicSubnet1RouteTable4CBFF871
      SubnetId:
        Ref: TestVpcPublicSubnet1SubnetA7DB1EDF
    Metadata:
      aws:cdk:path: TestStack/TestVpc/PublicSubnet1/RouteTableAssociation
  TestVpcPublicSubnet1DefaultRoute6C0F0315:
    Type: AWS::EC2::Route
    Properties:
      RouteTableId:
        Ref: TestVpcPublicSubnet1RouteTable4CBFF871
      DestinationCidrBlock: 0.0.0.0/0
      GatewayId:
        Ref: TestVpcIGW9DD53F70
    DependsOn:
      - TestVpcVPCGWF1827B84
    Metadata:
      aws:cdk:path: TestStack/TestVpc/PublicSubnet1/DefaultRoute
  TestVpcPublicSubnet1EIP4884338C:
    Type: AWS::EC2::EIP
    Properties:
      Domain: vpc
      Tags:
        - Key: Name
          Value: TestStack/TestVpc/PublicSubnet1
    Metadata:
      aws:cdk:path: TestStack/TestVpc/PublicSubnet1/EIP
  TestVpcPublicSubnet1NATGatewayA323E3EC:
    Type: AWS::EC2::NatGateway
    Properties:
      SubnetId:
        Ref: TestVpcPublicSubnet1SubnetA7DB1EDF
      AllocationId:
        Fn::GetAtt:
          - TestVpcPublicSubnet1EIP4884338C
          - AllocationId
      Tags:
        - Key: Name
          Value: TestStack/TestVpc/PublicSubnet1
    Metadata:
      aws:cdk:path: TestStack/TestVpc/PublicSubnet1/NATGateway
  TestVpcPublicSubnet2Subnet80A14523:
    Type: AWS::EC2::Subnet
    Properties:
      CidrBlock: 10.0.64.0/18
      VpcId:
        Ref: TestVpcE77CE678
      AvailabilityZone:
        Fn::Select:
          - 1
          - Fn::GetAZs: ""
      MapPublicIpOnLaunch: true
      Tags:
        - Key: aws-cdk:subnet-name
          Value: Public
        - Key: aws-cdk:subnet-type
          Value: Public
        - Key: Name
          Value: TestStack/TestVpc/PublicSubnet2
    Metadata:
      aws:cdk:path: TestStack/TestVpc/PublicSubnet2/Subnet
  TestVpcPublicSubnet2RouteTable75B88314:
    Type: AWS::EC2::RouteTable
    Properties:
      VpcId:
        Ref: TestVpcE77CE678
      Tags:
        - Key: Name
          Value: TestStack/TestVpc/PublicSubnet2
    Metadata:
      aws:cdk:path: TestStack/TestVpc/PublicSubnet2/RouteTable
  TestVpcPublicSubnet2RouteTableAssociationB386A819:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      RouteTableId:
        Ref: TestVpcPublicSubnet2RouteTable75B88314
      SubnetId:
        Ref: TestVpcPublicSubnet2Subnet80A14523
    Metadata:
      aws:cdk:path: TestStack/TestVpc/PublicSubnet2/RouteTableAssociation
  TestVpcPublicSubnet2DefaultRoute054DAE0A:
    Type: AWS::EC2::Route
    Properties:
      RouteTableId:
        Ref: TestVpcPublicSubnet2RouteTable75B88314
      DestinationCidrBlock: 0.0.0.0/0
      GatewayId:
        Ref: TestVpcIGW9DD53F70
    DependsOn:
      - TestVpcVPCGWF1827B84
    Metadata:
      aws:cdk:path: TestStack/TestVpc/PublicSubnet2/DefaultRoute
  TestVpcPublicSubnet2EIP83F7944C:
    Type: AWS::EC2::EIP
    Properties:
      Domain: vpc
      Tags:
        - Key: Name
          Value: TestStack/TestVpc/PublicSubnet2
    Metadata:
      aws:cdk:path: TestStack/TestVpc/PublicSubnet2/EIP
  TestVpcPublicSubnet2NATGatewayA9858C31:
    Type: AWS::EC2::NatGateway
    Properties:
      SubnetId:
        Ref: TestVpcPublicSubnet2Subnet80A14523
      AllocationId:
        Fn::GetAtt:
          - TestVpcPublicSubnet2EIP83F7944C
          - AllocationId
      Tags:
        - Key: Name
          Value: TestStack/TestVpc/PublicSubnet2
    Metadata:
      aws:cdk:path: TestStack/TestVpc/PublicSubnet2/NATGateway
  TestVpcPrivateSubnet1SubnetCC65D771:
    Type: AWS::EC2::Subnet
    Properties:
      CidrBlock: 10.0.128.0/18
      VpcId:
        Ref: TestVpcE77CE678
      AvailabilityZone:
        Fn::Select:
          - 0
          - Fn::GetAZs: ""
      MapPublicIpOnLaunch: false
      Tags:
        - Key: aws-cdk:subnet-name
          Value: Private
        - Key: aws-cdk:subnet-type
          Value: Private
        - Key: Name
          Value: TestStack/TestVpc/PrivateSubnet1
    Metadata:
      aws:cdk:path: TestStack/TestVpc/PrivateSubnet1/Subnet
  TestVpcPrivateSubnet1RouteTable469B0105:
    Type: AWS::EC2::RouteTable
    Properties:
      VpcId:
        Ref: TestVpcE77CE678
      Tags:
        - Key: Name
          Value: TestStack/TestVpc/PrivateSubnet1
    Metadata:
      aws:cdk:path: TestStack/TestVpc/PrivateSubnet1/RouteTable
  TestVpcPrivateSubnet1RouteTableAssociationFFD4DFF7:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      RouteTableId:
        Ref: TestVpcPrivateSubnet1RouteTable469B0105
      SubnetId:
        Ref: TestVpcPrivateSubnet1SubnetCC65D771
    Metadata:
      aws:cdk:path: TestStack/TestVpc/PrivateSubnet1/RouteTableAssociation
  TestVpcPrivateSubnet1DefaultRoute32E7B814:
    Type: AWS::EC2::Route
    Properties:
      RouteTableId:
        Ref: TestVpcPrivateSubnet1RouteTable469B0105
      DestinationCidrBlock: 0.0.0.0/0
      NatGatewayId:
        Ref: TestVpcPublicSubnet1NATGatewayA323E3EC
    Metadata:
      aws:cdk:path: TestStack/TestVpc/PrivateSubnet1/DefaultRoute
  TestVpcPrivateSubnet2SubnetDE0C64A2:
    Type: AWS::EC2::Subnet
    Properties:
      CidrBlock: 10.0.192.0/18
      VpcId:
        Ref: TestVpcE77CE678
      AvailabilityZone:
        Fn::Select:
          - 1
          - Fn::GetAZs: ""
      MapPublicIpOnLaunch: false
      Tags:
        - Key: aws-cdk:subnet-name
          Value: Private
        - Key: aws-cdk:subnet-type
          Value: Private
        - Key: Name
          Value: TestStack/TestVpc/PrivateSubnet2
    Metadata:
      aws:cdk:path: TestStack/TestVpc/PrivateSubnet2/Subnet
  TestVpcPrivateSubnet2RouteTableCEF29F7C:
    Type: AWS::EC2::RouteTable
    Properties:
      VpcId:
        Ref: TestVpcE77CE678
      Tags:
        - Key: Name
          Value: TestStack/TestVpc/PrivateSubnet2
    Metadata:
      aws:cdk:path: TestStack/TestVpc/PrivateSubnet2/RouteTable
  TestVpcPrivateSubnet2RouteTableAssociation18250AB4:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      RouteTableId:
        Ref: TestVpcPrivateSubnet2RouteTableCEF29F7C
      SubnetId:
        Ref: TestVpcPrivateSubnet2SubnetDE0C64A2
    Metadata:
      aws:cdk:path: TestStack/TestVpc/PrivateSubnet2/RouteTableAssociation
  TestVpcPrivateSubnet2DefaultRouteA7EB6930:
    Type: AWS::EC2::Route
    Properties:
      RouteTableId:
        Ref: TestVpcPrivateSubnet2RouteTableCEF29F7C
      DestinationCidrBlock: 0.0.0.0/0
      NatGatewayId:
        Ref: TestVpcPublicSubnet2NATGatewayA9858C31
    Metadata:
      aws:cdk:path: TestStack/TestVpc/PrivateSubnet2/DefaultRoute
  TestVpcIGW9DD53F70:
    Type: AWS::EC2::InternetGateway
    Properties:
      Tags:
        - Key: Name
          Value: TestStack/TestVpc
    Metadata:
      aws:cdk:path: TestStack/TestVpc/IGW
  TestVpcVPCGWF1827B84:
    Type: AWS::EC2::VPCGatewayAttachment
    Properties:
      VpcId:
        Ref: TestVpcE77CE678
      InternetGatewayId:
        Ref: TestVpcIGW9DD53F70
    Metadata:
      aws:cdk:path: TestStack/TestVpc/VPCGW
  TestRepo08D311A0:
    Type: AWS::ECR::Repository
    UpdateReplacePolicy: Retain
    DeletionPolicy: Retain
    Metadata:
      aws:cdk:path: TestStack/TestRepo/Resource
  imagecomponent:
    Type: AWS::ImageBuilder::Component
    Properties:
      Name: image-component
      Platform: Linux
      Version: 0.0.0
      Data: |
        
        schemaVersion: 1.0

        phases:
          - name: build
            steps:
              - name: bash-script
                action: ExecuteBash
                inputs:
                  commands:
                    - eval ":"
    Metadata:
      aws:cdk:path: TestStack/image-component
  testcontainerrecipe:
    Type: AWS::ImageBuilder::ContainerRecipe
    Properties:
      Components:
        - ComponentArn:
            Fn::GetAtt:
              - imagecomponent
              - Arn
      ContainerType: DOCKER
      Name: test-container
      ParentImage: ubuntu:20.04
      TargetRepository:
        RepositoryName:
          Ref: TestRepo08D311A0
        Service: ECR
      Version: 0.0.0
      DockerfileTemplateData: |
        FROM {{{ imagebuilder:parentImage }}}
        USER root
        {{{ imagebuilder:environments }}}
        {{{ imagebuilder:components }}}
      PlatformOverride: Linux
    Metadata:
      aws:cdk:path: TestStack/test-container-recipe
  TestStackD2D0B6A9:
    Type: AWS::S3::Bucket
    UpdateReplacePolicy: Retain
    DeletionPolicy: Retain
    Metadata:
      aws:cdk:path: TestStack/TestStack/Resource
  testroleB50A37BE:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Statement:
          - Action: sts:AssumeRole
            Effect: Allow
            Principal:
              Service:
                Fn::Join:
                  - ""
                  - - ec2.
                    - Ref: AWS::URLSuffix
        Version: "2012-10-17"
      ManagedPolicyArns:
        - Fn::Join:
            - ""
            - - "arn:"
              - Ref: AWS::Partition
              - :iam::aws:policy/AdministratorAccess
    Metadata:
      aws:cdk:path: TestStack/test-role/Resource
  testinstanceprofile:
    Type: AWS::IAM::InstanceProfile
    Properties:
      Roles:
        - Ref: testroleB50A37BE
      InstanceProfileName: test-instance-profile
    Metadata:
      aws:cdk:path: TestStack/test-instance-profile
  infraconfig:
    Type: AWS::ImageBuilder::InfrastructureConfiguration
    Properties:
      InstanceProfileName:
        Ref: testinstanceprofile
      Name: infra-config
      InstanceTypes:
        - t2.xlarge
      KeyPair: id-aqs-or
      Logging:
        S3Logs:
          S3BucketName:
            Ref: TestStackD2D0B6A9
      SecurityGroupIds:
        - Fn::GetAtt:
            - TestVpcE77CE678
            - DefaultSecurityGroup
      SubnetId:
        Ref: TestVpcPublicSubnet1SubnetA7DB1EDF
      TerminateInstanceOnFailure: false
    Metadata:
      aws:cdk:path: TestStack/infra-config
  CDKMetadata:
    Type: AWS::CDK::Metadata
    Properties:
      Analytics: v2:deflate64:H4sIAAAAAAAA/01QQW6DMBB8S+7GDcml1xRVFZcKkSrXymyWZAvYaL1uFCH+XiduSk8zOx7vjL3Ra73OGPQmX5mLz+DYZT01etqLgU5F6XNC2OjpMIIqWnuoClWFpifYh8ai3LSF1S4Ifpimx0VftJ33DsgIOftnvpHXsrrBu5E3I3gxV1UxfUe6LC6tIEf+MKQmv9NOYtfzgFZmhcB6qnF0nsTx3bhMs6LBnLAJ1B8x+uJh4YbR2XhV3QcrhixyjUAjpuCWjRcOIIExGlo6Bb6/YVZ+q6eXAF3qmFjMMEOs4NIfPLC0XowFrNi11OM8qxq9CwzJ9Y/HjCOl/dVVzs4+bfWzzvPVlyfKOFihAXWd8AdSof3vvwEAAA==
    Metadata:
      aws:cdk:path: TestStack/CDKMetadata/Default
    Condition: CDKMetadataAvailable
Conditions:
  CDKMetadataAvailable:
    Fn::Or:
      - Fn::Or:
          - Fn::Equals:
              - Ref: AWS::Region
              - af-south-1
          - Fn::Equals:
              - Ref: AWS::Region
              - ap-east-1
          - Fn::Equals:
              - Ref: AWS::Region
              - ap-northeast-1
          - Fn::Equals:
              - Ref: AWS::Region
              - ap-northeast-2
          - Fn::Equals:
              - Ref: AWS::Region
              - ap-south-1
          - Fn::Equals:
              - Ref: AWS::Region
              - ap-southeast-1
          - Fn::Equals:
              - Ref: AWS::Region
              - ap-southeast-2
          - Fn::Equals:
              - Ref: AWS::Region
              - ca-central-1
          - Fn::Equals:
              - Ref: AWS::Region
              - cn-north-1
          - Fn::Equals:
              - Ref: AWS::Region
              - cn-northwest-1
      - Fn::Or:
          - Fn::Equals:
              - Ref: AWS::Region
              - eu-central-1
          - Fn::Equals:
              - Ref: AWS::Region
              - eu-north-1
          - Fn::Equals:
              - Ref: AWS::Region
              - eu-south-1
          - Fn::Equals:
              - Ref: AWS::Region
              - eu-west-1
          - Fn::Equals:
              - Ref: AWS::Region
              - eu-west-2
          - Fn::Equals:
              - Ref: AWS::Region
              - eu-west-3
          - Fn::Equals:
              - Ref: AWS::Region
              - me-south-1
          - Fn::Equals:
              - Ref: AWS::Region
              - sa-east-1
          - Fn::Equals:
              - Ref: AWS::Region
              - us-east-1
          - Fn::Equals:
              - Ref: AWS::Region
              - us-east-2
      - Fn::Or:
          - Fn::Equals:
              - Ref: AWS::Region
              - us-west-1
          - Fn::Equals:
              - Ref: AWS::Region
              - us-west-2
Parameters:
  BootstrapVersion:
    Type: AWS::SSM::Parameter::Value<String>
    Default: /cdk-bootstrap/hnb659fds/version
    Description: Version of the CDK Bootstrap resources in this environment, automatically retrieved from SSM Parameter Store.
Rules:
  CheckBootstrapVersion:
    Assertions:
      - Assert:
          Fn::Not:
            - Fn::Contains:
                - - "1"
                  - "2"
                  - "3"
                  - "4"
                  - "5"
                - Ref: BootstrapVersion
        AssertDescription: CDK bootstrap stack version 6 required. Please run 'cdk bootstrap' with a recent version of the CDK CLI.

CloudFormation Script 2:

Resources:
  TestVpcE77CE678:
    Type: AWS::EC2::VPC
    Properties:
      CidrBlock: 10.0.0.0/16
      EnableDnsHostnames: true
      EnableDnsSupport: true
      InstanceTenancy: default
      Tags:
        - Key: Name
          Value: TestStack/TestVpc
    Metadata:
      aws:cdk:path: TestStack/TestVpc/Resource
  TestVpcPublicSubnet1SubnetA7DB1EDF:
    Type: AWS::EC2::Subnet
    Properties:
      CidrBlock: 10.0.0.0/18
      VpcId:
        Ref: TestVpcE77CE678
      AvailabilityZone:
        Fn::Select:
          - 0
          - Fn::GetAZs: ""
      MapPublicIpOnLaunch: true
      Tags:
        - Key: aws-cdk:subnet-name
          Value: Public
        - Key: aws-cdk:subnet-type
          Value: Public
        - Key: Name
          Value: TestStack/TestVpc/PublicSubnet1
    Metadata:
      aws:cdk:path: TestStack/TestVpc/PublicSubnet1/Subnet
  TestVpcPublicSubnet1RouteTable4CBFF871:
    Type: AWS::EC2::RouteTable
    Properties:
      VpcId:
        Ref: TestVpcE77CE678
      Tags:
        - Key: Name
          Value: TestStack/TestVpc/PublicSubnet1
    Metadata:
      aws:cdk:path: TestStack/TestVpc/PublicSubnet1/RouteTable
  TestVpcPublicSubnet1RouteTableAssociation7D1DECD9:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      RouteTableId:
        Ref: TestVpcPublicSubnet1RouteTable4CBFF871
      SubnetId:
        Ref: TestVpcPublicSubnet1SubnetA7DB1EDF
    Metadata:
      aws:cdk:path: TestStack/TestVpc/PublicSubnet1/RouteTableAssociation
  TestVpcPublicSubnet1DefaultRoute6C0F0315:
    Type: AWS::EC2::Route
    Properties:
      RouteTableId:
        Ref: TestVpcPublicSubnet1RouteTable4CBFF871
      DestinationCidrBlock: 0.0.0.0/0
      GatewayId:
        Ref: TestVpcIGW9DD53F70
    DependsOn:
      - TestVpcVPCGWF1827B84
    Metadata:
      aws:cdk:path: TestStack/TestVpc/PublicSubnet1/DefaultRoute
  TestVpcPublicSubnet1EIP4884338C:
    Type: AWS::EC2::EIP
    Properties:
      Domain: vpc
      Tags:
        - Key: Name
          Value: TestStack/TestVpc/PublicSubnet1
    Metadata:
      aws:cdk:path: TestStack/TestVpc/PublicSubnet1/EIP
  TestVpcPublicSubnet1NATGatewayA323E3EC:
    Type: AWS::EC2::NatGateway
    Properties:
      SubnetId:
        Ref: TestVpcPublicSubnet1SubnetA7DB1EDF
      AllocationId:
        Fn::GetAtt:
          - TestVpcPublicSubnet1EIP4884338C
          - AllocationId
      Tags:
        - Key: Name
          Value: TestStack/TestVpc/PublicSubnet1
    Metadata:
      aws:cdk:path: TestStack/TestVpc/PublicSubnet1/NATGateway
  TestVpcPublicSubnet2Subnet80A14523:
    Type: AWS::EC2::Subnet
    Properties:
      CidrBlock: 10.0.64.0/18
      VpcId:
        Ref: TestVpcE77CE678
      AvailabilityZone:
        Fn::Select:
          - 1
          - Fn::GetAZs: ""
      MapPublicIpOnLaunch: true
      Tags:
        - Key: aws-cdk:subnet-name
          Value: Public
        - Key: aws-cdk:subnet-type
          Value: Public
        - Key: Name
          Value: TestStack/TestVpc/PublicSubnet2
    Metadata:
      aws:cdk:path: TestStack/TestVpc/PublicSubnet2/Subnet
  TestVpcPublicSubnet2RouteTable75B88314:
    Type: AWS::EC2::RouteTable
    Properties:
      VpcId:
        Ref: TestVpcE77CE678
      Tags:
        - Key: Name
          Value: TestStack/TestVpc/PublicSubnet2
    Metadata:
      aws:cdk:path: TestStack/TestVpc/PublicSubnet2/RouteTable
  TestVpcPublicSubnet2RouteTableAssociationB386A819:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      RouteTableId:
        Ref: TestVpcPublicSubnet2RouteTable75B88314
      SubnetId:
        Ref: TestVpcPublicSubnet2Subnet80A14523
    Metadata:
      aws:cdk:path: TestStack/TestVpc/PublicSubnet2/RouteTableAssociation
  TestVpcPublicSubnet2DefaultRoute054DAE0A:
    Type: AWS::EC2::Route
    Properties:
      RouteTableId:
        Ref: TestVpcPublicSubnet2RouteTable75B88314
      DestinationCidrBlock: 0.0.0.0/0
      GatewayId:
        Ref: TestVpcIGW9DD53F70
    DependsOn:
      - TestVpcVPCGWF1827B84
    Metadata:
      aws:cdk:path: TestStack/TestVpc/PublicSubnet2/DefaultRoute
  TestVpcPublicSubnet2EIP83F7944C:
    Type: AWS::EC2::EIP
    Properties:
      Domain: vpc
      Tags:
        - Key: Name
          Value: TestStack/TestVpc/PublicSubnet2
    Metadata:
      aws:cdk:path: TestStack/TestVpc/PublicSubnet2/EIP
  TestVpcPublicSubnet2NATGatewayA9858C31:
    Type: AWS::EC2::NatGateway
    Properties:
      SubnetId:
        Ref: TestVpcPublicSubnet2Subnet80A14523
      AllocationId:
        Fn::GetAtt:
          - TestVpcPublicSubnet2EIP83F7944C
          - AllocationId
      Tags:
        - Key: Name
          Value: TestStack/TestVpc/PublicSubnet2
    Metadata:
      aws:cdk:path: TestStack/TestVpc/PublicSubnet2/NATGateway
  TestVpcPrivateSubnet1SubnetCC65D771:
    Type: AWS::EC2::Subnet
    Properties:
      CidrBlock: 10.0.128.0/18
      VpcId:
        Ref: TestVpcE77CE678
      AvailabilityZone:
        Fn::Select:
          - 0
          - Fn::GetAZs: ""
      MapPublicIpOnLaunch: false
      Tags:
        - Key: aws-cdk:subnet-name
          Value: Private
        - Key: aws-cdk:subnet-type
          Value: Private
        - Key: Name
          Value: TestStack/TestVpc/PrivateSubnet1
    Metadata:
      aws:cdk:path: TestStack/TestVpc/PrivateSubnet1/Subnet
  TestVpcPrivateSubnet1RouteTable469B0105:
    Type: AWS::EC2::RouteTable
    Properties:
      VpcId:
        Ref: TestVpcE77CE678
      Tags:
        - Key: Name
          Value: TestStack/TestVpc/PrivateSubnet1
    Metadata:
      aws:cdk:path: TestStack/TestVpc/PrivateSubnet1/RouteTable
  TestVpcPrivateSubnet1RouteTableAssociationFFD4DFF7:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      RouteTableId:
        Ref: TestVpcPrivateSubnet1RouteTable469B0105
      SubnetId:
        Ref: TestVpcPrivateSubnet1SubnetCC65D771
    Metadata:
      aws:cdk:path: TestStack/TestVpc/PrivateSubnet1/RouteTableAssociation
  TestVpcPrivateSubnet1DefaultRoute32E7B814:
    Type: AWS::EC2::Route
    Properties:
      RouteTableId:
        Ref: TestVpcPrivateSubnet1RouteTable469B0105
      DestinationCidrBlock: 0.0.0.0/0
      NatGatewayId:
        Ref: TestVpcPublicSubnet1NATGatewayA323E3EC
    Metadata:
      aws:cdk:path: TestStack/TestVpc/PrivateSubnet1/DefaultRoute
  TestVpcPrivateSubnet2SubnetDE0C64A2:
    Type: AWS::EC2::Subnet
    Properties:
      CidrBlock: 10.0.192.0/18
      VpcId:
        Ref: TestVpcE77CE678
      AvailabilityZone:
        Fn::Select:
          - 1
          - Fn::GetAZs: ""
      MapPublicIpOnLaunch: false
      Tags:
        - Key: aws-cdk:subnet-name
          Value: Private
        - Key: aws-cdk:subnet-type
          Value: Private
        - Key: Name
          Value: TestStack/TestVpc/PrivateSubnet2
    Metadata:
      aws:cdk:path: TestStack/TestVpc/PrivateSubnet2/Subnet
  TestVpcPrivateSubnet2RouteTableCEF29F7C:
    Type: AWS::EC2::RouteTable
    Properties:
      VpcId:
        Ref: TestVpcE77CE678
      Tags:
        - Key: Name
          Value: TestStack/TestVpc/PrivateSubnet2
    Metadata:
      aws:cdk:path: TestStack/TestVpc/PrivateSubnet2/RouteTable
  TestVpcPrivateSubnet2RouteTableAssociation18250AB4:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      RouteTableId:
        Ref: TestVpcPrivateSubnet2RouteTableCEF29F7C
      SubnetId:
        Ref: TestVpcPrivateSubnet2SubnetDE0C64A2
    Metadata:
      aws:cdk:path: TestStack/TestVpc/PrivateSubnet2/RouteTableAssociation
  TestVpcPrivateSubnet2DefaultRouteA7EB6930:
    Type: AWS::EC2::Route
    Properties:
      RouteTableId:
        Ref: TestVpcPrivateSubnet2RouteTableCEF29F7C
      DestinationCidrBlock: 0.0.0.0/0
      NatGatewayId:
        Ref: TestVpcPublicSubnet2NATGatewayA9858C31
    Metadata:
      aws:cdk:path: TestStack/TestVpc/PrivateSubnet2/DefaultRoute
  TestVpcIGW9DD53F70:
    Type: AWS::EC2::InternetGateway
    Properties:
      Tags:
        - Key: Name
          Value: TestStack/TestVpc
    Metadata:
      aws:cdk:path: TestStack/TestVpc/IGW
  TestVpcVPCGWF1827B84:
    Type: AWS::EC2::VPCGatewayAttachment
    Properties:
      VpcId:
        Ref: TestVpcE77CE678
      InternetGatewayId:
        Ref: TestVpcIGW9DD53F70
    Metadata:
      aws:cdk:path: TestStack/TestVpc/VPCGW
  TestRepo08D311A0:
    Type: AWS::ECR::Repository
    UpdateReplacePolicy: Retain
    DeletionPolicy: Retain
    Metadata:
      aws:cdk:path: TestStack/TestRepo/Resource
  imagecomponent:
    Type: AWS::ImageBuilder::Component
    Properties:
      Name: image-component
      Platform: Linux
      Version: 0.0.0
      Data: |
        
        schemaVersion: 1.0

        phases:
          - name: build
            steps:
              - name: bash-script
                action: ExecuteBash
                inputs:
                  commands:
                    - eval ":"
    Metadata:
      aws:cdk:path: TestStack/image-component
  testcontainerrecipe:
    Type: AWS::ImageBuilder::ContainerRecipe
    Properties:
      Components:
        - ComponentArn:
            Fn::GetAtt:
              - imagecomponent
              - Arn
      ContainerType: DOCKER
      Name: test-container
      ParentImage: ubuntu:20.04
      TargetRepository:
        RepositoryName:
          Ref: TestRepo08D311A0
        Service: ECR
      Version: 0.0.0
      DockerfileTemplateData: |
        FROM {{{ imagebuilder:parentImage }}}
        USER root
        {{{ imagebuilder:environments }}}
        {{{ imagebuilder:components }}}
      PlatformOverride: Linux
    Metadata:
      aws:cdk:path: TestStack/test-container-recipe
  TestStackD2D0B6A9:
    Type: AWS::S3::Bucket
    UpdateReplacePolicy: Retain
    DeletionPolicy: Retain
    Metadata:
      aws:cdk:path: TestStack/TestStack/Resource
  testroleB50A37BE:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Statement:
          - Action: sts:AssumeRole
            Effect: Allow
            Principal:
              Service:
                Fn::Join:
                  - ""
                  - - ec2.
                    - Ref: AWS::URLSuffix
        Version: "2012-10-17"
      ManagedPolicyArns:
        - Fn::Join:
            - ""
            - - "arn:"
              - Ref: AWS::Partition
              - :iam::aws:policy/AdministratorAccess
    Metadata:
      aws:cdk:path: TestStack/test-role/Resource
  testinstanceprofile:
    Type: AWS::IAM::InstanceProfile
    Properties:
      Roles:
        - Ref: testroleB50A37BE
      InstanceProfileName: test-instance-profile
    Metadata:
      aws:cdk:path: TestStack/test-instance-profile
  infraconfig:
    Type: AWS::ImageBuilder::InfrastructureConfiguration
    Properties:
      InstanceProfileName:
        Ref: testinstanceprofile
      Name: infra-config
      InstanceTypes:
        - t2.xlarge
      KeyPair: id-aqs-or
      Logging:
        S3Logs:
          S3BucketName:
            Ref: TestStackD2D0B6A9
      SecurityGroupIds:
        - Fn::GetAtt:
            - TestVpcE77CE678
            - DefaultSecurityGroup
      SubnetId:
        Ref: TestVpcPublicSubnet1SubnetA7DB1EDF
      TerminateInstanceOnFailure: false
    Metadata:
      aws:cdk:path: TestStack/infra-config
  testimage0:
    Type: AWS::ImageBuilder::Image
    Properties:
      InfrastructureConfigurationArn:
        Fn::GetAtt:
          - infraconfig
          - Arn
      ContainerRecipeArn:
        Fn::GetAtt:
          - testcontainerrecipe
          - Arn
    Metadata:
      aws:cdk:path: TestStack/test-image-0
  testimage1:
    Type: AWS::ImageBuilder::Image
    Properties:
      InfrastructureConfigurationArn:
        Fn::GetAtt:
          - infraconfig
          - Arn
      ContainerRecipeArn:
        Fn::GetAtt:
          - testcontainerrecipe
          - Arn
    Metadata:
      aws:cdk:path: TestStack/test-image-1
  testimage2:
    Type: AWS::ImageBuilder::Image
    Properties:
      InfrastructureConfigurationArn:
        Fn::GetAtt:
          - infraconfig
          - Arn
      ContainerRecipeArn:
        Fn::GetAtt:
          - testcontainerrecipe
          - Arn
    Metadata:
      aws:cdk:path: TestStack/test-image-2
  testimage3:
    Type: AWS::ImageBuilder::Image
    Properties:
      InfrastructureConfigurationArn:
        Fn::GetAtt:
          - infraconfig
          - Arn
      ContainerRecipeArn:
        Fn::GetAtt:
          - testcontainerrecipe
          - Arn
    Metadata:
      aws:cdk:path: TestStack/test-image-3
  testimage4:
    Type: AWS::ImageBuilder::Image
    Properties:
      InfrastructureConfigurationArn:
        Fn::GetAtt:
          - infraconfig
          - Arn
      ContainerRecipeArn:
        Fn::GetAtt:
          - testcontainerrecipe
          - Arn
    Metadata:
      aws:cdk:path: TestStack/test-image-4
  testimage5:
    Type: AWS::ImageBuilder::Image
    Properties:
      InfrastructureConfigurationArn:
        Fn::GetAtt:
          - infraconfig
          - Arn
      ContainerRecipeArn:
        Fn::GetAtt:
          - testcontainerrecipe
          - Arn
    Metadata:
      aws:cdk:path: TestStack/test-image-5
  testimage6:
    Type: AWS::ImageBuilder::Image
    Properties:
      InfrastructureConfigurationArn:
        Fn::GetAtt:
          - infraconfig
          - Arn
      ContainerRecipeArn:
        Fn::GetAtt:
          - testcontainerrecipe
          - Arn
    Metadata:
      aws:cdk:path: TestStack/test-image-6
  testimage7:
    Type: AWS::ImageBuilder::Image
    Properties:
      InfrastructureConfigurationArn:
        Fn::GetAtt:
          - infraconfig
          - Arn
      ContainerRecipeArn:
        Fn::GetAtt:
          - testcontainerrecipe
          - Arn
    Metadata:
      aws:cdk:path: TestStack/test-image-7
  testimage8:
    Type: AWS::ImageBuilder::Image
    Properties:
      InfrastructureConfigurationArn:
        Fn::GetAtt:
          - infraconfig
          - Arn
      ContainerRecipeArn:
        Fn::GetAtt:
          - testcontainerrecipe
          - Arn
    Metadata:
      aws:cdk:path: TestStack/test-image-8
  testimage9:
    Type: AWS::ImageBuilder::Image
    Properties:
      InfrastructureConfigurationArn:
        Fn::GetAtt:
          - infraconfig
          - Arn
      ContainerRecipeArn:
        Fn::GetAtt:
          - testcontainerrecipe
          - Arn
    Metadata:
      aws:cdk:path: TestStack/test-image-9
  CDKMetadata:
    Type: AWS::CDK::Metadata
    Properties:
      Analytics: v2:deflate64:H4sIAAAAAAAA/01Qy27CMBD8Fu6Oy+PSK0VVlUsVhYpr5SwbWJLY0XpdhKL8ex1cSE8zOx7vjL3WS73MGPR6tTBXn8GxyVqq9LAXA42K0veAsNbDoQe1q+2h2KkiVC3BPlQWZdJmVrog+GWqFmd91rbeOyAj5OzTPJH3vJjg08iHEbyamyqYfiKdF+dWkCN/GFKTv2krseu5QyujQmA9lNg7T+L4bpynUVFnTlgFao8YffFw57re2XhV3QcrhixyiUA9puCajRcOIIExGmo6BX6+IZ/2jcpv9PAWoEllE4thpotdXPqMB+bWi7GABbuaWhxHVaJ3gSG5/vEYdqQpaFTFTc7Ovmz0q16tFhdPlHGwQh3qMuEvjeyZi8gBAAA=
    Metadata:
      aws:cdk:path: TestStack/CDKMetadata/Default
    Condition: CDKMetadataAvailable
Conditions:
  CDKMetadataAvailable:
    Fn::Or:
      - Fn::Or:
          - Fn::Equals:
              - Ref: AWS::Region
              - af-south-1
          - Fn::Equals:
              - Ref: AWS::Region
              - ap-east-1
          - Fn::Equals:
              - Ref: AWS::Region
              - ap-northeast-1
          - Fn::Equals:
              - Ref: AWS::Region
              - ap-northeast-2
          - Fn::Equals:
              - Ref: AWS::Region
              - ap-south-1
          - Fn::Equals:
              - Ref: AWS::Region
              - ap-southeast-1
          - Fn::Equals:
              - Ref: AWS::Region
              - ap-southeast-2
          - Fn::Equals:
              - Ref: AWS::Region
              - ca-central-1
          - Fn::Equals:
              - Ref: AWS::Region
              - cn-north-1
          - Fn::Equals:
              - Ref: AWS::Region
              - cn-northwest-1
      - Fn::Or:
          - Fn::Equals:
              - Ref: AWS::Region
              - eu-central-1
          - Fn::Equals:
              - Ref: AWS::Region
              - eu-north-1
          - Fn::Equals:
              - Ref: AWS::Region
              - eu-south-1
          - Fn::Equals:
              - Ref: AWS::Region
              - eu-west-1
          - Fn::Equals:
              - Ref: AWS::Region
              - eu-west-2
          - Fn::Equals:
              - Ref: AWS::Region
              - eu-west-3
          - Fn::Equals:
              - Ref: AWS::Region
              - me-south-1
          - Fn::Equals:
              - Ref: AWS::Region
              - sa-east-1
          - Fn::Equals:
              - Ref: AWS::Region
              - us-east-1
          - Fn::Equals:
              - Ref: AWS::Region
              - us-east-2
      - Fn::Or:
          - Fn::Equals:
              - Ref: AWS::Region
              - us-west-1
          - Fn::Equals:
              - Ref: AWS::Region
              - us-west-2
Parameters:
  BootstrapVersion:
    Type: AWS::SSM::Parameter::Value<String>
    Default: /cdk-bootstrap/hnb659fds/version
    Description: Version of the CDK Bootstrap resources in this environment, automatically retrieved from SSM Parameter Store.
Rules:
  CheckBootstrapVersion:
    Assertions:
      - Assert:
          Fn::Not:
            - Fn::Contains:
                - - "1"
                  - "2"
                  - "3"
                  - "4"
                  - "5"
                - Ref: BootstrapVersion
        AssertDescription: CDK bootstrap stack version 6 required. Please run 'cdk bootstrap' with a recent version of the CDK CLI.

CDK Script 1:

import aws_cdk as cdk
from aws_cdk import (
    aws_ec2 as ec2,
    aws_ecr as ecr,
    aws_iam as iam,
    aws_imagebuilder as ib,
    aws_s3 as s3,
)

DATA = """
schemaVersion: 1.0

phases:
  - name: build
    steps:
      - name: bash-script
        action: ExecuteBash
        inputs:
          commands:
            - eval ":"
"""
DOCKERFILE_TEMPLATE = (
    'FROM {{{ imagebuilder:parentImage }}}\n'
    'USER root\n'
    '{{{ imagebuilder:environments }}}\n'
    '{{{ imagebuilder:components }}}\n'
)


class TestStack(cdk.Stack):

    def __init__(self, scope, id):
        super().__init__(scope, id)
        # construct vpc/subnet
        vpc = ec2.Vpc(self, 'TestVpc')
        subnet = vpc.public_subnets[0]
        # construct ECR repo
        repo = ecr.Repository(self, 'TestRepo')
        target_repo = ib.CfnContainerRecipe.TargetContainerRepositoryProperty(
            repository_name=repo.repository_name, service='ECR'
        )
        # construct components
        components = [
            ib.CfnComponent(
                self,
                'image-component',
                name='image-component',
                platform='Linux',
                version='0.0.0',
                data=DATA,
            )
        ]
        components = [
            ib.CfnContainerRecipe.ComponentConfigurationProperty(
                component_arn=component.attr_arn
            )
            for component in components
        ]
        # construct recipe
        recipe = ib.CfnContainerRecipe(
            self,
            'test-container-recipe',
            name='test-container',
            container_type='DOCKER',
            dockerfile_template_data=DOCKERFILE_TEMPLATE,
            version='0.0.0',
            target_repository=target_repo,
            components=components,
            platform_override='Linux',
            parent_image='ubuntu:20.04',
        )

        # construct logging bucket
        logging_bucket = s3.Bucket(
            self,
            id,
        )
        logging = ib.CfnInfrastructureConfiguration.LoggingProperty(
            s3_logs=ib.CfnInfrastructureConfiguration.S3LogsProperty(
                s3_bucket_name=logging_bucket.bucket_name
            )
        )
        # construct instance profile
        role = iam.Role(
            self,
            'test-role',
            managed_policies=[iam.ManagedPolicy.from_aws_managed_policy_name('AdministratorAccess')],
            assumed_by=iam.CompositePrincipal(iam.ServicePrincipal('ec2.amazonaws.com')),
        )
        instance_profile = iam.CfnInstanceProfile(
            self,
            'test-instance-profile',
            instance_profile_name='test-instance-profile',
            roles=[role.role_name],
        )
        # construct infrastructure configuration
        infra = ib.CfnInfrastructureConfiguration(
            self,
            'infra-config',
            name='infra-config',
            logging=logging,
            instance_profile_name=instance_profile.ref,
            subnet_id=subnet.subnet_id,
            security_group_ids=[vpc.vpc_default_security_group],
            key_pair='id-aqs-or',
            terminate_instance_on_failure=False,
            instance_types=['t2.xlarge'],
        )
        # for i in range(10):
        #     ib.CfnImage(
        #         self,
        #         f'test-image-{i}',
        #         infrastructure_configuration_arn=infra.attr_arn,
        #         container_recipe_arn=recipe.attr_arn,
        #     )


if __name__ == '__main__':
    app = cdk.App()
    TestStack(app, 'TestStack')
    app.synth()

CDK Script 2:

import aws_cdk as cdk
from aws_cdk import (
    aws_ec2 as ec2,
    aws_ecr as ecr,
    aws_iam as iam,
    aws_imagebuilder as ib,
    aws_s3 as s3,
)

DATA = """
schemaVersion: 1.0

phases:
  - name: build
    steps:
      - name: bash-script
        action: ExecuteBash
        inputs:
          commands:
            - eval ":"
"""
DOCKERFILE_TEMPLATE = (
    'FROM {{{ imagebuilder:parentImage }}}\n'
    'USER root\n'
    '{{{ imagebuilder:environments }}}\n'
    '{{{ imagebuilder:components }}}\n'
)


class TestStack(cdk.Stack):

    def __init__(self, scope, id):
        super().__init__(scope, id)
        # construct vpc/subnet
        vpc = ec2.Vpc(self, 'TestVpc')
        subnet = vpc.public_subnets[0]
        # construct ECR repo
        repo = ecr.Repository(self, 'TestRepo')
        target_repo = ib.CfnContainerRecipe.TargetContainerRepositoryProperty(
            repository_name=repo.repository_name, service='ECR'
        )
        # construct components
        components = [
            ib.CfnComponent(
                self,
                'image-component',
                name='image-component',
                platform='Linux',
                version='0.0.0',
                data=DATA,
            )
        ]
        components = [
            ib.CfnContainerRecipe.ComponentConfigurationProperty(
                component_arn=component.attr_arn
            )
            for component in components
        ]
        # construct recipe
        recipe = ib.CfnContainerRecipe(
            self,
            'test-container-recipe',
            name='test-container',
            container_type='DOCKER',
            dockerfile_template_data=DOCKERFILE_TEMPLATE,
            version='0.0.0',
            target_repository=target_repo,
            components=components,
            platform_override='Linux',
            parent_image='ubuntu:20.04',
        )

        # construct logging bucket
        logging_bucket = s3.Bucket(
            self,
            id,
        )
        logging = ib.CfnInfrastructureConfiguration.LoggingProperty(
            s3_logs=ib.CfnInfrastructureConfiguration.S3LogsProperty(
                s3_bucket_name=logging_bucket.bucket_name
            )
        )
        # construct instance profile
        role = iam.Role(
            self,
            'test-role',
            managed_policies=[iam.ManagedPolicy.from_aws_managed_policy_name('AdministratorAccess')],
            assumed_by=iam.CompositePrincipal(iam.ServicePrincipal('ec2.amazonaws.com')),
        )
        instance_profile = iam.CfnInstanceProfile(
            self,
            'test-instance-profile',
            instance_profile_name='test-instance-profile',
            roles=[role.role_name],
        )
        # construct infrastructure configuration
        infra = ib.CfnInfrastructureConfiguration(
            self,
            'infra-config',
            name='infra-config',
            logging=logging,
            instance_profile_name=instance_profile.ref,
            subnet_id=subnet.subnet_id,
            security_group_ids=[vpc.vpc_default_security_group],
            key_pair='id-aqs-or',
            terminate_instance_on_failure=False,
            instance_types=['t2.xlarge'],
        )
        for i in range(10):
            ib.CfnImage(
                self,
                f'test-image-{i}',
                infrastructure_configuration_arn=infra.attr_arn,
                container_recipe_arn=recipe.attr_arn,
            )


if __name__ == '__main__':
    app = cdk.App()
    TestStack(app, 'TestStack')
    app.synth()
@OrEisenberg
Author

I have uncovered further evidence that this bug originates from a race condition of the type described in the aforementioned Go SDK issue. Namely, if one replaces the component in the above CloudFormation/CDK scripts with the following:

schemaVersion: 1.0

phases:
  - name: build
    steps:
      - name: bash-script
        action: ExecuteBash
        inputs:
          commands:
            - range=($(seq 0 1 100))
            - for i in ${range[@]}; do aws sts get-caller-identity &> /dev/null; done

then the failure rate hits 100%. The fact that this component does nothing but make roughly 100 SDK calls strongly suggests to me that this unfortunate behavior is indeed caused by a randomly occurring bug arising from a race condition.
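As a sketch of how that swap might look in the CDK scripts above, the stress-test document can be kept in a second constant and passed as `data=` to `ib.CfnComponent`. The name `STRESS_DATA` is an assumption introduced here for illustration; only the YAML body comes from the comment above.

```python
# Hypothetical constant name; the YAML body is the component quoted above.
# A plain (non-f) string is used so that ${range[@]} reaches bash unchanged.
STRESS_DATA = """
schemaVersion: 1.0

phases:
  - name: build
    steps:
      - name: bash-script
        action: ExecuteBash
        inputs:
          commands:
            - range=($(seq 0 1 100))
            - for i in ${range[@]}; do aws sts get-caller-identity &> /dev/null; done
"""

# In TestStack.__init__, the component would then be constructed with
#     ib.CfnComponent(self, 'image-component', name='image-component',
#                     platform='Linux', version='0.0.0', data=STRESS_DATA)
# in place of data=DATA.
```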

@ytsssun

ytsssun commented Oct 1, 2021

Hi @OrEisenberg,

Both error logs you posted

failed to download the EC2 Image Builder Component ''. Error - operation error
imagebuilder: GetComponent, failed to sign request: failed to retrieve credentials: failed to decode
EC2 IMDS role credentials, context canceled

or

failed to upload file to with error 'operation error
S3: PutObject, https response error StatusCode: 400, RequestID: , HostID: ,
api error AuthorizationHeaderMalformed: The authorization header is malformed; a non-empty Access
Key (AKID) must be provided in the credential.

are from the TOE binary; that much I can confirm. This makes me feel it may still have something to do with the Go SDK V2 issue - https://githubmemory.com/repo/aws/aws-sdk-go-v2/issues/1253

@OrEisenberg
Author

@ytsssun I do agree that these errors are being raised from the TOE binary. Is TOE written in Go?

Whether this is an issue on the TOE end or the CloudFormation end is not entirely clear to me -- all I know is that triggering container builds from the command line works, but that triggering builds of those exact same images from CloudFormation fails nondeterministically with these errors. This suggests to me that one of two things is going on:

  1. The TOE code is fine, but the CloudFormation code is invoking it incorrectly.
  2. The CloudFormation code is fine, but TOE is failing on the valid invocations which CloudFormation is making.
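To compare the two paths more systematically, the command-line builds could also be launched in a loop through the Image Builder API. The following is a minimal sketch, assuming a client that behaves like `boto3.client('imagebuilder')`; the helper name `trigger_builds` and its parameters are hypothetical and not part of the original scripts.

```python
import uuid


def trigger_builds(client, recipe_arn, infra_arn, count=10):
    """Start `count` container image builds and return their build-version ARNs.

    `client` is assumed to expose create_image() the way
    boto3.client('imagebuilder') does.
    """
    arns = []
    for _ in range(count):
        response = client.create_image(
            containerRecipeArn=recipe_arn,
            infrastructureConfigurationArn=infra_arn,
            # Idempotency token; a fresh one per call so each call starts a new build.
            clientToken=str(uuid.uuid4()),
        )
        arns.append(response["imageBuildVersionArn"])
    return arns
```

With boto3 installed and credentials configured, this could be invoked as `trigger_builds(boto3.client('imagebuilder'), '<container-recipe-arn>', '<infrastructure-configuration-arn>')`, using the ARNs from the launched stack.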

If I knew Java I might be able to dig into the CloudFormation side of things and help determine which of these two situations is actually occurring, but unfortunately I don't. Obviously, with TOE not being open source, that's also a dead end for me.

As before, please do let me know if there's any way I can help resolve this -- my team and I would really love to be able to build container images again!

@ytsssun

ytsssun commented Oct 4, 2021

@OrEisenberg TOE is written in Go, and I can confirm the logs are coming from TOE. We have asked the Go SDK team internally to help with the issue; I will keep you updated on their response/solution and unblock you ASAP.

One question I do have: how long have you been using Image Builder to build container images, and when did you start experiencing this issue?

@OrEisenberg
Author

@ytsssun That's fantastic news! I appreciate your responsiveness.

We've been using ImageBuilder + CloudFormation to build container images for 4 or 5 months now and started experiencing this problem around 2 weeks ago, give or take a few days.

@ytsssun

ytsssun commented Oct 4, 2021

@OrEisenberg The timeline makes sense to me -- we migrated TOE to Go SDK V2 roughly 2-3 weeks ago, which matches when you started seeing failures.

One update: I did receive a response from the Go SDK team. They acknowledged the issue, and I think you have correctly located it:

We think this issue can be resolved in the SDK with the idea posted by aws/aws-sdk-go-v2#1253 (comment) customer. The SDK needs to be updated to make sure it reads the full response body in its response deserializer before continuing on.
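The idea in that quote can be illustrated with a toy Python model. This is not the Go SDK's code: `CancellableBody`, `deserialize_lazily`, and `deserialize_eagerly` are hypothetical names, and the "context" is simulated with a flag. The sketch only shows why draining the response body before returning avoids a later "context canceled" failure of the kind seen in the TOE logs.

```python
import io
import json


class CancellableBody:
    """Stand-in for an HTTP response body tied to a request context."""

    def __init__(self, payload: bytes):
        self._stream = io.BytesIO(payload)
        self.cancelled = False

    def cancel(self):
        # Simulates the request context being cancelled after the call returns.
        self.cancelled = True

    def read(self):
        if self.cancelled:
            raise RuntimeError("context canceled")
        return self._stream.read()


def deserialize_lazily(body):
    """Defer reading the body; races with cancellation of the context."""
    return lambda: json.loads(body.read())


def deserialize_eagerly(body):
    """Read the full body up front, then decode from the buffered bytes."""
    raw = body.read()  # drained while the context is still live
    return lambda: json.loads(raw)


def demo():
    payload = json.dumps({"AccessKeyId": "EXAMPLE"}).encode()

    lazy_body = CancellableBody(payload)
    lazy = deserialize_lazily(lazy_body)
    lazy_body.cancel()  # cancellation wins the race
    try:
        lazy_result = lazy()
    except RuntimeError as exc:
        lazy_result = str(exc)

    eager_body = CancellableBody(payload)
    eager = deserialize_eagerly(eager_body)
    eager_body.cancel()  # cancellation no longer matters
    eager_result = eager()

    return lazy_result, eager_result
```

In this model the lazy variant fails only when cancellation happens before the deferred read, which is consistent with a failure that appears nondeterministic in practice.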

I will let you know the ETA once I get it from them. We still need to decide whether to implement the workaround ourselves or wait for their update; we will go with whichever rolls out the fix faster. Thanks again for pointing out the issue.

@OrEisenberg
Author

@ytsssun Terrific. Thanks again so much for getting on this so quickly!

@ytsssun

ytsssun commented Oct 21, 2021

Hi @OrEisenberg, I wanted to follow up and see whether this issue is resolved for you, now that the fix from the Go SDK has been rolled out to all regions.

@OrEisenberg
Author

@ytsssun My unit test for building container images is now passing! Thanks so much again for all your help on this!
