Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

host-ctr: add support for pulling private ECR images from il-central-1 #3127

Closed
wants to merge 1 commit into from

Conversation

etungsten
Copy link
Contributor

@etungsten etungsten commented May 18, 2023

Issue number:
N/A

Description of changes:

    host-ctr: add temporary workaround for bypassing checks for parsing ECR ref
    
    In order to support pulling images from ECR repositories from
    il-central-1, we need to generate our own canonical ECR image reference
    since the region is not officially supported by aws-go-sdk yet.

Testing done:
I am able to pull images from il-central-1:

$ ./host-ctr -s /run/containerd/containerd.sock pull-image --source 777777777777.dkr.ecr.il-central-1.amazonaws.com/eks/pause:3.2-eksbuild.1
time="2023-05-17T18:13:49-07:00" level=info msg="Image does not exist, proceeding to pull image from source." ref="ecr.aws/arn:aws:ecr:il-central-1:777777777777:repository/eks/pause:3.2-eksbuild.1"
time="2023-05-17T18:13:49-07:00" level=info msg="pulling with Amazon ECR Resolver" ref="ecr.aws/arn:aws:ecr:il-central-1:777777777777:repository/eks/pause:3.2-eksbuild.1"
time="2023-05-17T18:13:54-07:00" level=info msg="pulled image successfully" img="ecr.aws/arn:aws:ecr:il-central-1:777777777777:repository/eks/pause:3.2-eksbuild.1"
time="2023-05-17T18:13:54-07:00" level=info msg="unpacking image..." img="ecr.aws/arn:aws:ecr:il-central-1:777777777777:repository/eks/pause:3.2-eksbuild.1"
time="2023-05-17T18:13:54-07:00" level=info msg="tagging image" img="777777777777.dkr.ecr.il-central-1.amazonaws.com/eks/pause:3.2-eksbuild.1"

$ ./host-ctr -s /run/containerd/containerd.sock pull-image --source 777777777777.dkr.ecr.il-central-1.amazonaws.com/bottlerocket-admin:v0.10.1
time="2023-05-17T18:12:52-07:00" level=info msg="pulling with Amazon ECR Resolver" ref="ecr.aws/arn:aws:ecr:il-central-1:777777777777:repository/bottlerocket-admin:v0.10.1"
time="2023-05-17T18:12:53-07:00" level=info msg="pulled image successfully" img="ecr.aws/arn:aws:ecr:il-central-1:777777777777:repository/bottlerocket-admin:v0.10.1"
time="2023-05-17T18:12:53-07:00" level=info msg="unpacking image..." img="ecr.aws/arn:aws:ecr:il-central-1:777777777777:repository/bottlerocket-admin:v0.10.1"
time="2023-05-17T18:12:53-07:00" level=info msg="tagging image" img="777777777777.dkr.ecr.il-central-1.amazonaws.com/bottlerocket-admin:v0.10.1"

Terms of contribution:

By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.

@etungsten etungsten force-pushed the barse branch 3 times, most recently from 60a8748 to cc6c653 Compare May 18, 2023 01:28
Copy link
Contributor

@stmcginnis stmcginnis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some formatting looks a little odd, but golangci-lint isn't complaining about it, so good enough for me!

"github.com/containerd/containerd/reference"
"github.com/opencontainers/go-digest"

"fmt"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Import grouping got messed up from std, third party, local, but not a critical issue.

Copy link
Contributor

@jpmcb jpmcb May 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Weird that goimports doesn't re-arrange. I think since it got added to it's own block, goimports is treating it seperatly:

find . -name "*.go" -not -path "*/vendor/*" | xargs goimports -w

so moving fmt into the top block with the other standard library things and running goimports to get it sorted should fix it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually had to run goimports to placate the linter and this is what the tool gave me.

Copy link
Contributor

@jpmcb jpmcb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I won't comment too much on the actual code since that was mostly lifted from amazon-ecr-resolver

This is a good opportunity to add some additional tests in main_test.go

func TestEcrSpecialRegionsParser(t *testing.T) {
        tests := []struct {
                name           string
                ecrImgURI      string
                expectedErr    error
                expectedString string
        }{
                // The default case for normal regions handled by the aws-go-sdk
                {
                        "Parse amazon-ecr-resolver region",
                        "111111111111.dkr.ecr.us-west-2.amazonaws.com/bottlerocket/container:1.2.3",
                        nil,
                        "ecr.aws/arn:aws:ecr:us-west-2:111111111111:repository/bottlerocket/container:1.2.3",
                },

                // Regions parsed and handled by host-ctr
                {
                        "Parse special case region",
                        "111111111111.dkr.ecr.il-central-1.amazonaws.com/bottlerocket/container:1.2.3",
                        nil,
                        "ecr.aws/arn:aws:ecr:il-central-1:111111111111:repository/bottlerocket/container:1.2.3",
                },

                // Failure and edge cases
                {
                        "Fallback to public ECR on invalid region",
                        "111111111111.dkr.ecr.outer-space.amazonaws.com/bottlerocket/container:1.2.3",
                        nil,
                        "111111111111:repository/bottlerocket/container:1.2.3",
                },
        }

        for _, tc := range tests {
                t.Run(tc.name, func(t *testing.T) {
                        result, err := parseImageURISpecialRegions(tc.ecrImgURI)
                        assert.Equal(t, tc.expectedErr, err)
                        assert.Equal(t, tc.expectedString, result)
                })
        }
}

These don't capture all the edge cases, but it will definitely make it easier for other devs on the team to see what's going on with this function.



// Get the ECR image reference prefix from the AWS region
ecrRefPrefix := ecrRefPrefixMapping[region]
return ecrRefPrefix + account + ":repository/" + fullRepoPath, nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first test case I wrote fails where I expected it to just work:

❯ go test ./...
--- FAIL: TestEcrSpecialRegionsParser (0.00s)
    --- FAIL: TestEcrSpecialRegionsParser/Parse_amazon-ecr-resolver_region (0.00s)
        main_test.go:46:
                Error Trace:    /home/ubuntu/workspace/bottlerocket-os/bottlerocket/sources/host-ctr/cmd/host-ctr/main_test.go:46
                Error:          Not equal:
                                expected: "ecr.aws/arn:aws:ecr:us-west-2:111111111111:repository/bottlerocket/container:1.2.3"
                                actual  : "111111111111:repository/bottlerocket/container:1.2.3"

                                Diff:
                                --- Expected
                                +++ Actual
                                @@ -1 +1 @@
                                -ecr.aws/arn:aws:ecr:us-west-2:111111111111:repository/bottlerocket/container:1.2.3
                                +111111111111:repository/bottlerocket/container:1.2.3
                Test:           TestEcrSpecialRegionsParser/Parse_amazon-ecr-resolver_region
FAIL
FAIL    host-ctr/cmd/host-ctr   0.016s
FAIL

It looks like with a us-west-2 region, which is bundled by aws-sdk-go and should be handled by the amazon-ecr-resolver, falls back to using our ecrRegPrefixMapping which does not have the us-west-2 region key and returns an empty string, resulting in 111111111111:repository/bottlerocket/container:1.2.3.

Is this the expected behavior?

Copy link
Contributor

@jpmcb jpmcb May 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably need to check the map access here?

	val, ok := ecrRefPrefixMapping[region]
	if !ok {
                return "", fmt.Errorf("%v: %s", "invalid region in internal mapping", region)
	}

Otherwise we will always be masking unknown values in the prefix mapping by assuming an empty string.

Here's a little go playground to demonstrate this as well https://go.dev/play/p/W_8XzLuWOwp

Copy link
Contributor

@jpmcb jpmcb May 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also makes me wonder what the actual expected behavior for completely invalid regions is:

                {
                        "Fallback to public ECR on invalid region",
                        "111111111111.dkr.ecr.outer-space.amazonaws.com/bottlerocket/container:1.2.3",
                        nil,
                        "111111111111:repository/bottlerocket/container:1.2.3",
                },

So, for region outer-space in this test case, what's the expected result? It looks like the same thing is happening where the value from ecrRefPrefixMapping (which doesn't exist) is returning the empty string and is not populating the ecrRefPrefix portion of the returned image uri

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can probably disregard the error I'm seeing with this function on us-west-2 since calling the amazon-ecr-resolver shouldn't error out when called on those regions before we make it to the host-ctr's parseImageURISpecialRegions

https://github.com/etungsten/bottlerocket/blob/cc6c6530311ec5be4d96bfe72c04688bf6f791de/sources/host-ctr/cmd/host-ctr/main.go#L642-L651

Although that makes me abit nervous since there's a clear failure path where a valid region is returned in error by the ecr-resolver and is not covered by our subsequent function call.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like with a us-west-2 region, which is bundled by aws-sdk-go and should be handled by the amazon-ecr-resolver, falls back to using our ecrRegPrefixMapping which does not have the us-west-2 region key and returns an empty string, resulting in 111111111111:repository/bottlerocket/container:1.2.3.
Is this the expected behavior?

Yes, that's expected behavior. The fallback only triggers when ecr.ParseImageURI fails to parse out the ECR image source.

So, for region outer-space in this test case, what's the expected result? It looks like the same thing is happening where the value from ecrRefPrefixMapping (which doesn't exist) is returning the empty string and is not populating the ecrRefPrefix portion of the returned image uri

We error since we're limiting the fallback to only il-central-1 and no other "special" region that isn't yet supported by the aws-go-sdk. This is only a temporary workaround for il-central-1 that is meant to removed when a more stable and long-term fix is available.

Although that makes me abit nervous since there's a clear failure path where a valid region is returned in error by the ecr-resolver and is not covered by our subsequent function call.

If ecr.ParseImageURI errors for whatever reason on a ECR image URL belonging to some "valid" region (valid meaning supported by the aws-go-sdk. We will fail regardless in the fallback, so the end result is the same.

Comment on lines 644 to 654
var nestedErr error
// The parsing might fail if the AWS region is special, parse again with special handling:
ref, nestedErr = parseImageURISpecialRegions(ref)
if nestedErr != nil {
log.G(ctx).WithError(err).WithField("source", source).Error("failed to parse ECR reference")
// Return the original error
return nil, err
}
Copy link
Contributor

@jpmcb jpmcb May 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This never does anything with nestedErr.

err in this context is the error returned out of the ecr-resolver's ParseImageURI which we might expect to fail if we need to handle a special region.

It's unfortunate that the ecr resolver doesn't return typed errors (that we could check to only do our own parsing when given a certain error). But we should still handle all errors out of our own parsing function. So, instead, we maybe we return both errors constructed through a new error:

if nestedErr != nil {
    log.G(ctx).WithError(err).WithField("source", source).Error("failed to parse ECR reference")
    log.G(ctx).WithError(nestedErr).WithField("source", source).Error("failed to parse special ECR reference")
    // Return both errors
    return nil, fmt.Errorf("%w: could not parse special regions: %w", err, nestedErr)
}

ecrRefPrefixMapping := map[string]string{
"il-central-1": "ecr.aws/arn:aws:ecr:il-central-1:",
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I would prefer we keep these mappings in a file that we read in. This allows us to keep the code that solves this specific problem separate from a region's arrival in the upstream SDK. It would be even nicer if it was a general purpose file that other tools that need endpoints could use. This is our only use we have found but it's conceivable other things would need similar "overrides". We would probably need the format of the file to be more generic:

{
    "endpoints": [
        "il-central-1": {
            "ecrPrefix":  "ecr.aws/arn:aws:ecr:il-central-1:",
            ....
        }
    ]
} 

This is purely a nice optimization and I don't think we have to do this now, but it would make this code a little more maintainable since the JSON file would change periodically, but we don't have to hopefully worry about this code block in the future. This is more or less how the rest of the AWS SDK's like to think about regional endpoints.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we've discussed offline, this workaround is temporary and we have no intention of supporting this in the long term. We will attempt to fast track a more stable and long term fix for special non-SDK-supported regions.

@etungsten etungsten marked this pull request as draft May 18, 2023 18:00
@etungsten
Copy link
Contributor Author

We've decided to take a different approach to this, so taking this PR back into draft.

…CR ref

In order to support pulling images from ECR repositories from
il-central-1, we need to generate our own canonical ECR image reference
since the region is not officially supported by aws-go-sdk yet.
@etungsten
Copy link
Contributor Author

Ditching this in favor of #3138

@etungsten etungsten closed this May 23, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants