STS: AssumeRole, https response error StatusCode: 403 #2567

Closed
2 tasks done
stgleb opened this issue Mar 19, 2024 · 13 comments
Labels
bug This issue is a bug. needs-reproduction This issue needs reproduction. response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days.

Comments

@stgleb

stgleb commented Mar 19, 2024

Acknowledgements

Describe the bug

After updating the AWS dependencies, I get the following error on DescribeRegions (shown under Current Behavior).

Expected Behavior

Expected 200 OK

Current Behavior

operation error EC2: DescribeRegions, get identity: get credentials: failed to refresh cached credentials, operation error STS: AssumeRole, https response error StatusCode: 403, RequestID: 9b051175-e056-4572-bbdd-1fa1990e5b99, api error InvalidClientTokenId: The security token included in the request is invalid.

Reproduction Steps

  1. update aws deps
  2. go clean -modcache && go mod tidy
  3. rebuild

Possible Solution

No response

Additional Information/Context

Code:

sts code:

func GetStsConfig(log logrus.FieldLogger, stsClient *sts.Client, region, roleARN, clusterID string) (aws.Config, error) {
	opts := func(options *stscreds.AssumeRoleOptions) {
		options.ExternalID = to.StringPtr(clusterID)
	}
	cfg, err := config.LoadDefaultConfig(
		context.Background(),
		config.WithRegion(region),
		config.WithRetryer(func() aws.Retryer {
			r := retry.NewStandard(func(opts *retry.StandardOptions) {
				opts.MaxAttempts = 5
			})
			return castaws.NewCustomRetryer(r)
		}),
		config.WithClientLogMode(aws.LogRetries),
		config.WithLogger(logging.LoggerFunc(func(classification logging.Classification, format string, v ...interface{}) {
			log.Debugf(format, v...)
		})),
		config.WithCredentialsProvider(
			stscreds.NewAssumeRoleProvider(stsClient, roleARN, opts),
		),
	)
	if err != nil {
		return aws.Config{}, err
	}
	cfg.APIOptions = append(cfg.APIOptions, func(stack *smithymiddleware.Stack) error {
		return stack.Deserialize.Add(reportMetricsMiddleware(clusterID), smithymiddleware.After)
	})
	return cfg, nil
}

When error happens:

	if _, err := s.ec2.DescribeRegions(ctx, &ec2.DescribeRegionsInput{AllRegions: aws.Bool(true)}, func(options *ec2.Options) { options.Retryer = retryer }); err != nil {
		if _, ok := castaws.APIError(err); ok {
			return casterrors.NewInvalidCredentialsf(err, "missing permissions to access ec2 regions")
		}
		return err
	}

AWS Go SDK V2 Module Versions Used

current dependencies

	github.com/aws/aws-sdk-go v1.51.1
	github.com/aws/aws-sdk-go-v2 v1.26.0
	github.com/aws/aws-sdk-go-v2/config v1.27.8
	github.com/aws/aws-sdk-go-v2/credentials v1.17.8
	github.com/aws/aws-sdk-go-v2/service/autoscaling v1.40.4
	github.com/aws/aws-sdk-go-v2/service/ec2 v1.152.0
	github.com/aws/aws-sdk-go-v2/service/ecr v1.27.3
	github.com/aws/aws-sdk-go-v2/service/eks v1.41.2
	github.com/aws/aws-sdk-go-v2/service/iam v1.31.3
	github.com/aws/aws-sdk-go-v2/service/sts v1.28.5
	github.com/aws/smithy-go v1.20.1
	github.com/aws/aws-sdk-go-v2/service/elasticloadbalancingv2 v1.30.4
	github.com/aws/aws-sdk-go-v2/service/kms v1.30.0
	github.com/aws/aws-sdk-go-v2/feature/s3/manager v1.16.12 // indirect
	github.com/aws/aws-sdk-go-v2/service/secretsmanager v1.28.4 // indirect
	github.com/aws/aws-sdk-go-v2/service/ssm v1.49.4
	github.com/aws/aws-sdk-go-v2/aws/protocol/eventstream v1.6.1 // indirect
	github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.15.4 // indirect
	github.com/aws/aws-sdk-go-v2/internal/configsources v1.3.4 // indirect
	github.com/aws/aws-sdk-go-v2/internal/endpoints/v2 v2.6.4 // indirect
	github.com/aws/aws-sdk-go-v2/internal/ini v1.8.0 // indirect
	github.com/aws/aws-sdk-go-v2/internal/v4a v1.3.4 // indirect
	github.com/aws/aws-sdk-go-v2/service/internal/accept-encoding v1.11.1 // indirect
	github.com/aws/aws-sdk-go-v2/service/internal/checksum v1.3.6 // indirect
	github.com/aws/aws-sdk-go-v2/service/internal/presigned-url v1.11.6 // indirect
	github.com/aws/aws-sdk-go-v2/service/internal/s3shared v1.17.4 // indirect
	github.com/aws/aws-sdk-go-v2/service/s3 v1.53.0 // indirect
	github.com/aws/aws-sdk-go-v2/service/sns v1.29.3 // indirect
	github.com/aws/aws-sdk-go-v2/service/sqs v1.31.3 // indirect
	github.com/aws/aws-sdk-go-v2/service/sso v1.20.3 // indirect
	github.com/aws/aws-sdk-go-v2/service/ssooidc v1.23.3 // indirect

Before

	github.com/aws/aws-sdk-go v1.47.9
	github.com/aws/aws-sdk-go-v2 v1.22.2
	github.com/aws/aws-sdk-go-v2/config v1.23.0
	github.com/aws/aws-sdk-go-v2/credentials v1.15.2
	github.com/aws/aws-sdk-go-v2/service/autoscaling v1.26.0
	github.com/aws/aws-sdk-go-v2/service/ec2 v1.72.1
	github.com/aws/aws-sdk-go-v2/service/ecr v1.17.12
	github.com/aws/aws-sdk-go-v2/service/eks v1.2.2
	github.com/aws/aws-sdk-go-v2/service/iam v1.3.1
	github.com/aws/aws-sdk-go-v2/service/sts v1.25.1
	github.com/aws/smithy-go v1.16.0
	github.com/aws/aws-sdk-go-v2/service/elasticloadbalancingv2 v1.23.0
	github.com/aws/aws-sdk-go-v2/service/kms v1.24.5
	github.com/aws/aws-sdk-go-v2/service/secretsmanager v1.21.3 // indirect
	github.com/aws/aws-sdk-go-v2/service/ssm v1.37.5
	github.com/aws/aws-sdk-go-v2/aws/protocol/eventstream v1.5.0 // indirect
	github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.14.3 // indirect
	github.com/aws/aws-sdk-go-v2/internal/configsources v1.2.2 // indirect
	github.com/aws/aws-sdk-go-v2/internal/endpoints/v2 v2.5.2 // indirect
	github.com/aws/aws-sdk-go-v2/internal/ini v1.6.0 // indirect
	github.com/aws/aws-sdk-go-v2/internal/v4a v1.2.2 // indirect
	github.com/aws/aws-sdk-go-v2/service/internal/accept-encoding v1.10.0 // indirect
	github.com/aws/aws-sdk-go-v2/service/internal/checksum v1.2.2 // indirect
	github.com/aws/aws-sdk-go-v2/service/internal/presigned-url v1.10.2 // indirect
	github.com/aws/aws-sdk-go-v2/service/internal/s3shared v1.16.2 // indirect
	github.com/aws/aws-sdk-go-v2/service/s3 v1.42.1 // indirect
	github.com/aws/aws-sdk-go-v2/service/sns v1.20.6 // indirect
	github.com/aws/aws-sdk-go-v2/service/sqs v1.28.0 // indirect
	github.com/aws/aws-sdk-go-v2/service/sso v1.17.1 // indirect
	github.com/aws/aws-sdk-go-v2/service/ssooidc v1.19.1 // indirect

Compiler and Version used

go version go1.22.1 darwin/arm64

Operating System and version

macOS

@stgleb stgleb added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Mar 19, 2024
@stgleb
Author

stgleb commented Mar 20, 2024

package main

import (
	"context"

	"github.com/Azure/go-autorest/autorest/to"
	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/aws/retry"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/credentials"
	"github.com/aws/aws-sdk-go-v2/credentials/stscreds"
	"github.com/aws/aws-sdk-go-v2/service/ec2"
	"github.com/aws/aws-sdk-go-v2/service/sts"
	"github.com/aws/smithy-go/logging"
	"github.com/sirupsen/logrus"
)

func GetDefaultConfig(log logrus.FieldLogger, region, clusterID string) (aws.Config, error) {
	cfg, err := config.LoadDefaultConfig(
		context.Background(),
		config.WithRegion(region),
		config.WithRetryer(func() aws.Retryer {
			r := retry.NewStandard(func(opts *retry.StandardOptions) {
				opts.MaxAttempts = 5
			})
			return r
		}),
		config.WithClientLogMode(aws.LogRetries),
		config.WithLogger(logging.LoggerFunc(func(classification logging.Classification, format string, v ...interface{}) {
			log.Debugf(format, v...)
		})),
		config.WithCredentialsProvider(
			credentials.NewStaticCredentialsProvider(
				"****************",
				"****************",
				"session"),
		),
	)
	if err != nil {
		return aws.Config{}, err
	}
	return cfg, nil
}

func GetStsConfig(log logrus.FieldLogger, stsClient *sts.Client, region, roleARN, clusterID string) (aws.Config, error) {
	opts := func(options *stscreds.AssumeRoleOptions) {
		options.ExternalID = to.StringPtr(clusterID)
	}
	cfg, err := config.LoadDefaultConfig(
		context.Background(),
		config.WithRegion(region),
		config.WithRetryer(func() aws.Retryer {
			r := retry.NewStandard(func(opts *retry.StandardOptions) {
				opts.MaxAttempts = 5
			})
			return r
		}),
		config.WithClientLogMode(aws.LogRetries),
		config.WithLogger(logging.LoggerFunc(func(classification logging.Classification, format string, v ...interface{}) {
			log.Debugf(format, v...)
		})),
		config.WithCredentialsProvider(
			stscreds.NewAssumeRoleProvider(stsClient, roleARN, opts),
		),
	)
	if err != nil {
		return aws.Config{}, err
	}
	return cfg, nil
}

func main() {
	log := logrus.New()
	cfg, err := GetDefaultConfig(log, "eu-central-1",
		"*****************")
	if err != nil {
		log.Fatal(err)
	}
	stsSvc := sts.NewFromConfig(cfg)
	cfg, err = GetStsConfig(log, stsSvc,
		"eu-central-1",
		"role-arn",
		"********")
	if err != nil {
		log.Fatal(err.Error())
	}
	ec2Client := ec2.NewFromConfig(cfg)
	resp, err := ec2Client.DescribeRegions(context.Background(),
		&ec2.DescribeRegionsInput{AllRegions: aws.Bool(true)})
	if err != nil {
		log.Fatal(err)
	}
	log.Println(resp.Regions)
}

go.mod

module kubecast/services/experiment-sts

go 1.22.1

require (
	github.com/Azure/go-autorest/autorest/to v0.4.0
	github.com/aws/aws-sdk-go-v2 v1.26.0
	github.com/aws/aws-sdk-go-v2/config v1.27.8
	github.com/aws/aws-sdk-go-v2/credentials v1.17.8
	github.com/aws/aws-sdk-go-v2/service/ec2 v1.152.0
	github.com/aws/aws-sdk-go-v2/service/sts v1.28.5
	github.com/aws/smithy-go v1.20.1
	github.com/sirupsen/logrus v1.9.3
)

require (
	github.com/Azure/go-autorest v14.2.0+incompatible // indirect
	github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.15.4 // indirect
	github.com/aws/aws-sdk-go-v2/internal/configsources v1.3.4 // indirect
	github.com/aws/aws-sdk-go-v2/internal/endpoints/v2 v2.6.4 // indirect
	github.com/aws/aws-sdk-go-v2/internal/ini v1.8.0 // indirect
	github.com/aws/aws-sdk-go-v2/service/internal/accept-encoding v1.11.1 // indirect
	github.com/aws/aws-sdk-go-v2/service/internal/presigned-url v1.11.6 // indirect
	github.com/aws/aws-sdk-go-v2/service/sso v1.20.3 // indirect
	github.com/aws/aws-sdk-go-v2/service/ssooidc v1.23.3 // indirect
	github.com/jmespath/go-jmespath v0.4.0 // indirect
	golang.org/x/sys v0.0.0-20220715151400-c0bba94af5f8 // indirect
)

@RanVaknin RanVaknin self-assigned this Mar 20, 2024
@stgleb
Author

stgleb commented Mar 20, 2024

@RanVaknin FYI, MFA is disabled.

@RanVaknin
Contributor

Hi @stgleb ,

Thanks for reaching out and for all the code.
Since your implementation is quite involved, it will be very helpful if you can do the following:

  1. add request and response logging:
		config.WithClientLogMode(aws.LogRequestWithBody | aws.LogResponseWithBody),
  2. downgrade the SDK to the version that was working for you, inspect the logs and share them with us.
  3. upgrade the SDK again to the version that was breaking, run your code once more, inspect the logs and share them with us.

By comparing the request and response logs, we can identify any discrepancies between the two states (working and broken). That would help us pinpoint the offending part of the request, which might point to a recent change.
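
One way to do that comparison mechanically is to diff the header names of the two logged requests. The following is a rough sketch using only the Go standard library, not part of the SDK. `headerNames` and `onlyIn` are hypothetical helpers, the two dumps in `main` are shortened made-up examples, and the escaped `\r\n` sequences in the logged `msg` field are assumed to have been unescaped first.

```go
package main

import (
	"bufio"
	"fmt"
	"net/http"
	"sort"
	"strings"
)

// headerNames parses a raw HTTP/1.1 request dump and returns the set of
// (canonicalized) header names it contains.
func headerNames(raw string) (map[string]bool, error) {
	req, err := http.ReadRequest(bufio.NewReader(strings.NewReader(raw)))
	if err != nil {
		return nil, err
	}
	names := make(map[string]bool)
	for name := range req.Header {
		names[name] = true
	}
	return names, nil
}

// onlyIn returns the sorted header names present in a but missing from b.
func onlyIn(a, b map[string]bool) []string {
	var out []string
	for name := range a {
		if !b[name] {
			out = append(out, name)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	// Hypothetical, shortened dumps; real ones come from the request logs.
	working := "POST / HTTP/1.1\r\nHost: sts.eu-central-1.amazonaws.com\r\n" +
		"X-Amz-Date: 20240320T201534Z\r\n\r\n"
	broken := "POST / HTTP/1.1\r\nHost: sts.eu-central-1.amazonaws.com\r\n" +
		"X-Amz-Date: 20240322T092547Z\r\nX-Example-Extra: 1\r\n\r\n"

	w, err := headerNames(working)
	if err != nil {
		panic(err)
	}
	b, err := headerNames(broken)
	if err != nil {
		panic(err)
	}
	fmt.Println("only in broken: ", onlyIn(b, w))
	fmt.Println("only in working:", onlyIn(w, b))
}
```

Header values such as the signature and date differ on every request, so the sketch compares names only.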

Thanks,
Ran~

@RanVaknin RanVaknin added response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. p2 This is a standard priority issue and removed needs-triage This issue or PR still needs to be triaged. labels Mar 20, 2024
@stgleb
Author

stgleb commented Mar 20, 2024

@RanVaknin Here are the logs for the current version so far. It looks like the token is issued but is then invalid.


Request AssumeRole

2024-03-20T22:38:47.079875509Z stdout F time="2024-03-20T22:38:47Z" level=debug msg="Request\nPOST / HTTP/1.1\r\nHost: sts.eu-central-1.amazonaws.com\r\nUser-Agent: aws-sdk-go-v2/1.26.0 os/linux lang/go#1.22.1 md/GOOS#linux md/GOARCH#arm64 api/sts#1.28.5\r\nContent-Length: 255\r\nAmz-Sdk-Invocation-Id: 63b60ab4-8702-4a9a-9be9-882a8b127d8b\r\nAmz-Sdk-Request: attempt=1; max=5\r\nAuthorization: AWS4-HMAC-SHA256 Credential=AKIAQNCLJGYSFISKWM7W/20240320/eu-central-1/sts/aws4_request, SignedHeaders=amz-sdk-invocation-id;amz-sdk-request;content-length;content-type;host;x-amz-date, Signature=70ddf0d2d368a6aca7402cd7f03ea54e9d7398f3b9093ec4ebd817f0c9170886\r\nContent-Type: application/x-www-form-urlencoded\r\nX-Amz-Date: 20240320T223847Z\r\nAccept-Encoding: gzip\r\n\r\nAction=AssumeRole&DurationSeconds=900&ExternalId=be095f22-76cc-49b8-996a-c673324d9acc&RoleArn=arn%3Aaws%3Aiam%3A%3A487609081575%3Arole%2Fcast-eks-core-e2e-eks-20240116-cluster-role-be095f22&RoleSessionName=aws-go-sdk-1710974327078568711&Version=2011-06-15" cluster_id=be095f22-76cc-49b8-996a-c673324d9acc instance_id=external-provisioner-worker-65dc86fd8d-xrkdp level_int=5 pool=cluster provider_type=eks reconcile_id=53575be0-a726-4189-855b-63db350f037d

Response AssumeRole

2024-03-20T22:38:47.249964541Z stdout F time="2024-03-20T22:38:47Z" level=debug msg="Response\nHTTP/1.1 200 OK\r\nContent-Length: 1539\r\nContent-Type: text/xml\r\nDate: Wed, 20 Mar 2024 22:38:46 GMT\r\nX-Amzn-Requestid: 6b505f51-1a24-4958-8ebd-c1210371e2dc\r\n\r\n<AssumeRoleResponse xmlns=\"https://sts.amazonaws.com/doc/2011-06-15/\">\n  <AssumeRoleResult>\n    <AssumedRoleUser>\n      <AssumedRoleId>AROAREDACTED:aws-go-sdk-1710974327078568711</AssumedRoleId>\n      <Arn>arn:aws:sts::487609081575:assumed-role/cast-eks-core-e2e-eks-20240116-cluster-role-be095f22/aws-go-sdk-1710974327078568711</Arn>\n    </AssumedRoleUser>\n    <Credentials>\n      <AccessKeyId>REDACTED</AccessKeyId>\n      <SecretAccessKey>REDACTED</SecretAccessKey>\n      <SessionToken>REDACTED</SessionToken>\n      <Expiration>2024-03-20T22:53:47Z</Expiration>\n    </Credentials>\n  </AssumeRoleResult>\n  <ResponseMetadata>\n    <RequestId>6b505f51-1a24-4958-8ebd-c1210371e2dc</RequestId>\n  </ResponseMetadata>\n</AssumeRoleResponse>\n" cluster_id=be095f22-76cc-49b8-996a-c673324d9acc instance_id=external-provisioner-worker-65dc86fd8d-xrkdp level_int=5 pool=cluster provider_type=eks reconcile_id=53575be0-a726-4189-855b-63db350f037d


@stgleb
Author

stgleb commented Mar 20, 2024

@RanVaknin


Request AssumeRole

2024-03-20T20:15:34.838358412Z stdout F time="2024-03-20T20:15:34Z" level=debug msg="Request\nPOST / HTTP/1.1\r\nHost: sts.eu-central-1.amazonaws.com\r\nUser-Agent: aws-sdk-go-v2/1.22.2 os/linux lang/go#1.22.1 md/GOOS#linux md/GOARCH#arm64 api/sts#1.25.1\r\nContent-Length: 255\r\nAmz-Sdk-Invocation-Id: 491035f5-fcdd-42bf-8216-d3f005215549\r\nAmz-Sdk-Request: attempt=1; max=5\r\nAuthorization: AWS4-HMAC-SHA256 Credential=<ACCESS_KEY_ID>/20240320/eu-central-1/sts/aws4_request, SignedHeaders=amz-sdk-invocation-id;amz-sdk-request;content-length;content-type;host;x-amz-date, Signature=19eb0754581c180563ced9d324eea56f7f969e16a416db91ed89afa6c1225ea2\r\nContent-Type: application/x-www-form-urlencoded\r\nX-Amz-Date: 20240320T201534Z\r\nAccept-Encoding: gzip\r\n\r\nAction=AssumeRole&DurationSeconds=900&ExternalId=45b63681-cc17-4daa-9696-514f6ea47a00&RoleArn=arn%3Aaws%3Aiam%3A%3A487609081575%3Arole%2Fcast-eks-core-e2e-eks-20240116-cluster-role-45b63681&RoleSessionName=aws-go-sdk-1710965734836842112&Version=2011-06-15" cluster_id=45b63681-cc17-4daa-9696-514f6ea47a00 instance_id=external-provisioner-worker-78f9fbbc56-8psw6 level_int=5 pool=cluster provider_type=eks reconcile_id=40c55e9a-2bd2-4354-88de-208a740cdded

Response AssumeRole

2024-03-20T20:15:35.013288788Z stdout F time="2024-03-20T20:15:35Z" level=debug msg="Response\nHTTP/1.1 200 OK\r\nContent-Length: 1539\r\nContent-Type: text/xml\r\nDate: Wed, 20 Mar 2024 20:15:34 GMT\r\nX-Amzn-Requestid: 7795857e-d350-48e5-bc8f-35c70c70ba13\r\n\r\n<AssumeRoleResponse xmlns=\"https://sts.amazonaws.com/doc/2011-06-15/\">\n  <AssumeRoleResult>\n    <AssumedRoleUser>\n      <AssumedRoleId>AROAXREDACTED:aws-go-sdk-1710965734836842112</AssumedRoleId>\n      <Arn>arn:aws:sts::487609081575:assumed-role/cast-eks-core-e2e-eks-20240116-cluster-role-45b63681/aws-go-sdk-1710965734836842112</Arn>\n    </AssumedRoleUser>\n    <Credentials>\n      <AccessKeyId>REDACTED</AccessKeyId>\n      <SecretAccessKey>REDACTED</SecretAccessKey>\n      <SessionToken>REDACTED</SessionToken>\n      <Expiration>2024-03-20T20:30:34Z</Expiration>\n    </Credentials>\n  </AssumeRoleResult>\n  <ResponseMetadata>\n    <RequestId>7795857e-d350-48e5-bc8f-35c70c70ba13</RequestId>\n  </ResponseMetadata>\n</AssumeRoleResponse>\n" cluster_id=45b63681-cc17-4daa-9696-514f6ea47a00 instance_id=external-provisioner-worker-78f9fbbc56-8psw6 level_int=5 pool=cluster provider_type=eks reconcile_id=40c55e9a-2bd2-4354-88de-208a740cdded

Fields

alert_group | console/external-provisioner-worker
app | external-provisioner-worker
container | external-provisioner-worker
filename | /var/log/pods/console_external-provisioner-worker-78f9fbbc56-8psw6_728a61c3-b2ec-4ae7-bc52-03eec3ccfd45/external-provisioner-worker/1.log
job | console/external-provisioner-worker
level | debug
namespace | console
node_name | tilt-control-plane
pod | external-provisioner-worker-78f9fbbc56-8psw6
team | kube


@RanVaknin
Contributor

Hi @stgleb ,

Thanks for the logs.

From what I can see in both logs provided, the AssumeRole call itself succeeds, but I don't see the requests made to EC2 with those temporary credentials, which are the ones in question.

Are you able to provide those?

Thanks again,
Ran~

@RanVaknin RanVaknin added response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. and removed response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. labels Mar 20, 2024
@stgleb
Author

stgleb commented Mar 21, 2024

Yeah, I've also noticed that. I'll grab the DescribeRegions logs as well.

@github-actions github-actions bot removed the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. label Mar 22, 2024
@stgleb
Author

stgleb commented Mar 22, 2024

@RanVaknin

Request AssumeRole

time="2024-03-22T09:25:47Z" level=debug msg="Request\nPOST / HTTP/1.1\r\nHost: sts.eu-central-1.amazonaws.com\r\nUser-Agent: aws-sdk-go-v2/1.26.0 os/linux lang/go#1.22.1 md/GOOS#linux md/GOARCH#arm64 api/sts#1.28.5\r\nContent-Length: 244\r\nAmz-Sdk-Invocation-Id: 1259f8aa-d3b9-4a9f-b5bd-703a3cfb6f9c\r\nAmz-Sdk-Request: attempt=1; max=5\r\nAuthorization: AWS4-HMAC-SHA256 Credential=AKIAQNCLJGYSAZAAN74C/20240322/eu-central-1/sts/aws4_request, SignedHeaders=amz-sdk-invocation-id;amz-sdk-request;content-length;content-type;host;x-amz-date, Signature=7a7089d8c0d4a6cf4c9647e639e16352500cda5f5e22778800c37a41c038914a\r\nContent-Type: application/x-www-form-urlencoded\r\nX-Amz-Date: 20240322T092547Z\r\nAccept-Encoding: gzip\r\n\r\nAction=AssumeRole&DurationSeconds=900&ExternalId=5c2845a9-2927-4e64-8913-9312fafb9890&RoleArn=arn%3Aaws%3Aiam%3A%3A028075177508%3Arole%2Fcast-eks-gleb-03-22-cluster-role-5c2845a9&RoleSessionName=aws-go-sdk-1711099547698600295&Version=2011-06-15" cluster_id=5c2845a9-2927-4e64-8913-9312fafb9890 level_int=5 provider_type=eks

Response AssumeRole

time="2024-03-22T09:25:47Z" level=debug msg="Response\nHTTP/1.1 403 Forbidden\r\nContent-Length: 306\r\nContent-Type: text/xml\r\nDate: Fri, 22 Mar 2024 09:25:47 GMT\r\nX-Amzn-Requestid: be861b37-8a64-41d0-b30b-26ce8d68e29a\r\n\r\n<ErrorResponse xmlns=\"https://sts.amazonaws.com/doc/2011-06-15/\">\n  <Error>\n    <Type>Sender</Type>\n    <Code>InvalidClientTokenId</Code>\n    <Message>The security token included in the request is invalid.</Message>\n  </Error>\n  <RequestId>be861b37-8a64-41d0-b30b-26ce8d68e29a</RequestId>\n</ErrorResponse>\n" cluster_id=5c2845a9-2927-4e64-8913-9312fafb9890 level_int=5 provider_type=eks

Describe cluster

time="2024-03-22T09:25:47Z" level=info msg="finished call /externalcluster.v1.ExternalClusterAPI/UpdateCluster with code InvalidArgument" cluster_id=5c2845a9-2927-4e64-8913-9312fafb9890 grpc.time_ms=1287.071 grpc_code=InvalidArgument grpc_error="rpc error: code = InvalidArgument desc = Forbidden" grpc_error_details="[field_violations:{field:\"credentials\" description:\"invalid credentials: missing permissions to access ec2 regions: operation error EC2: DescribeRegions, get identity: get credentials: failed to refresh cached credentials, operation error STS: AssumeRole, https response error StatusCode: 403, RequestID: be861b37-8a64-41d0-b30b-26ce8d68e29a, api error InvalidClientTokenId: The security token included in the request is invalid.\"}]" grpc_method=UpdateCluster grpc_origin_path=/v1/kubernetes/external-clusters/5c2845a9-2927-4e64-8913-9312fafb9890 grpc_service=externalcluster.v1.ExternalClusterAPI grpc_start_time="2024-03-22T09:25:46.594104114Z" level_int=4 organization_id=446d0539-1e0f-4753-a26e-93a8704bf633 request_id= span.kind=server token_id=cd775c38-a6d0-4d0b-a0c7-e394c00c2829 trace_id=7c64c1d4815dd3af user_id="auth0|65e6fd5e2bd958f9933e2786"

@lucix-aws
Contributor

@stgleb --

I'm having trouble making sense of these request logs. The underlying cause of your originally reported error is a failed call to STS AssumeRole:

operation error STS: AssumeRole, https response error StatusCode: 403, RequestID: 9b051175-e056-4572-bbdd-1fa1990e5b99,
api error InvalidClientTokenId: The security token included in the request is invalid.

Your request logs here show a failed AssumeRole, with the error you first reported, against the version you reported broken.

But your log here shows a successful AssumeRole, against the version you reported broken.

Note that the outer DescribeRegions context is largely immaterial. You're hitting an error on credentials retrieval, which would be triggered by calling any operation if credentials had not yet been retrieved or were previously cached and expired.

The only difference I'm seeing between these two requests is the role ARN. So, followup questions--

  1. are you still experiencing this issue?
  2. if yes to 1, is it intermittent or consistent?
  3. if yes to 1, is it only happening with a particular role or roles?

@lucix-aws lucix-aws added the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. label Apr 11, 2024
@stgleb
Author

stgleb commented Apr 12, 2024

Yes, the issue is still there, with the following error message:

{"message":"Forbidden","fieldViolations":[{"field":"credentials","description":"invalid credentials: missing permissions to access ec2 regions: operation error EC2: DescribeRegions, get identity: get credentials: failed to refresh cached credentials, operation error STS: AssumeRole, https response error StatusCode: 403, RequestID: b60b1d74-9467-46a1-8285-0c2ea349adf0, api error InvalidClientTokenId: The security token included in the request is invalid."}]}

We create a new role for each run, so it happens consistently with any role. It appeared after upgrading the aws-sdk-go-v2 versions.

@lucix-aws
Contributor

We have to break this down further. The following code recreates an AssumeRole call matching your workflow (where the call itself is authorized with static credentials) in isolation:

package main

import (
	"context"
	"fmt"

	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/credentials"
	"github.com/aws/aws-sdk-go-v2/credentials/stscreds"
	"github.com/aws/aws-sdk-go-v2/service/sts"
)

const (
	region = "..."

	roleARN = "..."

	akid    = "..."
	secret  = "..."
	session = "..."
)

var externalID = "..." // clusterID in your code

func main() {
	cfg, err := config.LoadDefaultConfig(
		context.Background(),
		config.WithRegion(region),
		config.WithCredentialsProvider(
			credentials.NewStaticCredentialsProvider(akid, secret, session),
		),
	)
	if err != nil {
		panic(err)
	}

	// Named stsClient so the variable does not shadow the sts package.
	stsClient := sts.NewFromConfig(cfg)
	provider := stscreds.NewAssumeRoleProvider(stsClient, roleARN, func(o *stscreds.AssumeRoleOptions) {
		o.ExternalID = &externalID
	})

	creds, err := provider.Retrieve(context.Background())
	if err != nil {
		panic(err)
	}

	fmt.Println(creds)
}

Do the following:

  1. fill in values as needed (I know you're dynamically creating roles, so either instrument that in front of this, or do it externally first and then set it here)
  2. run this snippet against the reported broken modules
  3. run this snippet against the old reported working modules

and let us know the results.

@lucix-aws lucix-aws added needs-reproduction This issue needs reproduction. and removed p2 This is a standard priority issue labels Apr 12, 2024
@github-actions github-actions bot removed the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. label Apr 13, 2024
@lucix-aws lucix-aws added the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. label Apr 15, 2024
@stgleb stgleb closed this as completed Apr 17, 2024

This issue is now closed. Comments on closed issues are hard for our team to see.
If you need more assistance, please open a new issue that references this one.

@fals

fals commented Apr 17, 2024

@lucix-aws I can confirm that a breaking change was somehow introduced in the SDK. It no longer retries InvalidClientTokenId by default, and because of eventual consistency, calls made right after we created the role and its credentials failed. The fix was to force a custom retryer (as below) and enforce a backoff policy so that the call works as we expected.

config.WithRetryer(func() aws.Retryer {
	r := retry.NewStandard(func(opts *retry.StandardOptions) {
		opts.MaxAttempts = 5
		opts.MaxBackoff = 10 * time.Second
	})
	return NewCustomRetryer(r)
}),

// custom retryer code
if v.ErrorCode() == "InvalidClientTokenId" {
	return aws.TrueTernary
}
