S3 endpoint is defaulting to incorrect value in us-gov-east-1 #5520

Closed
doris-zhou opened this issue Aug 21, 2023 · 11 comments · Fixed by #5540

Comments

@doris-zhou

doris-zhou commented Aug 21, 2023

Describe the bug
I'm running Cortex v1.15.3 on an AWS EKS cluster in the us-gov-east-1 region. I have my blocks storage configured as follows:

blocks_storage:
  backend: s3
  s3:
    bucket_name: <bucket name>
    endpoint: s3-fips.us-gov-east-1.amazonaws.com

The store gateway and other components that reach out to S3 are failing with errors similar to the following:

level=error ts=2023-08-18T21:02:47.217243999Z caller=bucket_client.go:124 msg="bucket operation fail after retries" err="Get \"https://<bucket name>.s3.dualstack.us-east-1.amazonaws.com/?delimiter=%2F&encoding-type=url&fetch-owner=true&list-type=2&prefix=\": dial tcp <redacted>: i/o timeout" operation="Iter "

I have confirmed that the blocks_storage.s3.endpoint value is set correctly to s3-fips.us-gov-east-1.amazonaws.com by using kubectl exec to get into the pod and checking the etc/cortex/cortex.yaml file, but for some reason the pods are actually trying to hit the s3.dualstack.us-east-1.amazonaws.com endpoint instead. Where is this incorrect endpoint coming from?
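
A quick way to double-check which host the client actually contacted is to pull it out of the logged error and compare it with the configured endpoint. The following is a minimal Python sketch and not part of Cortex; the helper names `contacted_host` and `uses_configured_endpoint`, the sample bucket name, and the sample IP address are all assumptions made for illustration.

```python
import re
from typing import Optional
from urllib.parse import urlsplit


def contacted_host(log_line: str) -> Optional[str]:
    """Return the hostname of the first http(s) URL in a log line, or None."""
    # Stop at whitespace, a quote, or the backslash that escapes the closing quote.
    match = re.search(r'https?://[^\s"\\]+', log_line)
    if match is None:
        return None
    return urlsplit(match.group(0)).hostname


def uses_configured_endpoint(host: str, endpoint: str) -> bool:
    """True if host is the endpoint itself or a virtual-hosted bucket under it."""
    return host == endpoint or host.endswith("." + endpoint)


# Sample line modelled on the error above; bucket name and IP are placeholders.
SAMPLE = (
    r'msg="bucket operation fail after retries" '
    r'err="Get \"https://my-bucket.s3.dualstack.us-east-1.amazonaws.com/'
    r'?delimiter=%2F&list-type=2&prefix=\": dial tcp 10.0.0.1:443: i/o timeout"'
)
```

Calling `uses_configured_endpoint(contacted_host(SAMPLE), "s3-fips.us-gov-east-1.amazonaws.com")` on a line like this one makes the mismatch between the configured and the contacted host explicit.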

This seems similar to this mimir issue, but the hop limit is already set to 2 for my cluster.

I have been successfully running Cortex with the same configuration in non-gov regions, so it seems like a problem specific to the AWS GovCloud environment.

I've also tried setting the region to us-gov-east-1, so the configuration is as follows:

blocks_storage:
  backend: s3
  s3:
    bucket_name: <bucket name>
    region: us-gov-east-1
    endpoint: s3-fips.us-gov-east-1.amazonaws.com

It then tries to hit s3.dualstack.us-gov-east-1.amazonaws.com instead, which does work, but it is still reaching out to the incorrect endpoint -- it should be using the FIPS endpoint I specified instead.

To Reproduce
Steps to reproduce the behavior:

  1. Start Cortex (v1.15.3)
  2. Deploy into an AWS EKS cluster in the us-gov-east-1 region with the configuration specified above

Expected behavior
I expect Cortex to be trying to hit the S3 endpoint set in the configuration (s3-fips.us-gov-east-1.amazonaws.com), not s3.dualstack.us-east-1.amazonaws.com.

Environment:

  • Infrastructure: Kubernetes, AWS EKS
  • Deployment tool: ArgoCD

Additional Context
Cortex is granted access to the S3 bucket through IRSA.

@doris-zhou doris-zhou changed the title from "Unable to reach S3 in us-gov-east-1" to "S3 endpoint is defaulting to incorrect value in us-gov-east-1" on Aug 21, 2023
@alanprot
Member

alanprot commented Aug 21, 2023

This seems like a network configuration problem.

Can you kubectl exec into the pod and check whether you can reach the FIPS S3 endpoint?

@doris-zhou
Author

doris-zhou commented Aug 22, 2023

@alanprot Thanks for your response!

I execed into the pod and tried running nslookup, ping, and wget on the FIPS and non-FIPS endpoints, and they all were able to resolve both FIPS and non-FIPS endpoints (getting back server returned error: HTTP/1.1 403 Forbidden from wget, which is expected as I didn't pass in auth), so I believe there should be network connectivity. The only difference is that the FIPS endpoint requires HTTPS where the non-FIPS endpoint does not.

I have a few remaining questions:

  1. If this were a network configuration problem, why does Cortex default to s3.dualstack.us-east-1.amazonaws.com instead of trying the endpoint I configured and failing? This behavior is very confusing to me.
  2. It appears that minio does not have us-gov-east-1 configured at all in s3utils, only us-gov-west-1: https://github.com/minio/minio-go/blob/master/pkg/s3utils/utils.go#L194
    Could this be related to this problem?
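
To make the question concrete, here is one mechanism that would be consistent with both hostnames reported in this thread: a client that only leaves the configured host alone when it appears in a fixed list of known FIPS hosts, and otherwise rebuilds the host from the region. This is a hypothetical Python sketch, not the minio-go implementation; the list contents, the function name `target_host`, and the `us-east-1` fallback are assumptions chosen to reproduce the observed behavior.

```python
# Hypothetical allow-list of FIPS hosts that the client leaves untouched.
# Assumption: us-gov-west-1 is listed and us-gov-east-1 is not.
KNOWN_FIPS_HOSTS = {
    "s3-fips.us-gov-west-1.amazonaws.com",
}

DEFAULT_REGION = "us-east-1"  # assumed fallback when no region is configured


def target_host(endpoint: str, region: str = "") -> str:
    """Pick the host a request would be sent to under the assumed logic."""
    if endpoint in KNOWN_FIPS_HOSTS:
        # Recognized FIPS endpoint: use exactly what was configured.
        return endpoint
    if endpoint.endswith(".amazonaws.com"):
        # Unrecognized AWS host: rebuild it from the region instead.
        return f"s3.dualstack.{region or DEFAULT_REGION}.amazonaws.com"
    # Non-AWS endpoints pass through unchanged.
    return endpoint
```

Under these assumptions, `target_host("s3-fips.us-gov-east-1.amazonaws.com")` yields `s3.dualstack.us-east-1.amazonaws.com`, and adding `region="us-gov-east-1"` yields `s3.dualstack.us-gov-east-1.amazonaws.com`, which matches the two hostnames described earlier in this issue apart from the bucket prefix.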

@mehta-ankit

One of our team members is trying to get a fix into the minio package: minio/minio-go#1879

@alanprot
Member

Nice!

We can pull that as soon as it gets merged!

@mehta-ankit

mehta-ankit commented Aug 23, 2023

One of our team members is trying to get a fix into the minio package: minio/minio-go#1879

Update: Maintainers decided to open a new MR with changes that they prefer: minio/minio-go#1880

@mehta-ankit

mehta-ankit commented Aug 30, 2023

@alanprot The minio-go package was released: https://github.com/minio/minio-go/releases/tag/v7.0.63
Would it be possible to pull it in soon?

@alanprot alanprot mentioned this issue Aug 30, 2023
@alanprot
Member

PR: #5540

@alanprot
Member

Can you test the latest image, @mehta-ankit?

@mehta-ankit

@alanprot
Member

Yeah, that's the one.

@doris-zhou
Author

@alanprot Tested and it looks like this has resolved the issue. Thanks for your help and quick response!

3 participants