Setting endpoint and bucket does not work if they are different domains #2883

Closed
2 of 3 tasks
deitch opened this issue Nov 1, 2024 · 8 comments
Assignees
Labels
guidance Question that needs advice or information. p3 This is a minor priority issue

Comments

@deitch

deitch commented Nov 1, 2024

Acknowledgements

Describe the bug

Create a proxy or local S3-compatible server and run it at localhost:9000. You would expect that the endpoint is not part of the bucket's hostname. Then perform any operation against the bucket named bucket1.mydomain.com. The Host header in the request always includes both the bucket name and the endpoint.

For example, PutObject for myfile against bucket bucket1.mydomain.com with endpoint localhost:9000 should have headers:

PUT /myfile?x-id=PutObject HTTP/1.1
Host: bucket1.mydomain.com

Yet it actually has

PUT /myfile?x-id=PutObject HTTP/1.1
Host: bucket1.mydomain.com.localhost:9000

Regression Issue

  • Select this option if this issue appears to be a regression.

Expected Behavior

The endpoint would not be part of the Host, since that is a pointer as to where to find the fully named bucket.

Current Behavior

Includes it in the host. See the bug description

Reproduction Steps

	var (
		opts   []func(*config.LoadOptions) error // global client options
		s3opts []func(*s3.Options)               // s3 client options
	)
	s3opts = append(s3opts,
		// I tried with each of the following options, both had more or less same result
		//s3.WithEndpointResolverV2(&staticResolver{endpoint: "localhost:9000"}),
		func(o *s3.Options) {
			// BaseEndpoint is a *string, so wrap the value with aws.String
			o.BaseEndpoint = aws.String("localhost:9000")
		},
	)
	opts = append(opts, config.WithClientLogMode(aws.LogRequestWithBody|aws.LogResponse))
	opts = append(opts, config.WithRegion(region))
	opts = append(opts, config.WithCredentialsProvider(credentials.NewStaticCredentialsProvider(
		"myaccesskey",
		"mysecretkey",
		"",
	)))
	cfg, err := config.LoadDefaultConfig(context.TODO(),
		opts...,
	)
	if err != nil {
		return 0, fmt.Errorf("failed to load AWS config: %w", err)
	}

	// Create a new S3 service client
	client := s3.NewFromConfig(cfg, s3opts...)

	uploader := manager.NewUploader(client)

	// Open the local file whose contents will be uploaded.
	f, err := os.Open(source)
	if err != nil {
		return 0, fmt.Errorf("failed to read input file %q, %w", source, err)
	}
	defer f.Close()

	// Write the contents of the file to the S3 object
	_, err = uploader.Upload(context.TODO(), &s3.PutObjectInput{
		Bucket: aws.String("bucket1.mydomain.com"),
		Key:    aws.String("my file"),
		Body:   f,
	})
	if err != nil {
		return 0, fmt.Errorf("failed to upload %q, %w", source, err)
	}

Possible Solution

No response

Additional Information/Context

I did try various combinations of BaseEndpoint and EndpointResolverV2 as described in this doc, to no avail.

I suspect there is some combination of which I am not aware, in which case feel free to call this a "docs error report" as opposed to a bug report.

AWS Go SDK V2 Module Versions Used

        github.com/aws/aws-sdk-go-v2 v1.32.3
        github.com/aws/aws-sdk-go-v2/config v1.28.1
        github.com/aws/aws-sdk-go-v2/feature/s3/manager v1.17.35
        github.com/aws/aws-sdk-go-v2/service/s3 v1.66.2
        github.com/aws/aws-sdk-go-v2/credentials v1.17.42
        github.com/aws/aws-sdk-go v1.44.256 // indirect
        github.com/aws/aws-sdk-go-v2/aws/protocol/eventstream v1.6.6 // indirect
        github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.16.18 // indirect
        github.com/aws/aws-sdk-go-v2/internal/configsources v1.3.22 // indirect
        github.com/aws/aws-sdk-go-v2/internal/endpoints/v2 v2.6.22 // indirect
        github.com/aws/aws-sdk-go-v2/internal/ini v1.8.1 // indirect
        github.com/aws/aws-sdk-go-v2/internal/v4a v1.3.22 // indirect
        github.com/aws/aws-sdk-go-v2/service/internal/accept-encoding v1.12.0 // indirect
        github.com/aws/aws-sdk-go-v2/service/internal/checksum v1.4.3 // indirect
        github.com/aws/aws-sdk-go-v2/service/internal/presigned-url v1.12.3 // indirect
        github.com/aws/aws-sdk-go-v2/service/internal/s3shared v1.18.3 // indirect
        github.com/aws/aws-sdk-go-v2/service/sso v1.24.3 // indirect
        github.com/aws/aws-sdk-go-v2/service/ssooidc v1.28.3 // indirect
        github.com/aws/aws-sdk-go-v2/service/sts v1.32.3 // indirect
        github.com/aws/smithy-go v1.22.0

Compiler and Version used

go version go1.23.0

Operating System and version

linux/amd64

@deitch deitch added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Nov 1, 2024
@RanVaknin RanVaknin self-assigned this Nov 1, 2024
@RanVaknin
Contributor

Hi @deitch ,

virtual hosted bucket host anatomy is <bucket>.<endpoint>

For example, a bucket "foo" and us-east-1 will result in host: foo.s3.us-east-1.amazonaws.com

In your case, the bucket name is bucket1.mydomain.com and the base endpoint is localhost:9000, so the host becomes bucket1.mydomain.com.localhost:9000, which is the correct and expected result.
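
To make the composition rule concrete, here is a minimal sketch in plain Go. `requestTarget` is a hypothetical helper written for this thread, not an SDK function, and it deliberately ignores the SDK's real endpoint rules (bucket-name validation, TLS considerations, and so on). It assumes the endpoint is given with a scheme.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// requestTarget is a hypothetical helper (not part of the AWS SDK). It returns
// the Host header and request path for a key under the two S3 addressing styles:
// virtual-hosted (<bucket>.<endpoint>) and path-style (<endpoint>/<bucket>).
func requestTarget(baseEndpoint, bucket, key string, pathStyle bool) (host, path string, err error) {
	u, err := url.Parse(baseEndpoint)
	if err != nil {
		return "", "", err
	}
	if u.Host == "" {
		return "", "", fmt.Errorf("endpoint %q needs a scheme, e.g. http://", baseEndpoint)
	}
	key = strings.TrimPrefix(key, "/")
	if pathStyle {
		// Path-style: the bucket becomes the first path segment.
		return u.Host, "/" + bucket + "/" + key, nil
	}
	// Virtual-hosted style: the bucket is prefixed to the endpoint host.
	return bucket + "." + u.Host, "/" + key, nil
}

func main() {
	host, path, _ := requestTarget("http://localhost:9000", "bucket1.mydomain.com", "myfile", false)
	fmt.Println(host, path) // bucket1.mydomain.com.localhost:9000 /myfile

	host, path, _ = requestTarget("http://localhost:9000", "bucket1.mydomain.com", "myfile", true)
	fmt.Println(host, path) // localhost:9000 /bucket1.mydomain.com/myfile
}
```

The first call reproduces the Host value reported in the issue; the second shows what the same inputs look like under path-style addressing.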

Thanks,
Ran~

@RanVaknin RanVaknin added response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. p3 This is a minor priority issue guidance Question that needs advice or information. and removed needs-triage This issue or PR still needs to be triaged. bug This issue is a bug. labels Nov 1, 2024
@deitch
Author

deitch commented Nov 2, 2024

Hi @RanVaknin thanks for jumping in so quickly; I do rather appreciate it.

> For example, a bucket "foo" and us-east-1 will result in host: foo.s3.us-east-1.amazonaws.com

I had always assumed endpoint is distinct from the hostname being served. The same way you can use SNI on certs, etc. "Endpoint" = "go to this IP or FQDN to access the service", while "virtual-path bucket" = "this is the Host field I will put in the headers". They could very well be distinct.

What is the correct way to ask the sdk, "I want you to request the bucket FQDN bucket1.mydomain.com (i.e. that is the Host header), but establish the connection to localhost:9000"? They don't have to be tied together.

If this is something we do not support but would want to, I am game for opening a PR for it, if I can have some proper direction as to where. I would guess an option that says not to append the endpoint to the bucket FQDN?

@github-actions github-actions bot removed the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. label Nov 3, 2024
@RanVaknin
Contributor

Hi @deitch ,

> I had always assumed endpoint is distinct from the hostname being served.

I think this might be a confusion based on the meaning of the word endpoint that is being used here to mean SDK specific thing. Endpoint is where the request is being sent to. In the context of S3, the endpoint is formatted in one of two ways: for virtual-hosted buckets the bucket name is prefixed to the endpoint (<bucket>.<endpoint>), and for path-style buckets the bucket name is appended as a path segment (<endpoint>/<bucket>).

> What is the correct way to ask the sdk, "I want you to request the bucket FQDN bucket1.mydomain.com (i.e. that is the Host header), but establish the connection to localhost:9000"? They don't have to be tied together.

They are tied together. The SDK does not have a built-in DNS resolver to know that mydomain.com actually points to 127.0.0.1 (localhost).
If you need to route traffic from your custom domain to localhost, then that needs to happen outside the context of the SDK.

If this is the desired outcome:

PUT /myfile?x-id=PutObject HTTP/1.1
Host: bucket1.mydomain.com

then the bucket name should be bucket1 and the BaseEndpoint should be mydomain.com. That would achieve the S3 virtual-hosted bucket scheme of <bucket>.<endpoint>.

Then, to route mydomain.com to localhost:9000, you can edit your system's hosts file to resolve mydomain.com to 127.0.0.1 and use a reverse proxy to route that traffic to port 9000 on your localhost.
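
As a rough illustration of the reverse-proxy half of that setup, the sketch below uses only the Go standard library. The backend test server stands in for the S3-compatible server on port 9000, and setting `req.Host` by hand stands in for the hosts-file entry; `hostSeenByBackend` is a hypothetical name. It relies on the fact that the proxy returned by `httputil.NewSingleHostReverseProxy` rewrites the target URL but leaves the incoming Host header alone.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/http/httputil"
	"net/url"
)

// hostSeenByBackend sends one request carrying hostHeader through a reverse
// proxy and returns the Host value that the backend server received.
func hostSeenByBackend(hostHeader string) (string, error) {
	seen := make(chan string, 1)

	// Stand-in for the local S3-compatible server (localhost:9000 in the thread).
	backend := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		seen <- r.Host
	}))
	defer backend.Close()

	target, err := url.Parse(backend.URL)
	if err != nil {
		return "", err
	}

	// The proxy forwards every request to the backend without touching Host.
	proxy := httptest.NewServer(httputil.NewSingleHostReverseProxy(target))
	defer proxy.Close()

	req, err := http.NewRequest(http.MethodPut, proxy.URL+"/myfile", nil)
	if err != nil {
		return "", err
	}
	// Stand-in for the hosts-file entry: connect to the proxy, but present
	// the custom domain in the Host header.
	req.Host = hostHeader

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return "", err
	}
	resp.Body.Close()

	return <-seen, nil
}

func main() {
	host, err := hostSeenByBackend("bucket1.mydomain.com")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(host) // bucket1.mydomain.com
}
```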

I might be missing the point here since this use case is new to me.
If you are using an S3 clone that is running locally, then routing it to localhost:9000 should be enough to test with the SDK.
If your custom domain(mydomain.com) is "live" and fronting an actual S3 bucket then routing mydomain.com should be the correct approach. I'm not sure what is the goal of sending requests to mydomain.com but actually routing traffic to localhost.

Thanks,
Ran~

@RanVaknin RanVaknin added the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. label Nov 4, 2024
@deitch
Author

deitch commented Nov 5, 2024

> If you are using an S3 clone that is running locally, then routing it to localhost:9000 should be enough to test with the SDK.
> If your custom domain(mydomain.com) is "live" and fronting an actual S3 bucket then routing mydomain.com should be the correct approach. I'm not sure what is the goal of sending requests to mydomain.com but actually routing traffic to localhost.

These are both cases I am working with: a local clone (primarily for testing, but not always) and a transparent proxy.

> I think this might be a confusion based on the meaning of the word endpoint that is being used here to mean SDK specific thing

This, I believe, is the heart of it. I think you are saying, from the SDK's perspective, "endpoint" means two things:

  1. Routing: The FQDN that the SDK will use to find the server to which to connect (Layer 3/4)
  2. Hostname: The value to place in the Host header, i.e. <bucketname>.<endpoint> (Layer 7)

I can get why the endpoint might mean both, but also why we might want them to be optionally separable.

There is a direct analogy in pkg net/http. On the one hand, if I do http.Get("http://example.com/"), it will use example.com as both the FQDN to use to resolve for Layer 3 and the value to place in the Host header. However, if I want to split the two (which is common), I can use an http.Client, set the Transport property to http.Transport, which has the Dial property:

Dial func(network, addr string) (net.Conn, error)

Dial handles the resolution from "here is an FQDN" to "here is a net.Conn", which the higher-level http.Client can then use to create the HTTP connection, sending whatever headers it wants.

As I think about this, if your position is that this may be a valid use case, but should be handled at the http.Client level, like any other case, and that "S3 endpoint" does not mean "control connection endpoint", that would make sense, too. All that would be needed is some clear direction/docs as to how to do that.

Does this explanation help?

@RanVaknin
Contributor

Hi @deitch,

Thanks for the additional info.

> There is a direct analogy in pkg net/http.

That is because the Go SDK's http client is the Golang standard library http client. The SDK only builds the request and then hands it to the standard library to handle the actual http request.

You can override the SDK's http client to use your desired custom Transport layer with your own implementation of Dial if that is what you are after.
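
A minimal sketch of such a client, using only the standard library: `newPinnedClient` and `hostSeenByServer` are hypothetical names, and the test server stands in for the local S3-compatible server. The sketch uses `DialContext`, the context-aware successor to `Dial` on `http.Transport`. The request URL names bucket1.mydomain.com, so that is what ends up in the Host header, while the connection itself goes to the pinned address.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"net/http"
	"net/http/httptest"
)

// newPinnedClient (hypothetical helper) returns an http.Client whose
// connections always go to dialAddr, whatever host the request URL names.
func newPinnedClient(dialAddr string) *http.Client {
	dialer := &net.Dialer{}
	return &http.Client{
		Transport: &http.Transport{
			// Ignore the address derived from the URL and dial dialAddr instead.
			DialContext: func(ctx context.Context, network, _ string) (net.Conn, error) {
				return dialer.DialContext(ctx, network, dialAddr)
			},
		},
	}
}

// hostSeenByServer sends a PUT for requestURL through a pinned client and
// returns the Host header that the local test server received.
func hostSeenByServer(requestURL string) (string, error) {
	seen := make(chan string, 1)
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		seen <- r.Host
	}))
	defer srv.Close()

	client := newPinnedClient(srv.Listener.Addr().String())
	defer client.CloseIdleConnections()

	req, err := http.NewRequest(http.MethodPut, requestURL, nil)
	if err != nil {
		return "", err
	}
	resp, err := client.Do(req)
	if err != nil {
		return "", err
	}
	resp.Body.Close()

	return <-seen, nil
}

func main() {
	host, err := hostSeenByServer("http://bucket1.mydomain.com/myfile?x-id=PutObject")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(host) // bucket1.mydomain.com
}
```

With the SDK, a client built this way could be supplied through `config.WithHTTPClient` or the `HTTPClient` field on `s3.Options`, with the bucket set to bucket1 and the BaseEndpoint pointing at mydomain.com as suggested earlier in the thread. That wiring is not shown here and has not been tested against the module versions listed in the issue.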

Let me know if this is the piece of info you are after.

Thanks,
Ran~

@RanVaknin RanVaknin removed the guidance Question that needs advice or information. label Nov 5, 2024
@deitch
Author

deitch commented Nov 5, 2024

Ah, that's it. So would the following be correct?

-----BEGIN-----
The aws-sdk-v2 endpoint parameter defines the endpoint used for accessing the bucket. This endpoint is used both for resolving the server hostname and port, as well as the Host header in the http connection. If you use virtual-host-style buckets, then the Host header will have the bucket name prepended to the endpoint.

If you wish to override low-level connection, for example to change the timeout or connect to a different server and port, you can do so by changing the http.Client used. The Host header will continue to be the endpoint - for path-style - or bucket.endpoint - for virtual-host style - but the network connection will be constructed via the http.Client that you provide.
-----END-----

@github-actions github-actions bot removed the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. label Nov 6, 2024
@RanVaknin RanVaknin added the guidance Question that needs advice or information. label Nov 7, 2024
@RanVaknin
Contributor

Hi @deitch ,

Sounds largely correct. Endpoint is not an S3-specific thing, though. It's a concept you'd use to override the default request-building logic that the SDK provides out of the box, and it can apply to any AWS service.

Seems like the original question was answered (using your own custom transport to define your own implementation of Dial), so I'm going to close this issue.

Thanks for reaching out. If you need anything else please feel free to open a new issue.

All the best,
Ran~


github-actions bot commented Nov 7, 2024

This issue is now closed. Comments on closed issues are hard for our team to see.
If you need more assistance, please open a new issue that references this one.
