feat: use manager.GetBucketRegion() to retrieve S3 bucket region instead of GetBucketLocation #2082

Merged: 5 commits merged into turbot:main on May 28, 2024

Conversation

@pdecat pdecat commented Feb 20, 2024

This PR replaces GetBucketLocation REST API calls with plain HTTP HEAD requests to retrieve the S3 bucket region.

Resolves #1586

Note: this is based on the branch of #2080 to allow running the tests/aws_s3_bucket/ test.
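
For reference, a minimal standalone sketch of the approach: an unauthenticated HEAD request against the global S3 endpoint, reading the bucket's region from the x-amz-bucket-region response header (the bucket name is only an example, and real code would also need partition-aware endpoints, as discussed in the review below):

package main

import (
	"fmt"
	"log"
	"net/http"
)

// headBucketRegion issues an unauthenticated HEAD request against the
// global S3 endpoint and reads the bucket's region from the
// x-amz-bucket-region response header. S3 returns this header even on
// 403 responses, so the caller does not need to own the bucket.
func headBucketRegion(bucket string) (string, error) {
	resp, err := http.Head(fmt.Sprintf("https://s3.amazonaws.com/%s", bucket))
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	region := resp.Header.Get("x-amz-bucket-region")
	if region == "" {
		return "", fmt.Errorf("no x-amz-bucket-region header returned for bucket %s", bucket)
	}
	return region, nil
}

func main() {
	region, err := headBucketRegion("commoncrawl")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(region) // e.g. us-east-1
}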

@pdecat pdecat changed the title from "feat/use s3 http head" to "feat: use HTTP HEAD method to retrieve S3 bucket region instead of GetBucketLocation" on Feb 20, 2024
@pdecat pdecat commented Feb 20, 2024

Test results:

# node tint.js tests/aws_s3_bucket/
No env file present for the current environment:  staging
 Falling back to .env config
No env file present for the current environment:  staging
customEnv TURBOT_TEST_EXPECTED_TIMEOUT undefined

SETUP: tests/aws_s3_bucket []

PRETEST: tests/aws_s3_bucket

TEST: tests/aws_s3_bucket
Running terraform
data.aws_region.alternate: Reading...
data.aws_region.alternate: Read complete after 0s [id=us-east-1]
data.aws_canonical_user_id.current_user: Reading...
data.aws_region.primary: Reading...
data.aws_caller_identity.current: Reading...
data.aws_partition.current: Reading...
data.aws_region.primary: Read complete after 0s [id=us-east-2]
data.aws_partition.current: Read complete after 0s [id=aws]
data.aws_caller_identity.current: Read complete after 0s [id=012345678901]
data.null_data_source.resource: Reading...
data.null_data_source.resource: Read complete after 0s [id=static]
data.aws_canonical_user_id.current_user: Read complete after 1s [id=012345678901012345678901012345678901]

Terraform used the selected providers to generate the following execution
plan. Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # aws_kms_key.mykey will be created
  + resource "aws_kms_key" "mykey" {
      + arn                                = (known after apply)
      + bypass_policy_lockout_safety_check = false
      + customer_master_key_spec           = "SYMMETRIC_DEFAULT"
      + deletion_window_in_days            = 10
      + description                        = "This key is used to encrypt bucket objects"
      + enable_key_rotation                = false
      + id                                 = (known after apply)
      + is_enabled                         = true
      + key_id                             = (known after apply)
      + key_usage                          = "ENCRYPT_DECRYPT"
      + multi_region                       = (known after apply)
      + policy                             = (known after apply)
      + tags_all                           = (known after apply)
    }

  # aws_s3_bucket.named_test_resource will be created
  + resource "aws_s3_bucket" "named_test_resource" {
      + acceleration_status         = (known after apply)
      + acl                         = (known after apply)
      + arn                         = (known after apply)
      + bucket                      = "turbottest46000"
      + bucket_domain_name          = (known after apply)
      + bucket_prefix               = (known after apply)
      + bucket_regional_domain_name = (known after apply)
      + force_destroy               = false
      + hosted_zone_id              = (known after apply)
      + id                          = (known after apply)
      + object_lock_enabled         = (known after apply)
      + policy                      = (known after apply)
      + region                      = (known after apply)
      + request_payer               = (known after apply)
      + tags                        = {
          + "name" = "turbottest46000"
        }
      + tags_all                    = {
          + "name" = "turbottest46000"
        }
      + website_domain              = (known after apply)
      + website_endpoint            = (known after apply)

      + cors_rule {
          + allowed_headers = [
              + "*",
            ]
          + allowed_methods = [
              + "PUT",
              + "POST",
            ]
          + allowed_origins = [
              + "https://s3-website-test.hashicorp.com",
            ]
          + expose_headers  = [
              + "ETag",
            ]
          + max_age_seconds = 3000
        }

      + lifecycle_rule {
          + enabled = true
          + id      = "log"
          + prefix  = "log/"
          + tags    = {
              + "autoclean" = "true"
              + "rule"      = "log"
            }

          + expiration {
              + days = 90
            }

          + transition {
              + days          = 30
              + storage_class = "STANDARD_IA"
            }
          + transition {
              + days          = 60
              + storage_class = "GLACIER"
            }
        }
      + lifecycle_rule {
          + enabled = true
          + id      = "tmp"
          + prefix  = "tmp/"

          + expiration {
              + date = "2022-01-12"
            }
        }

      + object_lock_configuration {
          + object_lock_enabled = "Enabled"
        }

      + versioning {
          + enabled    = true
          + mfa_delete = false
        }
    }

  # aws_s3_bucket_acl.named_test_resource will be created
  + resource "aws_s3_bucket_acl" "named_test_resource" {
      + bucket = (known after apply)
      + id     = (known after apply)

      + access_control_policy {
          + grant {
              + permission = "FULL_CONTROL"

              + grantee {
                  + display_name = (known after apply)
                  + id           = "012345678901012345678901012345678901"
                  + type         = "CanonicalUser"
                }
            }
          + owner {
              + display_name = (known after apply)
              + id           = "012345678901012345678901012345678901"
            }
        }
    }

  # aws_s3_bucket_ownership_controls.named_test_resource will be created
  + resource "aws_s3_bucket_ownership_controls" "named_test_resource" {
      + bucket = (known after apply)
      + id     = (known after apply)

      + rule {
          + object_ownership = "BucketOwnerPreferred"
        }
    }

  # aws_s3_bucket_policy.b will be created
  + resource "aws_s3_bucket_policy" "b" {
      + bucket = (known after apply)
      + id     = (known after apply)
      + policy = (known after apply)
    }

Plan: 5 to add, 0 to change, 0 to destroy.

Changes to Outputs:
  + account_id        = "012345678901"
  + aws_partition     = "aws"
  + canonical_user_id = "012345678901012345678901012345678901"
  + kms_key_id        = (known after apply)
  + resource_aka      = (known after apply)
  + resource_name     = "turbottest46000"
aws_kms_key.mykey: Creating...
aws_s3_bucket.named_test_resource: Creating...
aws_kms_key.mykey: Creation complete after 1s [id=0aa7048d-a3ac-4a97-afe8-b552b1af2b18]
aws_s3_bucket.named_test_resource: Creation complete after 4s [id=turbottest46000]
aws_s3_bucket_ownership_controls.named_test_resource: Creating...
aws_s3_bucket_policy.b: Creating...
aws_s3_bucket_ownership_controls.named_test_resource: Creation complete after 0s [id=turbottest46000]
aws_s3_bucket_acl.named_test_resource: Creating...
aws_s3_bucket_policy.b: Creation complete after 1s [id=turbottest46000]
aws_s3_bucket_acl.named_test_resource: Creation complete after 1s [id=turbottest46000]

Warning: Deprecated

  with data.null_data_source.resource,
  on variables.tf line 48, in data "null_data_source" "resource":
  48: data "null_data_source" "resource" {

The null_data_source was historically used to construct intermediate values
to re-use elsewhere in configuration, the same can now be achieved using
locals or the terraform_data resource type in Terraform 1.4 and later.

(and one more similar warning elsewhere)

Warning: Argument is deprecated

  with aws_s3_bucket.named_test_resource,
  on variables.tf line 59, in resource "aws_s3_bucket" "named_test_resource":
  59: resource "aws_s3_bucket" "named_test_resource" {

Use the aws_s3_bucket_lifecycle_configuration resource instead

(and 14 more similar warnings elsewhere)

Apply complete! Resources: 5 added, 0 changed, 0 destroyed.

Outputs:

account_id = "012345678901"
aws_partition = "aws"
canonical_user_id = "012345678901012345678901012345678901"
kms_key_id = "arn:aws:kms:us-east-2:012345678901:key/0aa7048d-a3ac-4a97-afe8-b552b1af2b18"
resource_aka = "arn:aws:s3:::turbottest46000"
resource_name = "turbottest46000"

Running SQL query: test-get-query.sql
[
  {
    "akas": [
      "arn:aws:s3:::turbottest46000"
    ]
  }
]
✔ PASSED

Running SQL query: test-hydrate-query.sql
[
  {
    "acl": {
      "Grants": [
        {
          "Grantee": {
            "DisplayName": null,
            "EmailAddress": null,
            "ID": "012345678901012345678901012345678901",
            "Type": "CanonicalUser",
            "URI": null
          },
          "Permission": "FULL_CONTROL"
        }
      ],
      "Owner": {
        "DisplayName": null,
        "ID": "012345678901012345678901012345678901"
      }
    },
    "bucket_policy_is_public": false,
    "logging": null,
    "name": "turbottest46000",
    "object_lock_configuration": {
      "ObjectLockEnabled": "Enabled",
      "Rule": null
    },
    "region": "us-east-2",
    "replication": null,
    "versioning_enabled": true,
    "versioning_mfa_delete": false
  }
]
✔ PASSED

Running SQL query: test-list-query.sql
[
  {
    "akas": [
      "arn:aws:s3:::turbottest46000"
    ],
    "bucket_policy_is_public": false,
    "logging": null,
    "name": "turbottest46000",
    "partition": "aws",
    "tags": {
      "name": "turbottest46000"
    },
    "tags_src": [
      {
        "Key": "name",
        "Value": "turbottest46000"
      }
    ],
    "title": "turbottest46000",
    "versioning_enabled": true,
    "versioning_mfa_delete": false
  }
]
✔ PASSED

POSTTEST: tests/aws_s3_bucket

TEARDOWN: tests/aws_s3_bucket

SUMMARY:

1/1 passed.

@pdecat pdecat force-pushed the feat/use_s3_http_head branch 2 times, most recently from 1faf018 to b0bf89e, on February 20, 2024 at 17:59
aws/table_aws_s3_bucket.go: review thread (outdated, resolved)
@cbruno10

Hey @pdecat, thanks for raising this PR! We have it on our radar to review, along with #2080.

@pdecat pdecat force-pushed the feat/use_s3_http_head branch 2 times, most recently from dc1ab15 to b1bf791, on March 1, 2024 at 08:18
@pdecat pdecat force-pushed the feat/use_s3_http_head branch 2 times, most recently from 3398a5d to 71cde92, on March 27, 2024 at 06:20
@pdecat pdecat commented Mar 27, 2024

Rebased on main following merge of #2080

@pdecat pdecat force-pushed the feat/use_s3_http_head branch 4 times, most recently from 9460998 to 2f640e9, on March 27, 2024 at 06:34
@pdecat pdecat force-pushed the feat/use_s3_http_head branch 2 times, most recently from 8825d42 to c310fd8, on April 8, 2024 at 10:10
@cbruno10 cbruno10 left a comment

Hey @pdecat , sorry for the long delay in reviewing!

I've added some initial review comments/questions, can you please take a look? Thanks!

// Not doing so with such buckets causes `tls: failed to verify certificate: x509: certificate is valid for *.s3.amazonaws.com, s3.amazonaws.com, not www.somedomain.com.s3.amazonaws.com (SQLSTATE HV000)` errors.
// FIXME: do we also want to implement the non-path-style (virtual-hosted) S3 URLs? It may help avoid rate limiting, but given the above limitation,
// it may be better to define a default Steampipe limiter once actual AWS limits are discovered.
resp, err := http.Head(fmt.Sprintf("https://s3.amazonaws.com/%s", bucket))

@cbruno10:

@pdecat I think we need to handle other partitions, like US GovCloud, China, ISO-B, etc., else this request will return a 404 code I believe. The URL could be adjusted based on the commonColumnData.Partition
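
For illustration, a hedged sketch of the partition-aware endpoint selection the reviewer suggests (the non-commercial hostnames below are illustrative assumptions and would need verification against AWS endpoint documentation):

package main

import "fmt"

// partitionEndpoint picks an S3 endpoint for unauthenticated HEAD
// requests based on the partition reported in commonColumnData.
// The non-commercial hostnames are assumptions for illustration only.
func partitionEndpoint(partition string) (string, error) {
	switch partition {
	case "aws":
		return "https://s3.amazonaws.com", nil
	case "aws-cn":
		return "https://s3.cn-north-1.amazonaws.com.cn", nil
	case "aws-us-gov":
		return "https://s3.us-gov-west-1.amazonaws.com", nil
	default:
		// ISO partitions (aws-iso, aws-iso-b) have no publicly
		// reachable endpoints and are not handled here.
		return "", fmt.Errorf("unsupported partition: %s", partition)
	}
}

func main() {
	ep, _ := partitionEndpoint("aws-cn")
	fmt.Println(ep) // https://s3.cn-north-1.amazonaws.com.cn
}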

@pdecat pdecat commented May 3, 2024

That's a very good point, and apparently only buckets in regions of the standard AWS partition are exposed through this global endpoint. I don't have access to buckets in GovCloud and ISO partitions to verify this, though. I can run some tests against buckets in China, but that won't cover all cases anyway.

That's probably a definitive argument for switching to manager.GetBucketRegion(), as suggested by @vpartington a few weeks ago.

FWIW, it seems the Terraform provider does not support the ISO partitions either.

@cbruno10:

@pdecat We can do some testing against GovCloud buckets within our team, but it seems like switching to manager.GetBucketRegion() would be a good change if that function natively supports other partitions (even if it's not all of them, like ISO).

Would you be able to update this PR to use that function instead? Sorry, I don't want to drag this PR out more (mostly due to my slow response times), but we do have some users in non-commercial partitions at the moment.

If you need any help implementing/testing that change, please let us know, thanks!

@pdecat:

Done! Testing in progress...

aws/table_aws_s3_bucket.go: review thread (outdated, resolved)
return bucketRegion, nil
}

func getBucketRegion(ctx context.Context, d *plugin.QueryData, h *plugin.HydrateData) (interface{}, error) {
@cbruno10:

What's the benefit of having this separate function vs. combining it with doGetBucketRegion?

@pdecat:

getBucketRegion() gets the bucket name directly from the hydrate item's Name attribute, whereas getBucketRegionForObjects(), which also uses doGetBucketRegion(), gets it from the bucket_name query qualifier.

https://github.com/pdecat/steampipe-plugin-aws/blob/d858ce70e3458ce10086bf9ef8a3ea7f43a6be75/aws/table_aws_s3_object.go#L392
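
A hedged reconstruction of that split (the exact code lives in the PR diff; the type assertion and helper body here are simplified for illustration):

import (
	"context"

	"github.com/aws/aws-sdk-go-v2/service/s3/types"
	"github.com/turbot/steampipe-plugin-sdk/v5/plugin"
)

// getBucketRegion takes the bucket name from the hydrate item produced
// by the aws_s3_bucket list/get calls.
func getBucketRegion(ctx context.Context, d *plugin.QueryData, h *plugin.HydrateData) (interface{}, error) {
	bucket := h.Item.(types.Bucket) // simplified; see the PR for the real types
	return doGetBucketRegion(ctx, d, *bucket.Name)
}

// getBucketRegionForObjects takes the bucket name from the bucket_name
// key column qualifier on aws_s3_object queries.
func getBucketRegionForObjects(ctx context.Context, d *plugin.QueryData, h *plugin.HydrateData) (interface{}, error) {
	return doGetBucketRegion(ctx, d, d.EqualsQualString("bucket_name"))
}

// doGetBucketRegion is the shared helper both wrappers call; in the PR
// it resolves and caches the bucket's region. Stubbed here.
func doGetBucketRegion(ctx context.Context, d *plugin.QueryData, bucket string) (interface{}, error) {
	return "us-east-1", nil // stub; the real code resolves the region
}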

aws/table_aws_s3_bucket.go: review thread (outdated, resolved)
@cbruno10

@pdecat Cross-posting here for better visibility - Did you notice any performance improvements, or other benefits?
We had originally opened that issue because AWS API docs said to use HeadBucket instead, but it wasn’t clear if there were any benefits other than moving off of an older API method.

If we need to manually handle what the GetBucketLocation API call does already, like compose URLs for different partitions, I’m wondering if it's beneficial to switch over.

@pdecat pdecat commented Apr 11, 2024

Hi @cbruno10,

@pdecat Cross-posting here for better visibility - Did you notice any performance improvements, or other benefits? We had originally opened that issue because AWS API docs said to use HeadBucket instead, but it wasn’t clear if there were any benefits other than moving off of an older API method.

If we need to manually handle what the GetBucketLocation API call does already, like compose URLs for different partitions, I’m wondering if it's beneficial to switch over.

I believe I've mentioned this somewhere, probably in Slack, but the main benefit is avoiding the errors that occur when the GetBucketLocation API is invoked on a region endpoint that doesn't match the bucket's location. The SDK doesn't follow the non-standard header the API returns in that case (x-amz-bucket-region rather than Location).

I'll address all the other comments ASAP.

@cbruno10

@pdecat Is there a specific query or set of queries where the plugin would call GetBucketLocation on a region other than the bucket's region? I don't know if I've hit this before, but I may not have been running the specific queries to hit this use case.

@pdecat pdecat commented Apr 12, 2024

Found my comment investigating the GetBucketLocation errors that happen when hitting the wrong region with the AWS Go SDK used by the Steampipe AWS plugin:
#1586 (comment)

With Steampipe, here's the error message that is logged when AWS_DEFAULT_REGION is set to a wrong region:

rpc error: code = Unknown desc = operation error S3: GetBucketLocation, https response error StatusCode: 403, RequestID: MV4QH9V0Z3YMRZEV, HostID: CLdJF******K4L198mlFM8SWneQ6VE=, api error AccessDenied: Access Denied

In Slack, I've found these comments (copying them here as they will soon be inaccessible):

Typically, I can reproduce the issue with the AWS CLI:

# AWS_DEFAULT_REGION=us-east-1 aws --profile my-profile s3api get-bucket-location --bucket my-bucket

An error occurred (AccessDenied) when calling the GetBucketLocation operation: Access Denied

# AWS_DEFAULT_REGION=eu-west-3 aws --profile my-profile s3api get-bucket-location --bucket my-bucket
{
    "LocationConstraint": "eu-west-3"
}

# AWS_DEFAULT_REGION=us-east-1 aws --profile my-profile s3api head-bucket --bucket my-bucket
{
    "BucketRegion": "eu-west-3",
    "AccessPointAlias": false
}

https://turbot-community.slack.com/archives/C044P668806/p1708359173460459?thread_ts=1708357736.103139&cid=C044P668806

I've tried to update the S3 tests to reproduce the issue, but it turns out the GetBucketLocation access denied error only reproduces under specific conditions, e.g. if the requester is not the owner of the bucket:

GetBucketLocation requires the requester to be the owner of the bucket.

aws/aws-sdk-go#720 (comment)

The S3 team got back with me and suggested the best API to use is HEAD bucket. GetBucketLocation uses a more complex permissions model where HeadBucket can be called by anyone.

aws/aws-sdk-go#720 (comment)

https://turbot-community.slack.com/archives/C044P668806/p1708367391732849?thread_ts=1708357736.103139&cid=C044P668806

@pdecat pdecat commented Apr 12, 2024

Here's a related issue mentioning the error I'm facing: #1713

@vpartington

FYI, instead of using http.Head you can also use the manager.GetBucketRegion function in AWS SDK for Go v2.

@pdecat pdecat commented Apr 12, 2024

Hi @vpartington,

FYI, instead of using http.Head you can also use the manager.GetBucketRegion function in AWS SDK for Go v2.

Interesting, this helper function does seem to do an unauthenticated HTTP HEAD request too (which is somewhat confusing, because there's an actual HeadBucket API whose docs state "All HeadBucket requests must be authenticated and signed by using IAM credentials").

[screenshot of the manager.GetBucketRegion implementation]

And it probably handles other uncommon cases as described in https://pkg.go.dev/github.com/aws/aws-sdk-go-v2/service/s3#HeadBucketInput.
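
A minimal sketch of calling that helper (a standalone example with a made-up bucket name; the plugin wires in its own client, config, and caching):

package main

import (
	"context"
	"fmt"
	"log"

	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/feature/s3/manager"
	"github.com/aws/aws-sdk-go-v2/service/s3"
)

func main() {
	ctx := context.Background()

	// Any configured region works: GetBucketRegion resolves the bucket's
	// actual region from the x-amz-bucket-region header, even when the
	// response is an error for a mismatched region.
	cfg, err := config.LoadDefaultConfig(ctx, config.WithRegion("us-east-1"))
	if err != nil {
		log.Fatal(err)
	}

	region, err := manager.GetBucketRegion(ctx, s3.NewFromConfig(cfg), "my-bucket")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("bucket region:", region)
}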

@pdecat pdecat commented Apr 12, 2024

Anyway, I haven't faced any issues since I switched to using HTTP HEAD two months ago.

And it works even if the bucket is not yours:

# curl -HEAD -i https://s3.amazonaws.com/commoncrawl
HTTP/1.1 403 Forbidden
x-amz-bucket-region: us-east-1
x-amz-request-id: J0AD186X1H9EK40X
x-amz-id-2: OrTMgtyo12JoQRDHJAAEXmBWCb78wpt6XMit/Hv3lw7+LlAKr0ywEL1DtKGHlq733ojIHtJ0kSs=
Content-Type: application/xml
Transfer-Encoding: chunked
Date: Fri, 12 Apr 2024 10:17:30 GMT
Server: AmazonS3

<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>AccessDenied</Code><Message>Access Denied</Message><RequestId>J0AD186X1H9EK40X</RequestId><HostId>OrTMgtyo12JoQRDHJAAEXmBWCb78wpt6XMit/Hv3lw7+LlAKr0ywEL1DtKGHlq733ojIHtJ0kSs=</HostId></Error>

Note the x-amz-bucket-region: us-east-1 header.

@vpartington

The HeadBucket docs are indeed very confusing. I spent the last few days banging my head against the wall about this. Yesterday I settled on doing an http.Head, just like you do.

Just as I was writing a comment this morning explaining why I use http.Head in our code base, I found the manager.GetBucketRegion method. It does seem to handle some exotic use cases that I don't care about.

For me, having less code to test and maintain is the reason I threw away my code. But then I haven't had my code running successfully for a while like you have, so I would also stick with what you've got if I were you.

Just thought I'd reach out because I saw you had struggled with the same problem in this PR.

Take care!

@cbruno10

@pdecat Sorry for the long response time (again)! I've left a few follow-up questions/suggestions, can you please have a look when you get a chance? Thanks!

@pdecat pdecat commented May 24, 2024

Hi @cbruno10, I've implemented the requested changes. PTAL :)

@pdecat pdecat changed the title from "feat: use HTTP HEAD method to retrieve S3 bucket region instead of GetBucketLocation" to "feat: use manager.GetBucketRegion() to retrieve S3 bucket region instead of GetBucketLocation" on May 27, 2024
@pdecat pdecat commented May 27, 2024

I can confirm everything also works fine with manager.GetBucketRegion() instead of plain HTTP HEAD requests.

@cbruno10 cbruno10 merged commit 41e34a8 into turbot:main on May 28, 2024 (1 check passed)
@cbruno10

Thanks so much @pdecat for this PR, and also to @vpartington for sharing info on manager.GetBucketRegion()!

@pdecat pdecat deleted the feat/use_s3_http_head branch May 28, 2024 19:43