Commit

[Filebeat] Enable non-AWS S3 buckets for aws-s3 input (#28234)
* Update `aws-s3` input to support non-AWS S3 buckets
legoguy1000 authored Oct 26, 2021
1 parent c4ca765 commit 7fe0e57
Showing 17 changed files with 428 additions and 77 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.next.asciidoc
@@ -782,6 +782,7 @@ for a few releases. Please use other tools provided by Elastic to fetch data fro
- Add latency config option for aws-cloudwatch input. {pull}28509[28509]
- Added proxy support to threatintel/malwarebazaar. {pull}28533[28533]
- Add `text/csv` decoder to `httpjson` input {pull}28564[28564]
- Update `aws-s3` input to connect to non-AWS S3 buckets {issue}28222[28222] {pull}28234[28234]

*Heartbeat*

@@ -90,6 +90,18 @@
# to arrive in the queue before returning.
#sqs.wait_time: 20s

# Bucket ARN used for polling AWS S3 buckets
#bucket_arn: arn:aws:s3:::test-s3-bucket

# Bucket Name used for polling non-AWS S3 buckets
#non_aws_bucket_name: test-s3-bucket

# Configures the AWS S3 API to use path style instead of virtual host style (default)
#path_style: false

# Overrides the `cloud.provider` field for non-AWS S3 buckets. See docs for auto-recognized providers.
#provider: minio

#------------------------------ AWS CloudWatch input --------------------------------
# Beta: Config options for AWS CloudWatch input
#- type: aws-cloudwatch
65 changes: 63 additions & 2 deletions x-pack/filebeat/docs/inputs/input-aws-s3.asciidoc
@@ -61,6 +61,28 @@ Listing of the S3 bucket will be polled according the time interval defined by
expand_event_list_from_field: Records
----


The `aws-s3` input can also poll 3rd-party S3-compatible services, such as self-hosted MinIO.
Non-AWS S3-compatible buckets require `access_key_id` and `secret_access_key` for authentication.
Specify the bucket with `non_aws_bucket_name`, and set `endpoint` to replace the default API endpoint.
`endpoint` should be either a full URI of the form `http(s)://<s3 endpoint>`, which is used as the service's API endpoint, or a single domain.
If a domain is provided, the full endpoint URI is constructed with the region name in the standard form `https://s3.<region>.<domain>`, which is supported by AWS and several 3rd-party providers.
No `endpoint` is needed when using the native AWS S3 service hosted at `amazonaws.com`.
See <<aws-credentials-config,Configuration parameters>> for alternate AWS domains that require a different endpoint.

["source","yaml",subs="attributes"]
----
{beatname_lc}.inputs:
- type: aws-s3
  non_aws_bucket_name: test-s3-bucket
  number_of_workers: 5
  bucket_list_interval: 300s
  access_key_id: xxxxxxx
  secret_access_key: xxxxxxx
  endpoint: https://s3.example.com:9000
  expand_event_list_from_field: Records
----
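The endpoint rule above (full URI used as-is, bare domain expanded with the region) can be sketched in Go. This is a minimal illustration assuming a hypothetical helper `resolveEndpoint`; it is not the input's actual function, and `example.com` is a placeholder domain.

```go
package main

import (
	"fmt"
	"strings"
)

// resolveEndpoint is a hypothetical helper illustrating the documented rule:
// a full URI (http:// or https://) is used verbatim as the API endpoint,
// while a bare domain is expanded to https://s3.<region>.<domain>.
func resolveEndpoint(endpoint, region string) string {
	if strings.HasPrefix(endpoint, "http://") || strings.HasPrefix(endpoint, "https://") {
		return endpoint
	}
	return fmt.Sprintf("https://s3.%s.%s", region, endpoint)
}

func main() {
	// Full URI, e.g. a self-hosted MinIO endpoint with a custom port.
	fmt.Println(resolveEndpoint("https://s3.example.com:9000", "us-east-1"))
	// Bare domain, expanded with the region name.
	fmt.Println(resolveEndpoint("example.com", "eu-west-1"))
}
```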

The `aws-s3` input supports the following configuration options plus the
<<{beatname_lc}-input-{type}-common-options>> described later.

@@ -236,7 +258,7 @@ configuring multiline options.
[float]
==== `queue_url`

-URL of the AWS SQS queue that messages will be received from. (Required when `bucket_arn` is not set).
+URL of the AWS SQS queue that messages will be received from. (Required when `bucket_arn` and `non_aws_bucket_name` are not set).

[float]
==== `visibility_timeout`
@@ -270,7 +292,12 @@ value is `20s`.
[float]
==== `bucket_arn`

-ARN of the AWS S3 bucket that will be polled for list operation. (Required when `queue_url` is not set).
+ARN of the AWS S3 bucket that will be polled for list operation. (Required when `queue_url` and `non_aws_bucket_name` are not set).

[float]
==== `non_aws_bucket_name`

Name of the 3rd-party S3-compatible bucket that will be polled for list operation; use it instead of `bucket_arn` for non-AWS services. (Required when `queue_url` and `bucket_arn` are not set).

[float]
==== `bucket_list_interval`
@@ -288,6 +315,40 @@ Prefix to apply for the list request to the S3 bucket. Default empty.
Number of workers that will process the S3 objects listed. (Required when `bucket_arn` is set).


[float]
==== `provider`

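Name of the 3rd-party S3 bucket provider, such as backblaze or gcp. The value overrides the `cloud.provider` field for events from non-AWS S3 buckets.
The providers for the following endpoint domains are detected automatically:

[options="header"]
|===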
|Domain |Provider
|amazonaws.com, amazonaws.com.cn, c2s.sgov.gov, c2s.ic.gov |aws
|backblazeb2.com |backblaze
|wasabisys.com |wasabi
|digitaloceanspaces.com |digitalocean
|dream.io |dreamhost
|scw.cloud |scaleway
|googleapis.com |gcp
|cloud.it |arubacloud
|linodeobjects.com |linode
|vultrobjects.com |vultr
|appdomain.cloud |ibm
|aliyuncs.com |alibaba
|oraclecloud.com |oracle
|exo.io |exoscale
|upcloudobjects.com |upcloud
|ilandcloud.com |iland
|zadarazios.com |zadara
|===
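The table can be read as a suffix lookup on the endpoint host, with `provider` taking precedence. The Go sketch below assumes suffix matching and a hypothetical `detectProvider` helper; the input's real matching code is not part of this diff, and returning `""` for an unrecognized host is an assumption.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// providerRule pairs an endpoint domain with the provider name from the table.
type providerRule struct {
	domain   string
	provider string
}

// knownProviders mirrors the documented table; a slice keeps matching order deterministic.
var knownProviders = []providerRule{
	{"amazonaws.com", "aws"}, {"amazonaws.com.cn", "aws"}, {"c2s.sgov.gov", "aws"}, {"c2s.ic.gov", "aws"},
	{"backblazeb2.com", "backblaze"}, {"wasabisys.com", "wasabi"},
	{"digitaloceanspaces.com", "digitalocean"}, {"dream.io", "dreamhost"},
	{"scw.cloud", "scaleway"}, {"googleapis.com", "gcp"}, {"cloud.it", "arubacloud"},
	{"linodeobjects.com", "linode"}, {"vultrobjects.com", "vultr"}, {"appdomain.cloud", "ibm"},
	{"aliyuncs.com", "alibaba"}, {"oraclecloud.com", "oracle"}, {"exo.io", "exoscale"},
	{"upcloudobjects.com", "upcloud"}, {"ilandcloud.com", "iland"}, {"zadarazios.com", "zadara"},
}

// detectProvider is a hypothetical helper: a non-empty override (the `provider`
// option) wins; otherwise the endpoint host is matched by domain suffix.
// It returns "" when nothing matches (an assumption, not documented behavior).
func detectProvider(endpoint, override string) string {
	if override != "" {
		return override
	}
	host := endpoint
	// Full URIs are reduced to their hostname (port stripped); bare domains are used as-is.
	if u, err := url.Parse(endpoint); err == nil && u.Hostname() != "" {
		host = u.Hostname()
	}
	for _, r := range knownProviders {
		if host == r.domain || strings.HasSuffix(host, "."+r.domain) {
			return r.provider
		}
	}
	return ""
}

func main() {
	fmt.Println(detectProvider("https://s3.us-west-000.backblazeb2.com", ""))
	fmt.Println(detectProvider("https://s3.example.com:9000", "minio"))
}
```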


[float]
==== `path_style`

Enabling this option puts the bucket name in the request path instead of the hostname (subdomain). When enabled,
`https://<bucket-name>.s3.<region>.<provider>.com` becomes `https://s3.<region>.<provider>.com/<bucket-name>`.
This option can only be used with 3rd-party S3 providers (that is, together with `non_aws_bucket_name`); the input rejects it for AWS buckets.
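The two addressing styles differ only in where the bucket name goes. A minimal Go sketch, assuming a hypothetical `objectURL` helper and a placeholder host; in the input itself the S3 client builds these URLs, so this only illustrates their shape.

```go
package main

import "fmt"

// objectURL is a hypothetical helper showing the two addressing styles
// described for `path_style`. baseHost is e.g. "s3.<region>.<provider>.com".
func objectURL(baseHost, bucket string, pathStyle bool) string {
	if pathStyle {
		// Path style: bucket name appended as the first path segment.
		return fmt.Sprintf("https://%s/%s", baseHost, bucket)
	}
	// Virtual-host style (default): bucket name as a subdomain.
	return fmt.Sprintf("https://%s.%s", bucket, baseHost)
}

func main() {
	host := "s3.us-east-1.example.com"
	fmt.Println(objectURL(host, "test-s3-bucket", false))
	fmt.Println(objectURL(host, "test-s3-bucket", true))
}
```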


[float]
==== `aws credentials`
12 changes: 12 additions & 0 deletions x-pack/filebeat/filebeat.reference.yml
@@ -3045,6 +3045,18 @@ filebeat.inputs:
# to arrive in the queue before returning.
#sqs.wait_time: 20s

# Bucket ARN used for polling AWS S3 buckets
#bucket_arn: arn:aws:s3:::test-s3-bucket

# Bucket Name used for polling non-AWS S3 buckets
#non_aws_bucket_name: test-s3-bucket

# Configures the AWS S3 API to use path style instead of virtual host style (default)
#path_style: false

# Overrides the `cloud.provider` field for non-AWS S3 buckets. See docs for auto-recognized providers.
#provider: minio

#------------------------------ AWS CloudWatch input --------------------------------
# Beta: Config options for AWS CloudWatch input
#- type: aws-cloudwatch
38 changes: 29 additions & 9 deletions x-pack/filebeat/input/awss3/config.go
@@ -5,6 +5,7 @@
package awss3

import (
"errors"
"fmt"
"time"

@@ -28,12 +29,15 @@ type config struct {
MaxNumberOfMessages int `config:"max_number_of_messages"`
QueueURL string `config:"queue_url"`
BucketARN string `config:"bucket_arn"`
NonAWSBucketName string `config:"non_aws_bucket_name"`
BucketListInterval time.Duration `config:"bucket_list_interval"`
BucketListPrefix string `config:"bucket_list_prefix"`
NumberOfWorkers int `config:"number_of_workers"`
AWSConfig awscommon.ConfigAWS `config:",inline"`
FileSelectors []fileSelectorConfig `config:"file_selectors"`
ReaderConfig readerConfig `config:",inline"` // Reader options to apply when no file_selectors are used.
PathStyle bool `config:"path_style"`
ProviderOverride string `config:"provider"`
}

func defaultConfig() config {
@@ -46,27 +50,33 @@ func defaultConfig() config {
SQSMaxReceiveCount: 5,
FIPSEnabled: false,
MaxNumberOfMessages: 5,
PathStyle: false,
}
c.ReaderConfig.InitDefaults()
return c
}

func (c *config) Validate() error {
-	if c.QueueURL == "" && c.BucketARN == "" {
-		logp.NewLogger(inputName).Warnf("neither queue_url nor bucket_arn were provided, input %s will stop", inputName)
-		return nil
+	configs := []bool{c.QueueURL != "", c.BucketARN != "", c.NonAWSBucketName != ""}
+	enabled := []bool{}
+	for i := range configs {
+		if configs[i] {
+			enabled = append(enabled, configs[i])
+		}
	}

-	if c.QueueURL != "" && c.BucketARN != "" {
-		return fmt.Errorf("queue_url <%v> and bucket_arn <%v> "+
-			"cannot be set at the same time", c.QueueURL, c.BucketARN)
+	if len(enabled) == 0 {
+		logp.NewLogger(inputName).Warnf("neither queue_url, bucket_arn, non_aws_bucket_name were provided, input %s will stop", inputName)
+		return nil
+	} else if len(enabled) > 1 {
+		return fmt.Errorf("queue_url <%v>, bucket_arn <%v>, non_aws_bucket_name <%v> "+
+			"cannot be set at the same time", c.QueueURL, c.BucketARN, c.NonAWSBucketName)
	}

-	if c.BucketARN != "" && c.BucketListInterval <= 0 {
+	if (c.BucketARN != "" || c.NonAWSBucketName != "") && c.BucketListInterval <= 0 {
		return fmt.Errorf("bucket_list_interval <%v> must be greater than 0", c.BucketListInterval)
	}

-	if c.BucketARN != "" && c.NumberOfWorkers <= 0 {
+	if (c.BucketARN != "" || c.NonAWSBucketName != "") && c.NumberOfWorkers <= 0 {
		return fmt.Errorf("number_of_workers <%v> must be greater than 0", c.NumberOfWorkers)
	}

@@ -90,6 +100,16 @@ func (c *config) Validate() error {
c.APITimeout, c.SQSWaitTime)
}

	if c.FIPSEnabled && c.NonAWSBucketName != "" {
		return errors.New("fips_enabled cannot be used with a non-AWS S3 bucket")
	}
	if c.PathStyle && c.NonAWSBucketName == "" {
		return errors.New("path_style can only be used when polling non-AWS S3 services")
	}
	if c.ProviderOverride != "" && c.NonAWSBucketName == "" {
		return errors.New("provider can only be overridden when polling non-AWS S3 services")
	}

	return nil
}
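The source-selection rule in `Validate` (at most one of `queue_url`, `bucket_arn`, `non_aws_bucket_name`) reduces to counting non-empty fields, and the new option checks only depend on whether `non_aws_bucket_name` is set. A trimmed-down sketch, assuming a hypothetical `sourceConfig` struct and `validateSources` function that cover only these checks; the real method also validates intervals, worker counts, and timeouts.

```go
package main

import (
	"errors"
	"fmt"
)

// sourceConfig is a hypothetical stand-in for the input's config struct,
// keeping only the fields involved in these checks.
type sourceConfig struct {
	QueueURL         string
	BucketARN        string
	NonAWSBucketName string
	FIPSEnabled      bool
	PathStyle        bool
	ProviderOverride string
}

// validateSources counts how many mutually exclusive sources are set,
// then applies the non-AWS-only option rules from the diff.
func validateSources(c sourceConfig) error {
	set := 0
	for _, v := range []string{c.QueueURL, c.BucketARN, c.NonAWSBucketName} {
		if v != "" {
			set++
		}
	}
	switch {
	case set == 0:
		// Mirrors the diff: nothing to poll, so the input only warns and stops.
		return nil
	case set > 1:
		return fmt.Errorf("queue_url <%v>, bucket_arn <%v>, non_aws_bucket_name <%v> cannot be set at the same time",
			c.QueueURL, c.BucketARN, c.NonAWSBucketName)
	}

	nonAWS := c.NonAWSBucketName != ""
	if c.FIPSEnabled && nonAWS {
		return errors.New("fips_enabled cannot be used with a non-AWS S3 bucket")
	}
	if c.PathStyle && !nonAWS {
		return errors.New("path_style can only be used when polling non-AWS S3 services")
	}
	if c.ProviderOverride != "" && !nonAWS {
		return errors.New("provider can only be overridden when polling non-AWS S3 services")
	}
	return nil
}

func main() {
	fmt.Println(validateSources(sourceConfig{NonAWSBucketName: "test-s3-bucket", PathStyle: true}))
	fmt.Println(validateSources(sourceConfig{QueueURL: "https://sqs.example", BucketARN: "arn:aws:s3:::test-s3-bucket"}))
}
```

A counter replaces the diff's intermediate `enabled` slice; both express the same "at most one source" rule.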
