Add suport for custom S3 object tagging #690

Merged
merged 5 commits into from
Oct 30, 2023

dttung2905
Contributor

Problem

Currently, the Kafka Connect S3 sink config has a boolean field s3.object.tagging. If set to true, the connector adds S3 object-level tags for the starting offset, ending offset, and total record count of a given file. We would also like the Kafka Connect S3 sink to be able to write S3 objects with configurable tags. This is extremely valuable for users who want to manage the S3 object lifecycle based on S3 tags.
Ultimately, this PR solves issue #68

Solution

Add two extra fields, s3.object.tagging.key and s3.object.tagging.value, for users to specify their own key-value pairs. s3.object.tagging must also be set to true as a prerequisite.

Does this solution apply anywhere else?
  • yes
  • no
If yes, where?

Test Strategy

Testing done:
  • Unit tests
  • Integration tests
  • System tests
  • Manual tests

Release Plan

@dttung2905 dttung2905 requested a review from a team as a code owner September 30, 2023 17:57
@cla-assistant

cla-assistant bot commented Sep 30, 2023

CLA assistant check
All committers have signed the CLA.

Signed-off-by: Dao Thanh Tung <[email protected]>
@dttung2905
Contributor Author

CI is failing. I dug further into the error log and some integration tests are failing (link). According to the README file, an env var AWS_CREDENTIALS_PATH should point to the JSON file containing the AWS access key ID and secret. Not sure why mine is not set. Maybe it is because I'm not part of the Confluent team 🤔

22:04:59  [ERROR] Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 0.221 s <<< FAILURE! - in io.confluent.connect.s3.integration.S3SinkDataFormatIT
22:04:59  [ERROR] io.confluent.connect.s3.integration.S3SinkDataFormatIT  Time elapsed: 0.004 s  <<< ERROR!
22:04:59  com.amazonaws.services.s3.model.AmazonS3Exception: Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: ZD6TWNXK74J0ZQKH; S3 Extended Request ID: IJILLSCNotqVAz41g3cTvF+KipG87hVS3Ta274yy/n5xxup1cGxJP141Yv/VrurG0APHQHWtmbK1BarXXvU+aQ==; Proxy: null)
22:04:59  
22:04:59  [ERROR] io.confluent.connect.s3.integration.S3SinkDataFormatIT  Time elapsed: 0.004 s  <<< ERROR!
22:04:59  com.amazonaws.services.s3.model.AmazonS3Exception: The specified bucket does not exist (Service: Amazon S3; Status Code: 404; Error Code: NoSuchBucket; Request ID: ZD6WR5CDG89862EM; S3 Extended Request ID: TKynsQJddQFiJ7pB9m1+XbokNQmS1IpzzUCAEldspuanYf1iRfoLq4DqRn2Po2WVITnlja5WFNsoHMW1qGmHWg==; Proxy: null)
22:04:59  
22:04:59  [INFO] Running io.confluent.connect.s3.integration.S3SinkConnectorIT
22:04:59  [ERROR] Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 0 s <<< FAILURE! - in io.confluent.connect.s3.integration.S3SinkConnectorIT
22:04:59  [ERROR] io.confluent.connect.s3.integration.S3SinkConnectorIT  Time elapsed: 0 s  <<< ERROR!
22:04:59  com.amazonaws.services.s3.model.AmazonS3Exception: Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: ZD6G11ZX2NVNNBHB; S3 Extended Request ID: nnxprPYNTCSVL+hnM0KOgrVezvjbGFsKE0+05kEnDpZ4rkuRovyYgsyeO0yGhNBZNy00efD3t9jaJUPvQN4UWg==; Proxy: null)
22:04:59  
22:04:59  [ERROR] io.confluent.connect.s3.integration.S3SinkConnectorIT  Time elapsed: 0 s  <<< ERROR!
22:04:59  com.amazonaws.services.s3.model.AmazonS3Exception: The specified bucket does not exist (Service: Amazon S3; Status Code: 404; Error Code: NoSuchBucket; Request ID: F7ZH2PT5SNX93TJQ; S3 Extended Request ID: WrsUBcfivOJMOCRtxE69GtgoqKr7cMN09/cW35q0TcfoFMWyI3hc/8eBbn/DKFdQ0Fa3rV++AaVgoI/eiWbqCA==; Proxy: null)

Signed-off-by: Dao Thanh Tung <[email protected]>
Signed-off-by: Dao Thanh Tung <[email protected]>
@dttung2905 dttung2905 changed the title WIP: Add suport for custom S3 object tagging Add suport for custom S3 object tagging Oct 6, 2023
@dttung2905
Contributor Author

Hi @Enigma25 , could you help to review this PR? I saw that you are one of the most active committers from Confluent in this project 🙏

@pbadani
Member

pbadani commented Oct 16, 2023

@dttung2905 Thanks for the PR.
Instead of exposing two configs (one for the keys and another for the values) and zipping them, it would be better to have just one config with key:value pairs.
Something like this:
s3.object.tagging.key.value.pairs= "key1:value1, key2:value2"
The connector can split each entry on the colon delimiter, and a config validation can be added if needed.
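
The splitting step described above could look roughly like the following. This is only a minimal sketch, not the connector's actual code: the class name `TagPairParser` and the method `parse` are hypothetical, and the decisions to trim whitespace, split on the first colon only, and reject malformed entries are assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of the connector): turns a string such as
// "key1:value1, key2:value2" into an ordered map of tag keys to tag values.
final class TagPairParser {

    private TagPairParser() {
    }

    static Map<String, String> parse(String raw) {
        Map<String, String> tags = new LinkedHashMap<>();
        if (raw == null || raw.trim().isEmpty()) {
            return tags; // no custom tags configured
        }
        for (String entry : raw.split(",")) {
            // Split on the first colon only, so a value may itself contain colons.
            String[] parts = entry.split(":", 2);
            if (parts.length != 2 || parts[0].trim().isEmpty()) {
                // Assumed validation rule: every entry must look like key:value.
                throw new IllegalArgumentException(
                    "Expected key:value but got '" + entry.trim() + "'");
            }
            tags.put(parts[0].trim(), parts[1].trim());
        }
        return tags;
    }
}
```

Under these assumptions, `TagPairParser.parse("key1:value1, key2:value2")` returns a map with two entries, and an entry without a colon raises an `IllegalArgumentException`, which is the kind of check a config validator could surface at startup.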

@dttung2905
Contributor Author

> @dttung2905 Thanks for the PR.
>
> Instead of exposing two configs - one for the keys and another for the values and doing a zip, it would be better to have just one config with key:value pairs.
>
> Something like this.
>
> s3.object.tagging.key.value.pairs= "key1:value1, key2:value2"
>
> The connector can split each entry with the delimiter colon and also add a config validation if needed.

Thank you very much for the feedback. I will refactor the code a bit to incorporate this change. 🙏

@dttung2905
Contributor Author

Hi @pbadani,
I have finished making the changes from your code review feedback. Could you review it again? Thanks a lot 🙏

Member

@pbadani pbadani left a comment

LGTM, thanks @dttung2905

@dttung2905
Contributor Author

Thanks @pbadani, is there any final step I need to do to get this one merged?

@pbadani
Member

pbadani commented Oct 27, 2023

> Thanks @pbadani, is there any final step I need to do to get this one merged ?

No, we can just merge it.

@dttung2905
Contributor Author

> > Thanks @pbadani, is there any final step I need to do to get this one merged ?
>
> No, we can just merge it.

@pbadani I don't have the merge button for this PR (I think due to lack of permission). Could you merge it for me?

@dttung2905
Contributor Author

Only those with [write access](https://docs.github.com/articles/what-are-the-different-access-permissions) to this repository can merge pull requests.

This is what I have 😅
image

@pbadani pbadani merged commit d2464ad into confluentinc:master Oct 30, 2023
@dttung2905
Contributor Author

Thanks @pbadani for the merge. Could you point me to the relevant repo for the documentation update? I searched this repo but could not find any place to update the user-facing docs like https://docs.confluent.io/kafka-connectors/s3-sink/current/overview.html. I think it's an internal repo for Confluent employees only 😄

@dttung2905 dttung2905 deleted the feat-s3-object-tagging branch November 2, 2023 22:52