HDDS-8342. AWS S3 Lifecycle Configurations doc #6589
base: master
@ivandika3 @xichen01 Please take a look and let me know whether I need to amend or enhance it.
Thanks a lot @mohan3d for the design document. I left a few initial comments and will review more deeply in the following days.
Also, @kerneltime has left a comment in the ticket (https://issues.apache.org/jira/browse/HDDS-8342?focusedCommentId=17841064&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17841064). Could you help to address it as well?
@ArafatKhan2198 @SaketaChalamchala @tanvipenumudy Could you help take a look as well when you have time?
optional uint64 creationTime = 4;
repeated LifecycleRule rules = 5;
optional uint64 objectID = 6;
optional uint64 updateID = 7;
Thanks for the doc @mohan3d. Do you think it would make sense to also add a required status field here to indicate whether the configuration is enabled or disabled?
Consequently, we might need definitions for Disabling and Enabling the lifecycle configurations.
@SaketaChalamchala Yes, it makes sense, and there is actually such a flag at the Rule level.
@SaketaChalamchala The `required bool enabled = 3;` field in the LifecycleRule is used to indicate whether the configuration is enabled or disabled.
message DeleteLifecycleConfigurationRequest {
  required string volumeName = 1;
  required string bucketName = 2;
Would accepting an optional LifecycleFilter here and in List and Info configuration requests be useful?
It might be useful. I was not able to find anything like that on the AWS side, which is why my implementation doesn't have such an optional filter.
https://docs.aws.amazon.com/AmazonS3/latest/API/API_GetBucketLifecycleConfiguration.html
https://docs.aws.amazon.com/AmazonS3/latest/API/API_DeleteBucketLifecycle.html
AWS S3 only supports deleting the entire LifecycleConfiguration of a bucket; refer to:
https://docs.aws.amazon.com/AmazonS3/latest/API/API_DeleteBucketLifecycle.html
I see.
## Overview

### Functionality
- User should be able to create/remove/fetch lifecycle configurations for a specific S3 bucket.
Do you have any thoughts on what ACL checks would be performed for creating a lifecycle configuration? Would it be restricted to the owners of the keys or an Ozone administrator?
What would happen if keys with the same prefix have multiple owners? If one of the key owners creates a lifecycle configuration on the prefix, would all of the keys with the prefix be deleted?
Do you have any thoughts on what Acl checks would be performed for creating a lifecycle configuration. Would it be restricted to the owners of the keys or an ozone administrator?
Maybe the 'WRITE' permission on the bucket being operated on should be required?
If a user has 'WRITE' permission on a bucket, they can already overwrite or delete another user's key in that bucket without going through Lifecycle.
When Lifecycle deletes a key, the key is deleted as long as the Rule is met. The delete operation is executed by the OM itself, and the OM is an admin user. If we want to block users from removing or deleting objects from a specific bucket, the bucket owner should not give other users the WRITE permission on that bucket.
https://docs.aws.amazon.com/AmazonS3/latest/API/API_PutBucketLifecycleConfiguration.html
At the time of implementing this I had not thought through how the ACLs would work for it. If I recall correctly, it was restricted to the owner.
As I remember, I initially implemented it this way: if the user who set the lifecycle configuration doesn't have the right to delete the key, then the key is skipped even though it is eligible for deletion.
But later I changed it to delete the key anyway. I need to check the code to answer you accurately.
What would happen when the 'WRITE' permission is revoked from a user? Would that trigger all the Lifecycle configurations owned by the user to be disabled?
As said by @mohan3d and @xichen01, for simplicity's sake, we can first restrict the management of bucket lifecycle configurations to the bucket owner, since the bucket owner will not change for the lifecycle of the bucket.
Note: Bucket lifecycle configurations will need to be deleted before the bucket can be deleted.
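
To make the proposal above concrete, here is a minimal sketch of owner-only management plus the "delete the configuration before the bucket" rule. It is only an illustration under stated assumptions: `LifecycleRegistry` and its methods are hypothetical names, the in-memory dictionaries stand in for the real OM tables, and none of this is Ozone's actual API.

```python
from typing import Dict, Tuple

BucketId = Tuple[str, str]  # (volume, bucket)


class LifecycleRegistry:
    """Hypothetical in-memory stand-in for the lifecycle configuration table."""

    def __init__(self, bucket_owners: Dict[BucketId, str]) -> None:
        self._owners = dict(bucket_owners)
        self._configs: Dict[BucketId, dict] = {}

    def _require_owner(self, user: str, volume: str, bucket: str) -> None:
        # Assumption from the thread: only the bucket owner may manage lifecycle.
        if self._owners.get((volume, bucket)) != user:
            raise PermissionError(f"{user} is not the owner of {volume}/{bucket}")

    def put(self, user: str, volume: str, bucket: str, config: dict) -> None:
        self._require_owner(user, volume, bucket)
        self._configs[(volume, bucket)] = config

    def delete(self, user: str, volume: str, bucket: str) -> None:
        self._require_owner(user, volume, bucket)
        self._configs.pop((volume, bucket), None)

    def can_delete_bucket(self, volume: str, bucket: str) -> bool:
        # The bucket may only be deleted once its lifecycle configuration is gone.
        return (volume, bucket) not in self._configs
```

With this shape, a `put` by a non-owner raises `PermissionError`, and `can_delete_bucket` stays false until the owner calls `delete`.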
- A background retention service is responsible for scheduling and executing tasks at specified intervals.
- The Retention Manager retrieves lifecycle configurations associated with buckets.
- Then assigns each lifecycle configuration (attached to a bucket) to a threadpool (Configurable) for further processing.
- Each task will iterate through keys of a specific bucket and issue deletion request for eligible keys.
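
The flow in the bullets above could look roughly like the following sketch of a single scheduled run. It assumes a simplified model: `Key`, `BucketLifecycle`, `expire_bucket`, and `run_retention_once` are hypothetical names, `list_keys` and `delete_key` are caller-supplied callbacks standing in for the real key table and delete request, and periodic scheduling is left to the caller.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class Key:
    name: str
    age_days: int  # days since the key was created


@dataclass(frozen=True)
class BucketLifecycle:
    bucket: str
    prefix: str
    expire_after_days: int


def expire_bucket(
    cfg: BucketLifecycle,
    list_keys: Callable[[str], Iterable[Key]],
    delete_key: Callable[[str, str], None],
) -> List[str]:
    """One task: scan a single bucket and issue deletes for eligible keys."""
    deleted = []
    for key in list_keys(cfg.bucket):
        if key.name.startswith(cfg.prefix) and key.age_days >= cfg.expire_after_days:
            delete_key(cfg.bucket, key.name)
            deleted.append(key.name)
    return deleted


def run_retention_once(
    configs: Iterable[BucketLifecycle],
    list_keys: Callable[[str], Iterable[Key]],
    delete_key: Callable[[str, str], None],
    pool_size: int = 4,  # the configurable thread pool size
) -> Dict[str, List[str]]:
    """One scheduled run: one task per bucket configuration on a bounded pool."""
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        futures = {
            cfg.bucket: pool.submit(expire_bucket, cfg, list_keys, delete_key)
            for cfg in configs
        }
        return {bucket: future.result() for bucket, future in futures.items()}
```

Bounding the pool size keeps a large number of configured buckets from starting an unbounded number of concurrent scans; how the real service batches deletes or resumes after a restart is not covered here.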
@mohan3d, could you elaborate on what the final operation on a key will be if it is covered by multiple defined rules, for example when the rules conflict or specify different expiration conditions?
AWS S3 optimizes for cost, which means that whichever decision reduces the cost will be applied. In the case you mentioned, the shorter expiration will be honored.
Further details: https://docs.aws.amazon.com/AmazonS3/latest/userguide/lifecycle-configuration-examples.html#lifecycle-config-conceptual-ex5
This simply means that any rule that matches will be executed. Currently, the only "action" in our Lifecycle is to delete, so when checking the specified key, if any rule matches, then the key will be deleted.
This is also the rule for AWS S3 Lifecycle.
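
A small sketch of the "any matching rule applies, shortest expiration wins" behaviour described in the two comments above. It assumes rules are reduced to a prefix, an enabled flag, and an expiration in days; `Rule`, `effective_expiration_days`, and `is_expired` are hypothetical names, and treating a key as expired once its age reaches the threshold is also an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Rule:
    prefix: str
    enabled: bool
    expire_after_days: int


def effective_expiration_days(key_name: str, rules: Iterable[Rule]) -> Optional[int]:
    """Shortest expiration among enabled rules whose prefix matches, else None."""
    matching = [
        rule.expire_after_days
        for rule in rules
        if rule.enabled and key_name.startswith(rule.prefix)
    ]
    return min(matching) if matching else None


def is_expired(key_name: str, age_days: int, rules: Iterable[Rule]) -> bool:
    """A key is eligible for deletion as soon as any matching rule is met."""
    days = effective_expiration_days(key_name, rules)
    return days is not None and age_days >= days
```

For example, with `Rule("", True, 7)` and `Rule("sub/", True, 14)`, the key `sub/a` matches both rules and `effective_expiration_days` returns 7, so the broader rule ends up deciding the outcome.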
Let's specify the multiple rules conflict resolution explicitly in the design document.
@mohan3d, thanks for working on this. The design document looks straightforward. Could you fill it in with more details, such as the new table format, how scale will be handled, a rule example, and what the final decision will be if multiple rules are defined? BTW, I prefer a new table too.
Thanks @ChenSammi for the review. I am not able to respond to your latest comment directly, so I will reply here.
Sure, I will be adding the new table format shortly, along with more details on how the retention manager is designed (this should help us understand how it can scale, and it is also a good opportunity to get some thoughts from the community).
This was answered in an earlier comment.
@mohan3d thanks for the document! I had a couple of questions; please see the inline comments.
- The lifecycle configurations will be executed periodically.
- Depending on the rules of the lifecycle configuration there could be different actions or even multiple actions.
- At the moment only expiration is supported (keys get deleted).
- The lifecycle configurations supports all buckets not only S3 buckets.
S3 supports lifecycle configurations only for non-directory buckets. In that case, translating the same for Ozone, we would be supporting these only for object-store buckets and not for FSO buckets.
Do we plan to do as above or consider FSO buckets as well?
Also, how do we plan to handle legacy buckets? As we have the ozone.om.enable.filesystem.paths config to have flexibility on bucket behaviour.
@Tejaskriya I didn't think of such a case. Maybe @ivandika3 and @xichen01 can help more on this.
For FSO buckets, due to the directory structure, whether an object can be deleted depends on its subdirectories, so Lifecycle cannot perform as expected.
For the legacy buckets, I think we can support it and there is no need to distinguish between legacy buckets and OBS buckets, because the deletion operations that Lifecycle can perform do not exceed what legacy buckets can do. (But I think support for legacy buckets is a feature that can be discussed)
message LifecycleRule {
  optional string id = 1;
  optional string prefix = 2;
  required bool enabled = 3;
@SaketaChalamchala Here is the status flag (enabled or not).
@ChenSammi @SaketaChalamchala @xichen01 I forgot to submit my comments earlier, so they were still in pending status. Please take a look at my comments.
- **Maximum Rules**: The table can store up to 1000 rules per lifecycle configuration.
- **Validation**: The configuration is considered valid if:
  - The `volume`, `bucket`, and `owner` are not blank.
  - The number of rules is between 1 and 1000.
Do we plan on introducing a property for setting the number of rules to be stored per lifecycle configuration?
Not really, each lifecycle configuration will have a list of rules. Hence there is no need to explicitly maintain the count of rules.
We can limit it to 1000 rules per lifecycle configuration as per AWS documentation (https://docs.aws.amazon.com/AmazonS3/latest/userguide/intro-lifecycle-rules.html#intro-lifecycle-rule-id).
Understood, thank you for clarifying @ivandika3 and @mohan3d.
@mohan3d Please help to add this in the design docs.
@ivandika3 Do you mean the implementation details of how it will be limited to 1000 rules, or the limit itself? The limit was added earlier.
Yes, I saw that it's been specified. It is probably better to add a link to the actual AWS documentation so that readers do not misconstrue it as an arbitrary constraint.
Links to the other AWS docs can also be included in other parts of the design document (e.g. conflicting rules).
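
The validation conditions quoted at the start of this thread (non-blank `volume`, `bucket`, and `owner`, and between 1 and 1000 rules) can be sketched as below. This is an illustration only: `LifecycleConfiguration` here is a hypothetical dataclass rather than the protobuf message, rules are represented as plain dictionaries, and `validation_errors` is an assumed helper name.

```python
from dataclasses import dataclass, field
from typing import List

# Limit taken from the AWS S3 documentation linked in this thread.
MAX_RULES = 1000


@dataclass
class LifecycleConfiguration:
    volume: str
    bucket: str
    owner: str
    rules: List[dict] = field(default_factory=list)


def validation_errors(cfg: LifecycleConfiguration) -> List[str]:
    """Return a list of human-readable problems; empty means the config is valid."""
    errors = []
    for name in ("volume", "bucket", "owner"):
        # Treat None, empty, and whitespace-only values as blank.
        if not (getattr(cfg, name) or "").strip():
            errors.append(f"{name} must not be blank")
    if not 1 <= len(cfg.rules) <= MAX_RULES:
        errors.append(f"number of rules must be between 1 and {MAX_RULES}")
    return errors
```

Returning every problem at once, instead of failing on the first one, lets a client fix a rejected request in a single round trip; whether the real implementation reports errors this way is not stated in the document.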
Yes, if we restrict setting a lifecycle to the bucket owner only, we can circumvent the permission issue when the lifecycle service deletes keys, because the bucket owner has ALL permission on the bucket's resources.
A few comments from the community meeting. Please kindly help to address them when you have time.
- Need to specify the permission requirements of AWS lifecycle configuration for both Native ACL and Ranger.
  - For example, since Ranger does not seem to have a concept of "bucket owner", what permission does a Ranger user need to create and delete a lifecycle configuration, and what permission does the Ranger user need to delete the keys?
- Please help to provide more scenarios of conflicting rules (e.g. with one rule on the root expiring after 7 days and another rule on a subdirectory expiring after 14 days, the keys under the subdirectory will be deleted).
  - We can take some from the AWS documentation.
- @xichen01 Could you also help to add the conflict resolution rules for tag, prefix, etc.?
What changes were proposed in this pull request?
Design proposal for the data retention (AWS S3 Lifecycle Configurations) feature. Please comment inline on the markdown document to ask questions and post feedback.
What is the link to the Apache JIRA
HDDS-8342
How was this patch tested?
N/A