HDDS-8342. AWS S3 Lifecycle Configurations doc #6589

Open
wants to merge 6 commits into
base: master
Changes from 3 commits
253 changes: 253 additions & 0 deletions hadoop-hdds/docs/content/design/lifecycle-configurations.md
@@ -0,0 +1,253 @@
---
title: AWS S3 Lifecycle Configurations
summary: Enables users to manage lifecycle configurations for buckets, allowing automated deletion of keys based on predefined rules.
date: 2024-04-25
jira: HDDS-8342
status: draft
---
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->

# Lifecycle Management

## Introduction
I encountered the need for a retention solution within my cluster, specifically the ability to delete keys in specific paths after a certain time period.
This requirement closely resembled the functionality provided by AWS S3 Lifecycle configurations, particularly the Expiration part ([AWS S3 Lifecycle Configuration Examples](https://docs.aws.amazon.com/AmazonS3/latest/userguide/lifecycle-configuration-examples.html)).

## Overview

### Functionality
- Users should be able to create, remove and fetch lifecycle configurations for a specific S3 bucket.
Contributor

Do you have any thoughts on what ACL checks would be performed for creating a lifecycle configuration? Would it be restricted to the owners of the keys or an Ozone administrator?
What would happen if keys with the same prefix have multiple owners? If one of the key owners creates a lifecycle configuration on the prefix, would all of the keys with the prefix be deleted?

Contributor

> Do you have any thoughts on what ACL checks would be performed for creating a lifecycle configuration? Would it be restricted to the owners of the keys or an Ozone administrator?

Maybe require the 'WRITE' permission on the bucket being operated on?
If a user has 'WRITE' permission on a bucket, they can already overwrite or delete another user's key in the bucket without going through Lifecycle.

When Lifecycle deletes a key, the key is deleted as long as the rule is met. The delete operation is executed by the OM itself, and the OM is an admin user. If we want to block users from removing or deleting objects from a specific bucket, the bucket owner should not give other users the WRITE permission on that bucket.

https://docs.aws.amazon.com/AmazonS3/latest/API/API_PutBucketLifecycleConfiguration.html

Contributor Author

At the time of implementing this I had no clear idea of how the ACLs would work. If I recall correctly, it was restricted to the owner.

Contributor Author

As I remember, I initially implemented it this way: if the user who set the lifecycle configuration doesn't have the right to delete the key, the key is skipped even though it is eligible for deletion.

Later I changed it to delete the key anyway. I need to check the code to answer you accurately.

Contributor

What would happen when the 'WRITE' permission is revoked from a user? Would that cause all the lifecycle configurations owned by the user to be disabled?

Contributor (@ivandika3, May 30, 2024)

As said by @mohan3d and @xichen01, for simplicity's sake we can first restrict the management of bucket lifecycle configurations to the bucket owner, since the bucket owner will not change for the lifetime of the bucket.

Note: Bucket lifecycle configurations will need to be deleted before the bucket can be deleted.

Contributor

@mohan3d @xichen01 We need to check the applicability for Native ACL and Ranger ACL (e.g. whether bucket ownership is applied in Ranger as well). We need to comply with both ACL models.

- The lifecycle configurations will be executed periodically.
- Depending on the rules of the lifecycle configuration there could be different actions or even multiple actions.
- At the moment only expiration is supported (keys get deleted).
- Lifecycle configurations support all buckets, not only S3 buckets.
Contributor

S3 supports lifecycle configurations only for non-directory buckets. Translating that to Ozone, we would support them only for object-store buckets and not for FSO buckets.
Do we plan to do that, or to consider FSO buckets as well?
Also, how do we plan to handle legacy buckets, given that the ozone.om.enable.filesystem.paths config gives flexibility in bucket behaviour?

Contributor Author

@Tejaskriya I didn't think of such a case. Maybe @ivandika3 and @xichen01 can help more on this.

Contributor

For FSO buckets, due to the directory structure, whether an object can be deleted depends on its subdirectories, so Lifecycle cannot perform as expected.

For legacy buckets, I think we can support them without distinguishing between legacy buckets and OBS buckets, because the deletion operations that Lifecycle can perform do not exceed what legacy buckets can do. (That said, support for legacy buckets is a feature that can be discussed.)



### Components

- A lifecycle configuration (stored in the DB) consists of volumeName, bucketName and a list of rules.
- A rule contains a prefix (string), an Expiration and an optional Filter.
ivandika3 marked this conversation as resolved.
- Object tagging integrations for bucket lifecycle configuration.
- Expiration contains either days (integer) or Date (long).
- Filter contains a prefix (string).
- The S3G bucket endpoint needs a few updates to accept the `?lifecycle` query parameter.
- ClientProtocol and all its implementers provide get, list, delete and create operations for lifecycle configurations.
- RetentionManager:
  - Upon startup, the OzoneManager initializes the RetentionManager based on configuration parameters such as the retention interval.
  - A background retention service is responsible for scheduling and executing tasks at the specified interval.
  - The RetentionManager retrieves the lifecycle configurations associated with buckets.
  - It then assigns each lifecycle configuration (attached to a bucket) to a configurable thread pool for further processing.
  - Each task iterates through the keys of a specific bucket and issues deletion requests for eligible keys.
Contributor

@mohan3d, could you elaborate on what happens when a key is covered by multiple rules? What is the final operation on the key if the rules conflict or specify different expiration conditions?

Contributor Author

AWS S3 optimizes for cost, which means that whichever decision reduces the cost is applied. In the case you mentioned, the shorter expiration will be honored.

Further details: https://docs.aws.amazon.com/AmazonS3/latest/userguide/lifecycle-configuration-examples.html#lifecycle-config-conceptual-ex5

Contributor

This simply means that any rule that matches will be executed. Currently, the only "action" in our Lifecycle is to delete, so when checking the specified key, if any rule matches, then the key will be deleted.
This is also the rule for AWS S3 Lifecycle.

https://docs.aws.amazon.com/AmazonS3/latest/userguide/lifecycle-configuration-examples.html#lifecycle-config-conceptual-ex5

Contributor

Let's specify the multiple rules conflict resolution explicitly in the design document.
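
The resolution described in this thread (any enabled rule that matches may expire the key, so the shortest expiration wins) could be sketched as follows. This is only an illustration under stated assumptions: `RuleEvaluationSketch` and its `Rule` record are hypothetical names, not the `OmLCRule` classes from the PR, each rule is reduced to a single prefix, and the key's modification time is taken as the start of the expiration period.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch of multi-rule resolution; not the OmLCRule code from the PR. */
final class RuleEvaluationSketch {

  /** One expiration rule: a key prefix plus either a day count or an absolute date. */
  record Rule(String id, String prefix, boolean enabled, Integer days, Instant date) {
    Rule {
      if ((days == null) == (date == null)) {
        throw new IllegalArgumentException("set exactly one of days or date");
      }
    }

    /** Instant at which a key last modified at {@code modified} expires under this rule. */
    Instant expiresAt(Instant modified) {
      return days != null ? modified.plus(Duration.ofDays(days)) : date;
    }
  }

  /** Earliest expiry among the enabled rules whose prefix matches the key, if any. */
  static Optional<Instant> earliestExpiry(List<Rule> rules, String key, Instant modified) {
    return rules.stream()
        .filter(Rule::enabled)
        .filter(rule -> key.startsWith(rule.prefix()))
        .map(rule -> rule.expiresAt(modified))
        .min(Instant::compareTo);
  }

  /** A key is eligible for deletion as soon as any matching rule has expired. */
  static boolean isExpired(List<Rule> rules, String key, Instant modified, Instant now) {
    return earliestExpiry(rules, key, modified)
        .map(expiry -> !expiry.isAfter(now))
        .orElse(false);
  }

  public static void main(String[] args) {
    List<Rule> rules = List.of(
        new Rule("long", "logs/", true, 30, null),
        new Rule("short", "logs/tmp/", true, 7, null));
    Instant modified = Instant.parse("2024-04-01T00:00:00Z");
    Instant now = Instant.parse("2024-04-10T00:00:00Z");
    System.out.println(isExpired(rules, "logs/tmp/a", modified, now)); // prints true (7-day rule)
    System.out.println(isExpired(rules, "logs/app/b", modified, now)); // prints false (30-day rule only)
  }
}
```

Because expiration is the only action, taking the minimum expiry over all matching rules is equivalent to "delete if any rule matches"; a design with more actions would need an explicit precedence order instead.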




### Flow
1. Users interact with lifecycle configurations via S3Gateway.
2. Configuration details are processed by a handler.
3. Configurations are saved/fetched from the database.
4. RetentionManager, running periodically in the Leader OM, executes lifecycle configurations and issues deletions for eligible keys.
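
Step 4 of the flow, together with the RetentionManager description under Components, might look roughly like the sketch below. It is not the `RetentionManagerImpl` from the PR: `RetentionLoopSketch`, `runOnce` and `start` are hypothetical names, buckets and keys are plain in-memory collections, leader election is not modeled, and the "deletion" only collects key names rather than issuing OM requests.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Predicate;

/** Hypothetical sketch of the periodic retention loop; not the PR's RetentionManagerImpl. */
final class RetentionLoopSketch implements AutoCloseable {
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();
  private final ExecutorService workers;

  RetentionLoopSketch(int workerThreads) {
    this.workers = Executors.newFixedThreadPool(workerThreads); // configurable pool size
  }

  /** One pass: one task per bucket, each scanning that bucket's keys for eligible ones. */
  List<String> runOnce(Map<String, List<String>> keysByBucket, Predicate<String> eligible) {
    List<Future<List<String>>> tasks = new ArrayList<>();
    for (Map.Entry<String, List<String>> bucket : keysByBucket.entrySet()) {
      Callable<List<String>> scan = () -> bucket.getValue().stream()
          .filter(eligible)
          .map(key -> bucket.getKey() + "/" + key)
          .toList();
      tasks.add(workers.submit(scan));
    }
    List<String> toDelete = new ArrayList<>();
    try {
      for (Future<List<String>> task : tasks) {
        toDelete.addAll(task.get()); // a real service would issue deletion requests here
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("retention pass interrupted", e);
    } catch (ExecutionException e) {
      throw new IllegalStateException("retention task failed", e.getCause());
    }
    return toDelete;
  }

  /** Repeats the pass at a fixed interval, as a background service would. */
  void start(long intervalSeconds, Map<String, List<String>> keysByBucket,
      Predicate<String> eligible) {
    scheduler.scheduleWithFixedDelay(() -> {
      try {
        runOnce(keysByBucket, eligible);
      } catch (RuntimeException e) {
        // Swallow and report, otherwise the scheduler cancels all later runs.
        System.err.println("retention pass failed: " + e);
      }
    }, intervalSeconds, intervalSeconds, TimeUnit.SECONDS);
  }

  @Override
  public void close() {
    scheduler.shutdownNow();
    workers.shutdownNow();
  }

  public static void main(String[] args) {
    Map<String, List<String>> keys = new TreeMap<>();
    keys.put("vol1/b1", List.of("logs/a", "data/b"));
    keys.put("vol1/b2", List.of("logs/c"));
    try (RetentionLoopSketch loop = new RetentionLoopSketch(2)) {
      System.out.println(loop.runOnce(keys, key -> key.startsWith("logs/")));
    }
  }
}
```

Keeping the single-pass logic in `runOnce` separate from the scheduling in `start` makes the pass easy to test without waiting for a timer.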

## Limitations
- The current solution lacks certain features:
  - Only expiration actions are supported.
  - There is no CLI support for managing lifecycle configurations across all buckets (S3G is the only supported entry point).

All of these features can be added in the future.

## Protobuf Definitions
```protobuf
/**
    S3 lifecycles (filter, expiration, rule and configuration).
*/
message LifecycleFilter {
    optional string prefix = 1;
}

message LifecycleExpiration {
    optional uint32 days = 1;
    optional string date = 2;
}

message LifecycleRule {
    optional string id = 1;
    optional string prefix = 2;
    required bool enabled = 3;
Contributor Author

@SaketaChalamchala Here is the status flag (enabled or not).

    optional LifecycleExpiration expiration = 4;
    optional LifecycleFilter filter = 5;
}

message LifecycleConfiguration {
    required string volume = 1;
    required string bucket = 2;
    required string owner = 3;
    optional uint64 creationTime = 4;
    repeated LifecycleRule rules = 5;
    optional uint64 objectID = 6;
    optional uint64 updateID = 7;
Contributor

Thanks for the doc @mohan3d. Do you think it would make sense to also add a required status field here to indicate whether the configuration is enabled or disabled?
Consequently, we might need definitions for Disabling and Enabling the lifecycle configurations.

Contributor Author

@SaketaChalamchala Yes, it makes sense, and there is actually such a flag at the rule level.

Contributor

@SaketaChalamchala The `required bool enabled = 3;` field in LifecycleRule is used to indicate whether the rule is enabled or disabled.

}

message CreateLifecycleConfigurationRequest {
    required LifecycleConfiguration lifecycleConfiguration = 1;
}

message CreateLifecycleConfigurationResponse {

}

message InfoLifecycleConfigurationRequest {
    required string volumeName = 1;
    required string bucketName = 2;
}

message InfoLifecycleConfigurationResponse {
    required LifecycleConfiguration lifecycleConfiguration = 1;
}

message DeleteLifecycleConfigurationRequest {
    required string volumeName = 1;
    required string bucketName = 2;
Contributor

Would accepting an optional LifecycleFilter here and in List and Info configuration requests be useful?

Contributor Author

It might be useful. I was not able to find such a thing on the AWS side, which is why my implementation doesn't have such an optional filter.

https://docs.aws.amazon.com/AmazonS3/latest/API/API_GetBucketLifecycleConfiguration.html
https://docs.aws.amazon.com/AmazonS3/latest/API/API_DeleteBucketLifecycle.html

Contributor

AWS S3 only supports deleting the entire lifecycle configuration of a bucket; refer to:
https://docs.aws.amazon.com/AmazonS3/latest/API/API_DeleteBucketLifecycle.html

Contributor

I see.

}

message DeleteLifecycleConfigurationResponse {

}

message ListLifecycleConfigurationsRequest {
    optional string userName = 1;
    optional string prevKey = 2;
    optional uint32 maxKeys = 3;
}

message ListLifecycleConfigurationsResponse {
    repeated LifecycleConfiguration lifecycleConfiguration = 1;
}
```
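
`LifecycleExpiration` declares both `days` and `date` as optional, while the Components section says an expiration carries either one or the other, so something on the server side presumably has to enforce that. A minimal sketch of such a check follows; `ExpirationValidationSketch` and `validate` are hypothetical names, and parsing the date as an ISO-8601 instant is an assumption, since the document does not fix the date format.

```java
import java.time.Instant;
import java.time.format.DateTimeParseException;
import java.util.Optional;

/** Hypothetical validator mirroring the LifecycleExpiration message; not code from the PR. */
final class ExpirationValidationSketch {

  /** Returns an empty Optional when the pair is valid, otherwise the reason it is not. */
  static Optional<String> validate(Integer days, String date) {
    if ((days == null) == (date == null)) {
      return Optional.of("exactly one of days or date must be set");
    }
    if (days != null) {
      return days > 0 ? Optional.empty() : Optional.of("days must be a positive integer");
    }
    try {
      Instant.parse(date); // assumption: an ISO-8601 instant such as 2024-04-25T00:00:00Z
      return Optional.empty();
    } catch (DateTimeParseException e) {
      return Optional.of("date is not an ISO-8601 instant: " + date);
    }
  }

  public static void main(String[] args) {
    System.out.println(validate(30, null));                   // prints Optional.empty
    System.out.println(validate(null, "2024-04-25T00:00:00Z")); // prints Optional.empty
    System.out.println(validate(30, "2024-04-25T00:00:00Z").isPresent()); // prints true
  }
}
```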

# Proposal

## 1. New Table for Lifecycle Configurations

- Introduce a new table
- Efficient query.
- Requires a new manager (lifecycle manager) and codec.
- No need to alter existing design.
- Update Bucket Deletion to delete linked lifecycle configurations when the bucket is deleted.

## 2. New Field in OmBucketInfo

- Utilize an existing table
- Less efficient query.
- No need for a new manager or codec.
- Update existing design to support lifecycle configurations in OmBucketInfo.
- Updates required for create, get, list, and delete operations in the BucketManager.

## Design Decisions
I made some decisions regarding the design, which require discussion before contribution:
- Lifecycle configurations are stored in their own table in the database, rather than as a field in OmBucketInfo.
- Reasons for this decision:
- Avoid modifying OmBucketInfo table.
- Improve query efficiency for RetentionManager.
- If the alternative approach (storing lifecycle configurations in OmBucketInfo) is preferred, I will eliminate LifecycleConfigurationsManager & the new codec.
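
To make the dedicated-table option concrete, here is an in-memory stand-in for what such a table would offer. Everything in it is an assumption for illustration: `LifecycleTableSketch` is a hypothetical name, the `/volume/bucket` key format is a guess, rules are reduced to their ids, and a `TreeMap` stands in for the OM database table. It shows the two properties argued for above: the retention service can list configurations directly, and bucket deletion can drop the linked entry.

```java
import java.util.List;
import java.util.NavigableMap;
import java.util.Optional;
import java.util.TreeMap;

/** Hypothetical in-memory stand-in for a dedicated lifecycle table; not the PR's OM table. */
final class LifecycleTableSketch {
  // Sorted by table key so that listing can resume after prevKey.
  private final TreeMap<String, List<String>> ruleIdsByBucket = new TreeMap<>();

  /** Assumed key format: "/volume/bucket". */
  static String dbKey(String volume, String bucket) {
    return "/" + volume + "/" + bucket;
  }

  void put(String volume, String bucket, List<String> ruleIds) {
    ruleIdsByBucket.put(dbKey(volume, bucket), List.copyOf(ruleIds));
  }

  Optional<List<String>> get(String volume, String bucket) {
    return Optional.ofNullable(ruleIdsByBucket.get(dbKey(volume, bucket)));
  }

  /** Called from bucket deletion: drops the linked configuration, if there is one. */
  boolean onBucketDeleted(String volume, String bucket) {
    return ruleIdsByBucket.remove(dbKey(volume, bucket)) != null;
  }

  /** Lists up to maxKeys table keys strictly after prevKey (null starts at the beginning). */
  List<String> list(String prevKey, int maxKeys) {
    NavigableMap<String, List<String>> view =
        prevKey == null ? ruleIdsByBucket : ruleIdsByBucket.tailMap(prevKey, false);
    return view.keySet().stream().limit(maxKeys).toList();
  }

  public static void main(String[] args) {
    LifecycleTableSketch table = new LifecycleTableSketch();
    table.put("vol1", "b1", List.of("r1"));
    table.put("vol1", "b2", List.of("r2", "r3"));
    table.put("vol2", "a", List.of("r4"));
    System.out.println(table.list(null, 2));        // prints [/vol1/b1, /vol1/b2]
    System.out.println(table.list("/vol1/b2", 2));  // prints [/vol2/a]
    System.out.println(table.onBucketDeleted("vol1", "b1")); // prints true
  }
}
```

With the alternative (a field in OmBucketInfo), the same listing would require scanning every bucket record and filtering for those that carry a configuration.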

## Plan for Contribution
The implementation is substantial and should be split into several merge requests for better review:
1. Basic building blocks (lifecycle configuration, rule, expiration, etc.) and related table creation.
2. ClientProtocol & OzoneManager new operations for managing lifecycle configurations (including protobuf messages).
3. Updates to S3G endpoints.
4. Implementation of the RetentionManager.
5. Merge all changes into a new branch (e.g., 'X'), then merge that branch into master.


# Files Affected

## Implemented Proposal: New Table for Lifecycle Configurations

### hdds-common

- OzoneConfigKeys.java
- OzoneConsts.java

### ozone-client

- ClientProtocol.java
- RpcClient.java
- OzoneLifecycleConfiguration.java

### ozone-common

- OmLCExpiration.java
- OmLCFilter.java
- OmLCRule.java
- OmLifecycleConfiguration.java
- OzoneManagerProtocol.java
- OzoneManagerProtocolClientSideTranslatorPB.java
- OMConfigKeys.java
- OmUtils.java
- TestOmLifeCycleConfiguration.java

### ozone-integration-test

- TestOzoneRpcClientAbstract.java
- TestSecureOzoneRpcClient.java
- TestRetentionManager.java
- TestDataUtil.java

### ozone-interface-client

- OmClientProtocol.proto

### ozone-interface-storage

- OmLifecycleConfigurationCodec.java
- OMMetadataManager.java
- OMDBDefinition.java
- OzoneManagerRatisUtils.java
- OMBucketDeleteRequest.java
- OMLifecycleConfigurationCreateRequest.java
- OMLifecycleConfigurationDeleteRequest.java
- OMBucketDeleteResponse.java
- OMLifecycleConfigurationCreateResponse.java
- OMLifecycleConfigurationDeleteResponse.java
- LCOpAction.java
- LCOpCurrentExpiration.java
- LCOpRule.java
- RetentionManager.java
- RetentionManagerImpl.java
- LifecycleConfigurationManager.java
- LifecycleConfigurationManagerImpl.java
- OmMetadataManagerImpl.java
- OzoneManager.java
- OzoneManagerRequestHandler.java
- TestOMLifecycleConfigurationCreateRequest.java
- TestOMLifecycleConfigurationDeleteRequest.java
- TestOMLifecycleConfigurationRequest.java
- TestOMLifecycleConfigurationCreateResponse.java
- TestOMLifecycleConfigurationDeleteResponse.java
- RetentionTestUtils.java
- TestLCOpCurrentExpiration.java
- TestLCOpRule.java
- TestLifecycleConfigurationManagerImpl.java

### ozone-s3gateway

- BucketEndpoint.java
- EndpointBase.java
- LifecycleConfiguration.java
- PutBucketLifecycleConfigurationUnmarshaller.java
- S3ErrorTable.java
- S3GatewayConfigKeys.java
- TestLifecycleConfigurationDelete.java
- TestLifecycleConfigurationGet.java
- TestLifecycleConfigurationPut.java