From fa831f24f71010a8c8d6085c3ffe0963b292f7ac Mon Sep 17 00:00:00 2001 From: Mohanad Elsafty Date: Thu, 25 Apr 2024 20:07:25 +0800 Subject: [PATCH 1/6] add lifecycle-configurations.md --- .../design/lifecycle-configurations.md | 251 ++++++++++++++++++ 1 file changed, 251 insertions(+) create mode 100644 hadoop-hdds/docs/content/design/lifecycle-configurations.md diff --git a/hadoop-hdds/docs/content/design/lifecycle-configurations.md b/hadoop-hdds/docs/content/design/lifecycle-configurations.md new file mode 100644 index 00000000000..af4b89b08bc --- /dev/null +++ b/hadoop-hdds/docs/content/design/lifecycle-configurations.md @@ -0,0 +1,251 @@ +--- +title: AWS S3 Lifecycle Configurations +summary: Enables users to manage lifecycle configurations for buckets, allowing automated deletion of keys based on predefined rules. +date: 2024-04-25 +jira: HDDS-8342 +status: draft +--- + + +# Lifecycle Management + +## Introduction +I encountered the need for a retention solution within my cluster, specifically the ability to delete keys in specific paths after a certain time period. +This requirement closely resembled the functionality provided by AWS S3 Lifecycle configurations, particularly the Expiration part ([AWS S3 Lifecycle Configuration Examples](https://docs.aws.amazon.com/AmazonS3/latest/userguide/lifecycle-configuration-examples.html)). +After designing and implementing a solution, I am eager to contribute it back to the Apache Ozone community. + +## Overview + +### Functionality +- User should be able to create/remove/fetch lifecycle configurations for a specific S3 bucket. +- The lifecycle configurations will be executed periodically. +- Depending on the rules of the lifecycle configuration there could be different actions or even multiple actions. +- At the moment only expiration is supported (keys get deleted). +- The lifecycle configurations supports all buckets not only S3 buckets. 
+ + +### Components + +- Lifecycle configurations (will be stored in DB) consists of volumeName, bucketName and a list of rules + - A rule contains prefix (string), Expiration and an optional Filter. + - Expiration contains either days (integer) or Date (long) + - Filter contains prefix (string). +- S3G bucket endpoint needs few updates to accept ?/lifecycle +- ClientProtocol and all implementers provides (get, list, delete and create) lifecycle configuration +- RetentionManager will be running periodically. + - Fetches a lifecycle configurations list with the help of OM + - Executes each lifecycle configuration on a specific bucket + - Lifecycle configurations will be running on parallel (each one against different bucket). + + +### Flow +1. Users interact with lifecycle configurations via S3Gateway. +2. Configuration details are processed by a handler. +3. Configurations are saved/fetched from the database. +4. RetentionManager, running periodically in the Leader OM, executes lifecycle configurations and issues deletions for eligible keys. + +## Limitations +- The current solution lacks certain features: + - Filter doesn't support `AND` operations. + - Only expiration actions are supported. + - Lack of CLI support for managing lifecycle configurations across all buckets (S3G is the only supported entry point). + - +All these kind of features can be added in the future. + +## Protobuf Definitions +```protobuf +/** +S3 lifecycles (filter, expiration, rule and configuration). 
+ */ +message LifecycleFilter { + optional string prefix = 1; +} + +message LifecycleExpiration { + optional uint32 days = 1; + optional string date = 2; +} + +message LifecycleRule { + optional string id = 1; + optional string prefix = 2; + required bool enabled = 3; + optional LifecycleExpiration expiration = 4; + optional LifecycleFilter filter = 5; +} + +message LifecycleConfiguration { + required string volume = 1; + required string bucket = 2; + required string owner = 3; + optional uint64 creationTime = 4; + repeated LifecycleRule rules = 5; + optional uint64 objectID = 6; + optional uint64 updateID = 7; +} + +message CreateLifecycleConfigurationRequest { + required LifecycleConfiguration lifecycleConfiguration = 1; +} + +message CreateLifecycleConfigurationResponse { + +} + +message InfoLifecycleConfigurationRequest { + required string volumeName = 1; + required string bucketName = 2; +} + +message InfoLifecycleConfigurationResponse { + required LifecycleConfiguration lifecycleConfiguration = 1; +} + +message DeleteLifecycleConfigurationRequest { + required string volumeName = 1; + required string bucketName = 2; +} + +message DeleteLifecycleConfigurationResponse { + +} + +message ListLifecycleConfigurationsRequest { + optional string userName = 1; + optional string prevKey = 2; + optional uint32 maxKeys = 3; +} + +message ListLifecycleConfigurationsResponse { + repeated LifecycleConfiguration lifecycleConfiguration = 1; +} +``` + +# Proposal + +## 1. New Table for Lifecycle Configurations + +- Introduce a new table +- Efficient query. +- Requires a new manager (lifecycle manager) and codec. +- No need to alter existing design. +- Update Bucket Deletion to delete linked lifecycle configurations when the bucket is deleted. + +## 2. New Field in OmBucketInfo + +- Utilize an existing table +- Less efficient query. +- No need for a new manager or codec. +- Update existing design to support lifecycle configurations in OmBucketInfo. 
+- Updates required for create, get, list, and delete operations in the BucketManager. + +## Design Decisions +I made some decisions regarding the design, which require discussion before contribution: +- Lifecycle configurations are stored in their own table in the database, rather than as a field in OmBucketInfo. + - Reasons for this decision: + - Avoid modifying OmBucketInfo table. + - Improve query efficiency for RetentionManager. +- If the alternative approach (storing lifecycle configurations in OmBucketInfo) is preferred, I will eliminate LifecycleConfigurationsManager & the new codec. + +## Plan for Contribution +The implementation is substantial and should be split into several merge requests for better review: +1. Basic building blocks (lifecycle configuration, rule, expiration, etc.) and related table creation. +2. ClientProtocol & OzoneManager new operations for managing lifecycle configurations (including protobuf messages). +3. Updates to S3G endpoints. +4. Implementation of the RetentionManager. +5. Merge all changes into a new branch (e.g., 'X'), then merge that branch into master. 
+ + +# Files Affected + +## Implemented Proposal: New Table for Lifecycle Configurations + +### hdds-common + +- OzoneConfigKeys.java +- OzoneConsts.java + +### ozone-client + +- ClientProtocol.java +- RpcClient.java +- OzoneLifecycleConfiguration.java + +### ozone-common + +- OmLCExpiration.java +- OmLCFilter.java +- OmLCRule.java +- OmLifecycleConfiguration.java +- OzoneManagerProtocol.java +- OzoneManagerProtocolClientSideTranslatorPB.java +- OMConfigKeys.java +- OmUtils.java +- TestOmLifeCycleConfiguration.java + +### ozone-integration-test + +- TestOzoneRpcClientAbstract.java +- TestSecureOzoneRpcClient.java +- TestRetentionManager.java +- TestDataUtil.java + +### ozone-interface-client + +- OmClientProtocol.proto + +### ozone-interface-storage + +- OmLifecycleConfigurationCodec.java +- OMMetadataManager.java +- OMDBDefinition.java +- OzoneManagerRatisUtils.java +- OMBucketDeleteRequest.java +- OMLifecycleConfigurationCreateRequest.java +- OMLifecycleConfigurationDeleteRequest.java +- OMBucketDeleteResponse.java +- OMLifecycleConfigurationCreateResponse.java +- OMLifecycleConfigurationDeleteResponse.java +- LCOpAction.java +- LCOpCurrentExpiration.java +- LCOpRule.java +- RetentionManager.java +- RetentionManagerImpl.java +- LifecycleConfigurationManager.java +- LifecycleConfigurationManagerImpl.java +- OmMetadataManagerImpl.java +- OzoneManager.java +- OzoneManagerRequestHandler.java +- TestOMLifecycleConfigurationCreateRequest.java +- TestOMLifecycleConfigurationDeleteRequest.java +- TestOMLifecycleConfigurationRequest.java +- TestOMLifecycleConfigurationCreateResponse.java +- TestOMLifecycleConfigurationDeleteResponse.java +- RetentionTestUtils.java +- TestLCOpCurrentExpiration.java +- TestLCOpRule.java +- TestLifecycleConfigurationManagerImpl.java + +### ozone-s3gateway + +- BucketEndpoint.java +- EndpointBase.java +- LifecycleConfiguration.java +- PutBucketLifecycleConfigurationUnmarshaller.java +- S3ErrorTable.java +- S3GatewayConfigKeys.java +- 
TestLifecycleConfigurationDelete.java +- TestLifecycleConfigurationGet.java +- TestLifecycleConfigurationPut.java + + From 7cba49b213e449eaa351c2e4fb9908926468b87c Mon Sep 17 00:00:00 2001 From: Mohanad Elsafty Date: Sat, 27 Apr 2024 07:59:30 +0800 Subject: [PATCH 2/6] Update lifecycle-configurations.md --- hadoop-hdds/docs/content/design/lifecycle-configurations.md | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/hadoop-hdds/docs/content/design/lifecycle-configurations.md b/hadoop-hdds/docs/content/design/lifecycle-configurations.md index af4b89b08bc..640782fb5f0 100644 --- a/hadoop-hdds/docs/content/design/lifecycle-configurations.md +++ b/hadoop-hdds/docs/content/design/lifecycle-configurations.md @@ -22,7 +22,6 @@ status: draft ## Introduction I encountered the need for a retention solution within my cluster, specifically the ability to delete keys in specific paths after a certain time period. This requirement closely resembled the functionality provided by AWS S3 Lifecycle configurations, particularly the Expiration part ([AWS S3 Lifecycle Configuration Examples](https://docs.aws.amazon.com/AmazonS3/latest/userguide/lifecycle-configuration-examples.html)). -After designing and implementing a solution, I am eager to contribute it back to the Apache Ozone community. ## Overview @@ -38,6 +37,7 @@ After designing and implementing a solution, I am eager to contribute it back to - Lifecycle configurations (will be stored in DB) consists of volumeName, bucketName and a list of rules - A rule contains prefix (string), Expiration and an optional Filter. + - Object tagging integrations for bucket lifecycle configuration. - Expiration contains either days (integer) or Date (long) - Filter contains prefix (string). 
- S3G bucket endpoint needs few updates to accept ?/lifecycle @@ -56,10 +56,9 @@ After designing and implementing a solution, I am eager to contribute it back to ## Limitations - The current solution lacks certain features: - - Filter doesn't support `AND` operations. - Only expiration actions are supported. - Lack of CLI support for managing lifecycle configurations across all buckets (S3G is the only supported entry point). - - + All these kind of features can be added in the future. ## Protobuf Definitions From c2c9fa54f87b1ad87931e64e85c4cd1443a2bd30 Mon Sep 17 00:00:00 2001 From: Mohanad Elsafty Date: Sun, 28 Apr 2024 21:20:31 +0800 Subject: [PATCH 3/6] Update lifecycle-configurations.md --- .../docs/content/design/lifecycle-configurations.md | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/hadoop-hdds/docs/content/design/lifecycle-configurations.md b/hadoop-hdds/docs/content/design/lifecycle-configurations.md index 640782fb5f0..1543512a2e6 100644 --- a/hadoop-hdds/docs/content/design/lifecycle-configurations.md +++ b/hadoop-hdds/docs/content/design/lifecycle-configurations.md @@ -42,10 +42,13 @@ This requirement closely resembled the functionality provided by AWS S3 Lifecycl - Filter contains prefix (string). - S3G bucket endpoint needs few updates to accept ?/lifecycle - ClientProtocol and all implementers provides (get, list, delete and create) lifecycle configuration -- RetentionManager will be running periodically. - - Fetches a lifecycle configurations list with the help of OM - - Executes each lifecycle configuration on a specific bucket - - Lifecycle configurations will be running on parallel (each one against different bucket). +- RetentionManager: + - Upon startup, the OzoneManager initializes the Retention Manager based on configuration parameters such as retention interval. + - A background retention service is responsible for scheduling and executing tasks at specified intervals. 
+ - The Retention Manager retrieves lifecycle configurations associated with buckets. + - Then assigns each lifecycle configuration (attached to a bucket) to a threadpool (Configurable) for further processing. + - Each task will iterate through keys of a specific bucket and issue deletion request for eligible keys. + ### Flow From 30c64674a8c660d841d632684d05092d85c600d3 Mon Sep 17 00:00:00 2001 From: Mohanad Elsafty Date: Sun, 26 May 2024 07:29:48 +0800 Subject: [PATCH 4/6] Add table format details --- .../design/lifecycle-configurations.md | 28 +++++++++++++++++++ 1 file changed, 28 insertions(+) diff --git a/hadoop-hdds/docs/content/design/lifecycle-configurations.md b/hadoop-hdds/docs/content/design/lifecycle-configurations.md index 1543512a2e6..fccefbd4a8a 100644 --- a/hadoop-hdds/docs/content/design/lifecycle-configurations.md +++ b/hadoop-hdds/docs/content/design/lifecycle-configurations.md @@ -133,6 +133,34 @@ message ListLifecycleConfigurationsResponse { } ``` +# Table format +## OmLifecycleConfiguration Table + +The `OmLifecycleConfiguration` table in RocksDB is used to store lifecycle configurations of buckets. Below is a summary of the table structure. + +### Table Structure + +| Column Name | Data Type | Description | +|--------------|-----------------------|----------------------------------------------------------| +| volume | String | The name of the volume. | +| bucket | String | The name of the bucket. | +| owner | String | The owner of the volume/bucket. | +| creationTime | long | The creation time of the configuration. | +| rules | List | A list of lifecycle rules associated with the configuration. | +| objectID | long | Unique identifier for the object. | +| updateID | long | Identifier for updates to the object. | + + +### Additional Information + +- **Maximum Rules**: The table can store up to 1000 rules per lifecycle configuration. 
+- **Validation**: The configuration is considered valid if: + - The `volume`, `bucket`, and `owner` are not blank. + - The number of rules is between 1 and 1000. + - Each rule has a unique ID. + - All rules are valid according to their individual validation criteria. + + # Proposal ## 1. New Table for Lifecycle Configurations From 816dbf0aa03b04e90731d0bd06f0beb338d2ac9b Mon Sep 17 00:00:00 2001 From: Mohanad Elsafty Date: Sun, 26 May 2024 07:53:21 +0800 Subject: [PATCH 5/6] add retention manager design --- .../design/lifecycle-configurations.md | 20 +++++++++++++++++++ 1 file changed, 20 insertions(+) diff --git a/hadoop-hdds/docs/content/design/lifecycle-configurations.md b/hadoop-hdds/docs/content/design/lifecycle-configurations.md index fccefbd4a8a..c6738e98416 100644 --- a/hadoop-hdds/docs/content/design/lifecycle-configurations.md +++ b/hadoop-hdds/docs/content/design/lifecycle-configurations.md @@ -160,6 +160,26 @@ The `OmLifecycleConfiguration` table in RocksDB is used to store lifecycle confi - Each rule has a unique ID. - All rules are valid according to their individual validation criteria. +# Retention Manager +## High-Level Flow + +1. **Initialization and Start:** + - The retention manager is initialized with required parameters (rate limit, max iterators, and running interval). + - A retention service is started in the OzoneManager, running periodically based on a configured interval. + +2. **Periodic Execution:** + - Each time the service runs, it checks if the current node is the leader and sleeps if it is not the leader. + - If it is the leader, it proceeds with the following operations: + * Retrieve the lifecycle configurations list. + * Each lifecycle configuration represents a bucket and contains a list of lifecycle rules to be applied. + * Lifecycle configurations are handled simultaneously by a configurable threadpool executor. 
+ * For each configuration, the task scans the bucket's entries and performs the rule's action (currently only deletion) on every eligible entry. + +## Concurrency and Rate Limiting + +1. **Thread Pool:** the thread pool is configurable to allow concurrent processing of lifecycle configurations. +2. **Rate Limiter:** the RateLimiter controls the rate of key deletions, ensuring system stability. + # Proposal From 97ec5f8355cd573d41322fa57127e10b96aa8ffc Mon Sep 17 00:00:00 2001 From: Mohanad Elsafty Date: Tue, 18 Jun 2024 07:15:56 +0800 Subject: [PATCH 6/6] Add AWS refrence urls --- hadoop-hdds/docs/content/design/lifecycle-configurations.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/hadoop-hdds/docs/content/design/lifecycle-configurations.md b/hadoop-hdds/docs/content/design/lifecycle-configurations.md index c6738e98416..c3d5383f216 100644 --- a/hadoop-hdds/docs/content/design/lifecycle-configurations.md +++ b/hadoop-hdds/docs/content/design/lifecycle-configurations.md @@ -153,12 +153,13 @@ The `OmLifecycleConfiguration` table in RocksDB is used to store lifecycle confi ### Additional Information -- **Maximum Rules**: The table can store up to 1000 rules per lifecycle configuration. +- **Maximum Rules**: A lifecycle configuration can hold up to 1000 rules, matching the AWS S3 limit: https://docs.aws.amazon.com/AmazonS3/latest/userguide/intro-lifecycle-rules.html#intro-lifecycle-rule-id - **Validation**: The configuration is considered valid if: - The `volume`, `bucket`, and `owner` are not blank. - The number of rules is between 1 and 1000. - Each rule has a unique ID. - All rules are valid according to their individual validation criteria. +- **Conflict Resolution**: When lifecycle rules within a configuration overlap, the implementation resolves the conflict with the cost-optimizing strategy that AWS S3 uses: https://docs.aws.amazon.com/AmazonS3/latest/userguide/lifecycle-configuration-examples.html#lifecycle-config-conceptual-ex5 # Retention Manager ## High-Level Flow
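
The validation criteria listed under **Additional Information** could be sketched in Java roughly as follows. This is a minimal sketch under stated assumptions, not the code from this patch series: `LifecycleValidationSketch` and its nested `Rule` class are hypothetical stand-ins for the real `OmLifecycleConfiguration` and `OmLCRule` classes, and the per-rule check (exactly one of `days` or `date` is set, and `days` is positive) is an assumption, because the document does not spell out the individual rule criteria.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch only: hypothetical stand-ins, not the classes added by the patch.
class LifecycleValidationSketch {
  // Upper bound on rules per configuration, as stated in the design document.
  static final int MAX_RULES = 1000;

  static final class Rule {
    final String id;
    final Integer days; // null when the rule expires on a fixed date
    final String date;  // null when the rule expires after a number of days

    Rule(String id, Integer days, String date) {
      this.id = id;
      this.days = days;
      this.date = date;
    }

    // Assumed per-rule criterion: exactly one of days/date, and days > 0.
    boolean isValid() {
      boolean hasDays = days != null;
      boolean hasDate = date != null && !date.trim().isEmpty();
      if (hasDays == hasDate) {
        return false; // both set, or neither set
      }
      return !hasDays || days > 0;
    }
  }

  static boolean isBlank(String s) {
    return s == null || s.trim().isEmpty();
  }

  // Applies the four documented checks in order.
  static boolean isValid(String volume, String bucket, String owner, List<Rule> rules) {
    if (isBlank(volume) || isBlank(bucket) || isBlank(owner)) {
      return false;
    }
    if (rules == null || rules.isEmpty() || rules.size() > MAX_RULES) {
      return false;
    }
    Set<String> seenIds = new HashSet<>();
    for (Rule rule : rules) {
      if (!seenIds.add(rule.id)) {
        return false; // a second rule reuses an ID
      }
      if (!rule.isValid()) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    List<Rule> distinct = Arrays.asList(
        new Rule("r1", 30, null),
        new Rule("r2", null, "2025-01-01T00:00:00Z"));
    List<Rule> duplicated = Arrays.asList(
        new Rule("r1", 30, null),
        new Rule("r1", 7, null));
    System.out.println(isValid("vol1", "bucket1", "alice", distinct));   // prints "true"
    System.out.println(isValid("vol1", "bucket1", "alice", duplicated)); // prints "false"
  }
}
```

`Set.add` returns `false` when the element is already present, so ID uniqueness and per-rule validity are both checked in a single pass over the rule list.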