Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HDDS-11465. Introducing Schema Versioning for Recon to Handle Fresh Installs and Upgrades. #7213

Open
wants to merge 23 commits into
base: master
Choose a base branch
from

Conversation

ArafatKhan2198
Copy link
Contributor

@ArafatKhan2198 ArafatKhan2198 commented Sep 18, 2024

What changes were proposed in this pull request?

This PR introduces an upgrade framework in Recon that mirrors the pattern used in OMLayoutFeature with some key differences tailored for Recon's needs. The primary purpose of this PR is to manage and handle schema upgrades automatically during Recon startup, storing layout version information on disk and executing necessary upgrade actions.

Summary of Changes:

  • Replication of OMLayoutFeature Pattern: Recon now follows a similar schema upgrade pattern as used in the OM layer, but with auto-finalization. Unlike in OM, Recon will automatically finalize on restart without requiring manual intervention.
  • Storage of Metadata Value: The Metadata Layout Version (MLV) is stored in the RECON_SCHEMA_VERSION table on disk. This table tracks the schema version to identify which features have been applied.
  • Computation of SLV: The Software Layout Version (SLV) is computed dynamically based on the latest version defined in ReconLayoutFeature at runtime.
  • Code Flow During Startup :-
    • On Recon startup, the ReconLayoutVersionManager fetches the current MLV from the RECON_SCHEMA_VERSION table.
    • It then computes the SLV by identifying the highest version in ReconLayoutFeature.
    • The manager finalizes any layout features that require upgrading by executing their associated actions and updates the MLV.
  • Newly Added Classes :-
    • ReconLayoutVersionManager: Handles the determination of the current MLV and SLV and performs the finalization of pending layout features.
    • ReconSchemaVersionTableManager: Manages interactions with the RECON_SCHEMA_VERSION table, including fetching and updating the current MLV.
    • ReconLayoutFeature: Enum defining the layout features with their version, description, and direct integration of upgrade actions.
  • Differences from OMLayoutFeature:
    • No manual finalization: Recon auto-finalizes on restart.
    • No downgrade support or behavior blocking: Recon focuses solely on upgrade actions without pre-finalization constraints.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-11465

How was this patch tested?

For testing this PR, three example features were introduced into ReconLayoutFeature along with their respective upgrade scripts. These scripts were designed to simply log their execution, allowing us to verify that the upgrade framework works as intended. During Recon startup, the current Metadata Layout Version (MLV) was set lower than the Software Layout Version (SLV). The logs confirmed that each feature's upgrade action executed sequentially, updating the MLV and recording the changes in the RECON_SCHEMA_VERSION table. This successful test demonstrates that the framework correctly finalizes registered features when MLV < SLV.
I'll remove them once the PR is approved by everyone.

Logs showcasing upgrade :-

[main] INFO recon.ReconServer: Creating Recon Schema.
[main] INFO upgrade.ReconLayoutVersionManager: Current MLV: 0. SLV: 3. Checking features for registration...
[main] INFO upgrade.ReconLayoutVersionManager: Feature FEATURE_1 (version 1) is registered for finalization.
[main] INFO upgrade.ReconLayoutVersionManager: Feature FEATURE_2 (version 2) is registered for finalization.
[main] INFO upgrade.ReconLayoutVersionManager: Feature FEATURE_3 (version 3) is registered for finalization.
[main] INFO upgrade.Feature1UpgradeAction: Executing Feature 1 upgrade:
[main] INFO recon.ReconSchemaVersionTableManager: Inserted new schema version '1'.
[main] INFO upgrade.ReconLayoutVersionManager: MLV updated to: 1
[main] INFO upgrade.ReconLayoutVersionManager: Feature 1 finalized successfully.
[main] INFO upgrade.Feature2UpgradeAction: Executing Feature 2 upgrade
[main] INFO recon.ReconSchemaVersionTableManager: Updated schema version to '2'.
[main] INFO upgrade.ReconLayoutVersionManager: MLV updated to: 2
[main] INFO upgrade.ReconLayoutVersionManager: Feature 2 finalized successfully.
[main] INFO upgrade.Feature3UpgradeAction: Executing Feature 3 upgrade
[main] INFO recon.ReconSchemaVersionTableManager: Updated schema version to '3'.
[main] INFO upgrade.ReconLayoutVersionManager: MLV updated to: 3
[main] INFO upgrade.ReconLayoutVersionManager: Feature 3 finalized successfully.

@errose28
Copy link
Contributor

Thanks for working on this @ArafatKhan2198. Can we use the existing concept of "layout features" for Recon the same as they are used on other components instead of introducing a new concept unique to Recon? Recon will need to support downgrades the same as other components, and once we start versioning things it will be easier to let the existing upgrade framework handle that.

@ArafatKhan2198 ArafatKhan2198 marked this pull request as ready for review September 18, 2024 19:32
Copy link
Contributor

@devmadhuu devmadhuu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ArafatKhan2198 for working on this patch. Few comments. Pls check.

}

// Upgrade schema if necessary
if (!currentVersion.equals(latestVersion)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So as of now in this version, we are handling 2 cases:

  1. Fresh install (version will be set as 2.0)
  2. Old version with no schema version table (current version set as 1.0)
    and in new version v2.0, we are only updating the schema of UNHEALTHY_CONTAINERS table schema and not the schema of other derby tables , so isn't good if we also maintain the list of tables and details in schemaversion table whose schema was updated in that respective version.

public void upgradeSchema(String fromVersion, String toVersion)
throws SQLException {
Connection conn = dataSource.getConnection();
if (!TABLE_EXISTS_CHECK.test(conn, UNHEALTHY_CONTAINERS_TABLE_NAME)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In what use case, this condition can happen ? As I can see that we are calling upgradeSchema only after initialization of all schemas ?

@Override
public void upgradeSchema(String fromVersion, String toVersion)
throws SQLException {
// No schema upgrades needed for the stats table.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we put a log that no schema version upgraded for this XYZ table in this version ?

if (!currentVersion.equals(latestVersion)) {
upgradeAllSchemas(currentVersion, latestVersion);
// Update the schema version in the version table after migration
schemaVersionTableManager.updateSchemaVersion(latestVersion);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems incorrect here, we are updating the schema version in schemaVersionTable even if we get some error while upgrading the schema of a required table ? How are we handling failure cases.

}

/**
* Update the schema version in the RECON_SCHEMA_VERSION_TABLE after all tables are upgraded.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment doesn't justify the error scenarios if any schema upgrade fails.

@@ -115,7 +115,7 @@ protected void configure() {
public void createSchema(Injector inj) {
ReconSchemaManager reconSchemaManager =
inj.getInstance(ReconSchemaManager.class);
reconSchemaManager.createReconSchema();
reconSchemaManager.createAndUpgradeReconSchema();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add some more test cases related to error and failure and assert when and all schema version gets inserted/ updated in schema version table ?

@ArafatKhan2198
Copy link
Contributor Author

@errose28 @sumitagrawl
Could you please go through the description again for this patch.
I have made changes as per our discussion, where we try to bring versioning to entire of recon rather than just the derby tables.

@devmadhuu
Copy link
Contributor

@ArafatKhan2198 , In Recon context, what all MLV will incorporate only Derby tables schema update ? or will it take care recon rocks DB schema changes as well like container DB ?
And SLV is only feature change without any schema changes ? Whether addition of feature will automatically make MLV also eligible for increment in version along with SLV version ?
Can you pls explain ?

@ArafatKhan2198
Copy link
Contributor Author

ArafatKhan2198 commented Oct 1, 2024

@ArafatKhan2198 , In Recon context, what all MLV will incorporate only Derby tables schema update ? or will it take care recon rocks DB schema changes as well like container DB ? And SLV is only feature change without any schema changes ? Whether addition of feature will automatically make MLV also eligible for increment in version along with SLV version ? Can you pls explain ?

Hi @devmadhuu,

In the context of Recon, the MLV (Metadata Layout Version) is not limited to just the Derby DB tables. It encompasses all Recon-specific components, including both Derby tables and RocksDB tables like the container DB. Essentially, MLV manages and tracks any schema changes or modifications in the Recon data structures, ensuring they are all updated consistently.

The SLV (Software Layout Version), on the other hand, represents the software-defined feature set. It covers all new features that may or may not involve schema changes. When a new feature is added that modifies or affects the Recon DB schemas, whether it's RocksDB or Derby DB, the SLV version will get bumped, prompting an MLV increment. This is managed with the help of upgrade scripts during the finalization process.

In summary:

  • MLV includes all schema changes across Recon, including both Derby and RocksDB tables.
  • SLV pertains to the overall set of Recon features, and adding a feature that involves schema changes will cause the SLV version to be bumped, automatically resulting in an MLV increment through the upgrade scripts.

Let me know if this clarifies your concerns!

@devmadhuu
Copy link
Contributor

Pls fix checkstyle issues.

@devmadhuu
Copy link
Contributor

@ArafatKhan2198 , In Recon context, what all MLV will incorporate only Derby tables schema update ? or will it take care recon rocks DB schema changes as well like container DB ? And SLV is only feature change without any schema changes ? Whether addition of feature will automatically make MLV also eligible for increment in version along with SLV version ? Can you pls explain ?

Hi @devmadhuu,

In the context of Recon, the MLV (Metadata Layout Version) is not limited to just the Derby DB tables. It encompasses all Recon-specific components, including both Derby tables and RocksDB tables like the container DB. Essentially, MLV manages and tracks any schema changes or modifications in the Recon data structures, ensuring they are all updated consistently.

The SLV (Software Layout Version), on the other hand, represents the software-defined feature set. It covers all new features that may or may not involve schema changes. When a new feature is added that modifies or affects the Recon DB schemas, whether it's RocksDB or Derby DB, the SLV version will get bumped, prompting an MLV increment. This is managed with the help of upgrade scripts during the finalization process.

In summary:

* MLV includes all schema changes across Recon, including both Derby and RocksDB tables.

* SLV pertains to the overall set of Recon features, and adding a feature that involves schema changes will cause the SLV version to be bumped, automatically resulting in an MLV increment through the upgrade scripts.

Let me know if this clarifies your concerns!

Pls correct the terms here, looks like you are using both terms for schema layout versioning..

Copy link
Contributor

@devmadhuu devmadhuu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ArafatKhan2198 for explaining the patch and motive behind this patch. Overall looks good, just few comments and also pls update the comments in all code to. reflect the SLV meaning as Software Layout Version rather as Schema Layout Version.

As discussed, later once code freezed, you will also remove the dummy created features. Thanks.

List<ReconLayoutFeature> registeredFeatures = allFeatures.stream()
.filter(feature -> {
boolean shouldRegister =
feature.getVersion() > currentMLV && feature.getVersion() <= currentSLV;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

feature.getVersion() <= currentSLV, here comparing the feature version with less than current SLV, pls check if it is needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@ArafatKhan2198 ArafatKhan2198 changed the title HDDS-11465. Introducing Schema Versioning for Recon Derby table to Handle Fresh Installs and Upgrades. HDDS-11465. Introducing Schema Versioning for Recon to Handle Fresh Installs and Upgrades. Oct 5, 2024
Copy link
Contributor

@devmadhuu devmadhuu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pls correct all comments in whole PR and replace the meaning of SLV (Schema layout version) to (Software Layout version)

@Singleton
public class SchemaVersionTableDefinition implements ReconSchemaDefinition {

public static final String SCHEMA_VERSION_TABLE_NAME = "RECON_SCHEMA_VERSION";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Define this in constants class outside as being used in this class as well as ReconSchemaVersionTableManager

Copy link
Contributor Author

@ArafatKhan2198 ArafatKhan2198 Oct 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@devmadhuu
Most of the schema definition classes for other tables include a field within the class that specifies the table name. Examples of such classes are ContainerSchemaDefinition, ReconTaskSchemaDefinition, StatsSchemaDefinition, and UtilizationSchemaDefinition.

Therefore, I decided to include the table name within the SchemaDefinition class itself for consistency.

Should I change for all of them?

* and associated upgrade action to be executed during an upgrade.
*/
public enum ReconLayoutFeature {
INITIAL_VERSION(0, "Initial Layout Version", new InitialLayoutUpgradeAction()),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do not need binding the Action for every version, when required, can be dinamically added on scan, like done for OM - org.apache.hadoop.ozone.om.upgrade.QuotaRepairUpgradeAction

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the comment @sumitagrawl

In ReconLayoutFeature, upgrade actions are embedded directly within the enum values, making the process simple and straightforward. Each feature (e.g., FEATURE_1) has its corresponding action (e.g., Feature1UpgradeAction). During an upgrade, the system compares the current MLV (Metadata Layout Version) and SLV (Software Layout Version). If the MLV is outdated, the associated action is executed, and the MLV is updated. Since Recon does not have many upgrade phases, I decided to omit additional phases, keeping the upgrade process simple and easy to manage.

On the other hand, OMLayoutFeature introduces a more complex process. Actions are not embedded but dynamically registered and associated with different phases, like FIRST_RUN_ON_UPGRADE and ON_FINALIZE. The system scans for upgrade actions and links them to features based on annotations as different actions can be triggered at different stages of the upgrade, making it suitable for more complex upgrade scenarios.

In summary, Recon uses a simpler, more direct approach by omitting multiple upgrade phases, while OM provides flexibility through phased upgrades, allowing it to handle more intricate upgrade processes.

Should I shift the approach then?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a plan to support prepare and finalize -- may not be part of this PR.
if no action, either you can define it as functional interface with empty lamda implementation for Version "0".

private static final ReconUpgradeAction NO_FINALIZE_ACTION = () -> {};
INITIAL_VERSION(0, "Initial Layout Version", NO_FINALIZE_ACTION),

This can be refactored later on based on required support.

* and associated upgrade action to be executed during an upgrade.
*/
public enum ReconLayoutFeature {
INITIAL_VERSION(0, "Initial Layout Version", new InitialLayoutUpgradeAction()),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a plan to support prepare and finalize -- may not be part of this PR.
if no action, either you can define it as functional interface with empty lamda implementation for Version "0".

private static final ReconUpgradeAction NO_FINALIZE_ACTION = () -> {};
INITIAL_VERSION(0, "Initial Layout Version", NO_FINALIZE_ACTION),

This can be refactored later on based on required support.

* and associated upgrade action to be executed during an upgrade.
*/
public enum ReconLayoutFeature {
INITIAL_VERSION(0, "Initial Layout Version", new InitialLayoutUpgradeAction()),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How are we handling avoid to re-create constraint with this change? I think that is a part of first version "0" as per this task. That change is missing.

@ArafatKhan2198
Copy link
Contributor Author

Testing out the Dynamic Upgrade action handling for Recon Schema Versioning


For testing purposes, I introduced another upgrade action type called TEST_ACTION alongside AUTO_FINALIZE for the feature named FEATURE_3. For this feature, I assigned two upgrade actions—one of type AUTO_FINALIZE and the other of type TEST_ACTION. These are only used for testing and will be removed later.

When I started up Recon, I observed the following log outputs:

[main] INFO upgrade.ReconLayoutVersionManager: Current MLV: 0. SLV: 3. Checking features for registration...
[main] INFO upgrade.ReconLayoutVersionManager: Feature FEATURE_1 (version 1) is registered for finalization.
[main] INFO upgrade.ReconLayoutVersionManager: Feature FEATURE_2 (version 2) is registered for finalization.
[main] INFO upgrade.ReconLayoutVersionManager: Feature FEATURE_3 (version 3) is registered for finalization.
[main] INFO upgrade.Feature1UpgradeActionForAutoFinalise: Executing Feature 1 upgrade script for action type AUTO_FINALIZE...
[main] INFO recon.ReconSchemaVersionTableManager: Inserted new schema version '1'.
[main] INFO upgrade.ReconLayoutVersionManager: MLV updated to: 1
[main] INFO upgrade.ReconLayoutVersionManager: Feature 1 finalized successfully.
[main] INFO upgrade.Feature2UpgradeActionForAutoFinalise: Executing Feature 2 upgrade script for action type AUTO_FINALIZE...
[main] INFO recon.ReconSchemaVersionTableManager: Updated schema version to '2'.
[main] INFO upgrade.ReconLayoutVersionManager: MLV updated to: 2
[main] INFO upgrade.ReconLayoutVersionManager: Feature 2 finalized successfully.
[main] INFO upgrade.Feature3UpgradeActionForAutoFinalise: Executing Feature 3 upgrade script for action type AUTO_FINALIZE...
[main] INFO recon.ReconSchemaVersionTableManager: Updated schema version to '3'.
[main] INFO upgrade.ReconLayoutVersionManager: MLV updated to: 3
[main] INFO upgrade.ReconLayoutVersionManager: Feature 3 finalized successfully.

Currently, for Recon, we've decided that upgrade actions will be autofinalized. This means that only actions marked with the type AUTO_FINALIZE are selected and executed. In the case of FEATURE_3, the upgrade action of type TEST_ACTION was not executed at all. This is because the code is specifically designed to execute only AUTO_FINALIZE upgrade actions, and thus, there is no log output like, "Executing Feature 3 upgrade script for action type TEST_ACTION."

This confirms that annotation-based upgrade action execution is working as expected, as only the appropriate actions (in this case, AUTO_FINALIZE) are executed while others are ignored.

@sumitagrawl @devmadhuu please go through this

Copy link
Contributor

@sumitagrawl sumitagrawl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ArafatKhan2198 Plz check few minor comments

Copy link
Contributor

@devmadhuu devmadhuu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ArafatKhan2198 , thanks for improving the patch. Few comments. Pls check.

public ReconLayoutVersionManager(ReconSchemaVersionTableManager schemaVersionTableManager) {
this.schemaVersionTableManager = schemaVersionTableManager;
this.currentMLV = determineMLV();
this.currentSLV = determineSLV();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need SLV as such here as member variable as this is not being used anywhere. SLV or feature version is only used to determine the max or higher feature versions greater than current MLV and actions of those only should be executed , So this logic can dynamically be executed without defining a member variable. As per current code SLV and MLV are both same and SLV as such defined as member variable not needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the occurrence of SLV field

LOG.info("Feature {} finalized successfully.", feature.getVersion());
}
} catch (Exception e) {
LOG.error("Failed to finalize feature {}: {}", feature.getVersion(), e.getMessage());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since after introduction of this framework, features having schema update will be auto-finalized, and in case of any unexpected error or exception, this will silently be eaten up or logged without knowing of error if feature was actually upgraded or not and we as developers will think that feature will be released and upgraded, however this may be logged as silent failure, then its going to be difficult to know with lot of unpleasant surprises. Atleast we need to update this failure with proper details to ReconContext as well as should fail the Recon start, till we have alert mechanism.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will now be tracking failed updates in recon context

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will now be tracking failed updates in recon context

I don't think this behavior is correct. Here we should fail to start Recon along with updating in ReconContext, else we'll never know what happened until user comes back and report us of any issues and our expectation is different.

Copy link
Contributor

@sumitagrawl sumitagrawl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Give few more comments

.orElse(-1); // Return -1 if no version is found
} catch (Exception e) {
LOG.error("Failed to fetch the current schema version.", e);
return 0; // Return 0 if there is an exception
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we may need throw exception if unable to read table. "0" may give wrong data.

LOG.info("Inserted new schema version '{}'.", newVersion);
}
} catch (Exception e) {
LOG.error("Failed to update schema version to '{}'.", newVersion, e);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need throw exception if unable to update.

try {
// If the feature is INITIAL_VERSION, skip executing any action and just update the schema version
if (feature == ReconLayoutFeature.INITIAL_VERSION) {
updateSchemaVersion(0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should call action if any are there registered. If not registered, it will not come in loop. Do not need any special handling.

*/
private void updateSchemaVersion(int newVersion) {
// Logic to update the MLV in the database
this.currentMLV = newVersion;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

setting currentMLV can be after db update. if DB have some exception, this flag is not required to be updated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants