-
Notifications
You must be signed in to change notification settings - Fork 499
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HDDS-11465. Introducing Schema Versioning for Recon to Handle Fresh Installs and Upgrades. #7213
base: master
Are you sure you want to change the base?
Conversation
…ndle Fresh Installs and Upgrades.
Thanks for working on this @ArafatKhan2198. Can we use the existing concept of "layout features" for Recon the same as they are used on other components instead of introducing a new concept unique to Recon? Recon will need to support downgrades the same as other components, and once we start versioning things it will be easier to let the existing upgrade framework handle that. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @ArafatKhan2198 for working on this patch. Few comments. Pls check.
...one/recon-codegen/src/main/java/org/hadoop/ozone/recon/schema/ContainerSchemaDefinition.java
Outdated
Show resolved
Hide resolved
...one/recon-codegen/src/main/java/org/hadoop/ozone/recon/schema/ContainerSchemaDefinition.java
Outdated
Show resolved
Hide resolved
} | ||
|
||
// Upgrade schema if necessary | ||
if (!currentVersion.equals(latestVersion)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So as of now in this version, we are handling 2 cases:
- Fresh install (version will be set as
2.0
) - Old version with no schema version table (current version set as
1.0
)
and in new versionv2.0
, we are only updating the schema ofUNHEALTHY_CONTAINERS
table schema and not the schema of other derby tables , so isn't good if we also maintain the list of tables and details inschemaversion
table whose schema was updated in that respective version.
public void upgradeSchema(String fromVersion, String toVersion) | ||
throws SQLException { | ||
Connection conn = dataSource.getConnection(); | ||
if (!TABLE_EXISTS_CHECK.test(conn, UNHEALTHY_CONTAINERS_TABLE_NAME)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In what use case, this condition can happen ? As I can see that we are calling upgradeSchema
only after initialization of all schemas ?
@Override | ||
public void upgradeSchema(String fromVersion, String toVersion) | ||
throws SQLException { | ||
// No schema upgrades needed for the stats table. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we put a log that no schema version upgraded for this XYZ table in this version ?
if (!currentVersion.equals(latestVersion)) { | ||
upgradeAllSchemas(currentVersion, latestVersion); | ||
// Update the schema version in the version table after migration | ||
schemaVersionTableManager.updateSchemaVersion(latestVersion); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems incorrect here, we are updating the schema version in schemaVersionTable even if we get some error while upgrading the schema of a required table ? How are we handling failure cases.
} | ||
|
||
/** | ||
* Update the schema version in the RECON_SCHEMA_VERSION_TABLE after all tables are upgraded. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment doesn't justify the error scenarios if any schema upgrade fails.
@@ -115,7 +115,7 @@ protected void configure() { | |||
public void createSchema(Injector inj) { | |||
ReconSchemaManager reconSchemaManager = | |||
inj.getInstance(ReconSchemaManager.class); | |||
reconSchemaManager.createReconSchema(); | |||
reconSchemaManager.createAndUpgradeReconSchema(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add some more test cases related to error and failure and assert when and all schema version gets inserted/ updated in schema version table ?
@errose28 @sumitagrawl |
@ArafatKhan2198 , In Recon context, what all MLV will incorporate only Derby tables schema update ? or will it take care recon rocks DB schema changes as well like container DB ? |
Hi @devmadhuu, In the context of Recon, the MLV (Metadata Layout Version) is not limited to just the Derby DB tables. It encompasses all Recon-specific components, including both Derby tables and RocksDB tables like the container DB. Essentially, MLV manages and tracks any schema changes or modifications in the Recon data structures, ensuring they are all updated consistently. The SLV (Software Layout Version), on the other hand, represents the software-defined feature set. It covers all new features that may or may not involve schema changes. When a new feature is added that modifies or affects the Recon DB schemas, whether it's RocksDB or Derby DB, the SLV version will get bumped, prompting an MLV increment. This is managed with the help of upgrade scripts during the finalization process. In summary:
Let me know if this clarifies your concerns! |
Pls fix checkstyle issues. |
Pls correct the terms here, looks like you are using both terms for schema layout versioning.. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @ArafatKhan2198 for explaining the patch and motive behind this patch. Overall looks good, just few comments and also pls update the comments in all code to. reflect the SLV meaning as Software Layout Version rather as Schema Layout Version.
As discussed, later once code freezed, you will also remove the dummy created features. Thanks.
List<ReconLayoutFeature> registeredFeatures = allFeatures.stream() | ||
.filter(feature -> { | ||
boolean shouldRegister = | ||
feature.getVersion() > currentMLV && feature.getVersion() <= currentSLV; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
feature.getVersion() <= currentSLV
, here comparing the feature version with less than current SLV, pls check if it is needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pls correct all comments in whole PR and replace the meaning of SLV (Schema layout version) to (Software Layout version)
@Singleton | ||
public class SchemaVersionTableDefinition implements ReconSchemaDefinition { | ||
|
||
public static final String SCHEMA_VERSION_TABLE_NAME = "RECON_SCHEMA_VERSION"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Define this in constants class outside as being used in this class as well as ReconSchemaVersionTableManager
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@devmadhuu
Most of the schema definition classes for other tables include a field within the class that specifies the table name. Examples of such classes are ContainerSchemaDefinition, ReconTaskSchemaDefinition, StatsSchemaDefinition, and UtilizationSchemaDefinition.
Therefore, I decided to include the table name within the SchemaDefinition class itself for consistency.
Should I change for all of them?
* and associated upgrade action to be executed during an upgrade. | ||
*/ | ||
public enum ReconLayoutFeature { | ||
INITIAL_VERSION(0, "Initial Layout Version", new InitialLayoutUpgradeAction()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do not need binding the Action for every version, when required, can be dinamically added on scan, like done for OM - org.apache.hadoop.ozone.om.upgrade.QuotaRepairUpgradeAction
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the comment @sumitagrawl
In ReconLayoutFeature, upgrade actions are embedded directly within the enum values, making the process simple and straightforward. Each feature (e.g., FEATURE_1) has its corresponding action (e.g., Feature1UpgradeAction). During an upgrade, the system compares the current MLV (Metadata Layout Version) and SLV (Software Layout Version). If the MLV is outdated, the associated action is executed, and the MLV is updated. Since Recon does not have many upgrade phases, I decided to omit additional phases, keeping the upgrade process simple and easy to manage.
On the other hand, OMLayoutFeature introduces a more complex process. Actions are not embedded but dynamically registered and associated with different phases, like FIRST_RUN_ON_UPGRADE and ON_FINALIZE. The system scans for upgrade actions and links them to features based on annotations as different actions can be triggered at different stages of the upgrade, making it suitable for more complex upgrade scenarios.
In summary, Recon uses a simpler, more direct approach by omitting multiple upgrade phases, while OM provides flexibility through phased upgrades, allowing it to handle more intricate upgrade processes.
Should I shift the approach then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a plan to support prepare and finalize -- may not be part of this PR.
if no action, either you can define it as functional interface with empty lamda implementation for Version "0".
private static final ReconUpgradeAction NO_FINALIZE_ACTION = () -> {};
INITIAL_VERSION(0, "Initial Layout Version", NO_FINALIZE_ACTION),
This can be refactored later on based on required support.
* and associated upgrade action to be executed during an upgrade. | ||
*/ | ||
public enum ReconLayoutFeature { | ||
INITIAL_VERSION(0, "Initial Layout Version", new InitialLayoutUpgradeAction()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a plan to support prepare and finalize -- may not be part of this PR.
if no action, either you can define it as functional interface with empty lamda implementation for Version "0".
private static final ReconUpgradeAction NO_FINALIZE_ACTION = () -> {};
INITIAL_VERSION(0, "Initial Layout Version", NO_FINALIZE_ACTION),
This can be refactored later on based on required support.
* and associated upgrade action to be executed during an upgrade. | ||
*/ | ||
public enum ReconLayoutFeature { | ||
INITIAL_VERSION(0, "Initial Layout Version", new InitialLayoutUpgradeAction()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How are we handling avoid to re-create constraint with this change? I think that is a part of first version "0" as per this task. That change is missing.
…on-based dynamic upgrade actions
Testing out the Dynamic Upgrade action handling for Recon Schema VersioningFor testing purposes, I introduced another upgrade action type called When I started up Recon, I observed the following log outputs:
Currently, for Recon, we've decided that upgrade actions will be autofinalized. This means that only actions marked with the type This confirms that annotation-based upgrade action execution is working as expected, as only the appropriate actions (in this case, @sumitagrawl @devmadhuu please go through this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ArafatKhan2198 Plz check few minor comments
hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/upgrade/ReconUpgradeAction.java
Outdated
Show resolved
Hide resolved
hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/upgrade/ReconUpgradeAction.java
Outdated
Show resolved
Hide resolved
...-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/ReconSchemaVersionTableManager.java
Outdated
Show resolved
Hide resolved
...one/recon/src/main/java/org/apache/hadoop/ozone/recon/upgrade/ReconLayoutVersionManager.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ArafatKhan2198 , thanks for improving the patch. Few comments. Pls check.
public ReconLayoutVersionManager(ReconSchemaVersionTableManager schemaVersionTableManager) { | ||
this.schemaVersionTableManager = schemaVersionTableManager; | ||
this.currentMLV = determineMLV(); | ||
this.currentSLV = determineSLV(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need SLV as such here as member variable as this is not being used anywhere. SLV or feature version is only used to determine the max or higher feature versions greater than current MLV and actions of those only should be executed , So this logic can dynamically be executed without defining a member variable. As per current code SLV and MLV are both same and SLV as such defined as member variable not needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed the occurrence of SLV field
LOG.info("Feature {} finalized successfully.", feature.getVersion()); | ||
} | ||
} catch (Exception e) { | ||
LOG.error("Failed to finalize feature {}: {}", feature.getVersion(), e.getMessage()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since after introduction of this framework, features having schema update will be auto-finalized, and in case of any unexpected error or exception, this will silently be eaten up or logged without knowing of error if feature was actually upgraded or not and we as developers will think that feature will be released and upgraded, however this may be logged as silent failure, then its going to be difficult to know with lot of unpleasant surprises. Atleast we need to update this failure with proper details to ReconContext as well as should fail the Recon start, till we have alert mechanism.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will now be tracking failed updates in recon context
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will now be tracking failed updates in recon context
I don't think this behavior is correct. Here we should fail to start Recon along with updating in ReconContext, else we'll never know what happened until user comes back and report us of any issues and our expectation is different.
…o handle INITIAL_VERSION in ReconLayoutVersionManager
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Give few more comments
.orElse(-1); // Return -1 if no version is found | ||
} catch (Exception e) { | ||
LOG.error("Failed to fetch the current schema version.", e); | ||
return 0; // Return 0 if there is an exception |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we may need throw exception if unable to read table. "0" may give wrong data.
LOG.info("Inserted new schema version '{}'.", newVersion); | ||
} | ||
} catch (Exception e) { | ||
LOG.error("Failed to update schema version to '{}'.", newVersion, e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need throw exception if unable to update.
try { | ||
// If the feature is INITIAL_VERSION, skip executing any action and just update the schema version | ||
if (feature == ReconLayoutFeature.INITIAL_VERSION) { | ||
updateSchemaVersion(0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should call action if any are there registered. If not registered, it will not come in loop. Do not need any special handling.
*/ | ||
private void updateSchemaVersion(int newVersion) { | ||
// Logic to update the MLV in the database | ||
this.currentMLV = newVersion; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
setting currentMLV can be after db update. if DB have some exception, this flag is not required to be updated.
What changes were proposed in this pull request?
This PR introduces an upgrade framework in Recon that mirrors the pattern used in OMLayoutFeature with some key differences tailored for Recon's needs. The primary purpose of this PR is to manage and handle schema upgrades automatically during Recon startup, storing layout version information on disk and executing necessary upgrade actions.
Summary of Changes:
OMLayoutFeature
Pattern: Recon now follows a similar schema upgrade pattern as used in the OM layer, but withauto-finalization
. Unlike in OM, Recon will automatically finalize on restart without requiring manual intervention.MLV
) is stored in theRECON_SCHEMA_VERSION
table on disk. This table tracks the schema version to identify which features have been applied.SLV
) is computed dynamically based on the latest version defined inReconLayoutFeature
at runtime.ReconLayoutVersionManager
fetches the current MLV from theRECON_SCHEMA_VERSION
table.ReconLayoutFeature
.ReconLayoutVersionManager
: Handles the determination of the current MLV and SLV and performs the finalization of pending layout features.ReconSchemaVersionTableManager
: Manages interactions with the RECON_SCHEMA_VERSION table, including fetching and updating the current MLV.ReconLayoutFeature
: Enum defining the layout features with their version, description, and direct integration of upgrade actions.OMLayoutFeature
:What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-11465
How was this patch tested?
For testing this PR, three example features were introduced into ReconLayoutFeature along with their respective upgrade scripts. These scripts were designed to simply log their execution, allowing us to verify that the upgrade framework works as intended. During Recon startup, the current Metadata Layout Version (
MLV
) was set lower than the Software Layout Version (SLV
). The logs confirmed that each feature's upgrade action executed sequentially, updating the MLV and recording the changes in theRECON_SCHEMA_VERSION
table. This successful test demonstrates that the framework correctly finalizes registered features whenMLV < SLV
.I'll remove them once the PR is approved by everyone.
Logs showcasing upgrade :-