Reinstall packages during upgrade #717

Merged
merged 35 commits into from
Feb 27, 2024

Conversation

jizhuoyu
Collaborator

@jizhuoyu jizhuoyu commented Feb 21, 2024

In Vertica, upgrading the server version requires reinstalling packages because they are tied to a specific server version. This process was handled automatically during admintools deployments, but was not implemented for vclusterOps deployments. To address this issue, we have modified the upgrade process to include a package reinstallation step after restarting Vertica with the new version.

@jizhuoyu jizhuoyu marked this pull request as ready for review February 21, 2024 23:22
@jizhuoyu jizhuoyu self-assigned this Feb 21, 2024
@spilchen spilchen changed the title Add install packages API Reinstall packages during upgrade Feb 22, 2024
Matt Spilchen and others added 13 commits February 22, 2024 09:40
@jizhuoyu
Collaborator Author

As discussed, the e2e tests fail because there is not enough disk space:

  • in VC implementation of install packages:
verticadb-operator verticadb-operator-manager-6c4bfdcfb4-br6hp manager 2024-02-22T14:03:01.583Z	INFO	controllers.VerticaDB.InstallPackages.HTTPSInstallPackagesOp	JSON response	{"verticadb": "kuttl-test-legal-maggot/v-upgrade-vertica", "reconcile-uuid": "d849aed9-549e-4068-b526-9c232a14abc3", "host": "10.244.0.36", "responseObj": {"packages":[{"package_name":"ComplexTypes","install_status":"Exception: Could not write to [/data/Upgrade/v_upgrade_node0001_data/Sort_Temp_part_0_45035996273704965_3_0_0_0.dat]: Volume [/data/Upgrade/v_upgrade_node0001_data] has insufficient space."},{"package_name":"DelimitedExport","install_status":"Exception: Could not write to [/data/Upgrade/v_upgrade_node0001_data/Sort_Temp_part_0_45035996273704966_3_0_0_0.dat]: Volume [/data/Upgrade/v_upgrade_node0001_data] has insufficient space."},{"package_name":"JsonExport","install_status":"Exception: Could not write to [/data/Upgrade/v_upgrade_node0001_data/Sort_Temp_part_0_45035996273704967_3_0_0_0.dat]: Volume [/data/Upgrade/v_upgrade_node0001_data] has insufficient space."},{"package_name":"MachineLearning","install_status":"Exception: Could not write to [/data/Upgrade/v_upgrade_node0001_data/Sort_Temp_part_0_45035996273704968_3_0_0_0.dat]: Volume [/data/Upgrade/v_upgrade_node0001_data] has insufficient space."},{"package_name":"OrcExport","install_status":"Exception: Could not write to [/data/Upgrade/v_upgrade_node0001_data/Sort_Temp_part_0_45035996273704969_3_0_0_0.dat]: Volume [/data/Upgrade/v_upgrade_node0001_data] has insufficient space."},{"package_name":"ParquetExport","install_status":"Exception: Could not write to [/data/Upgrade/v_upgrade_node0001_data/Sort_Temp_part_0_45035996273704970_3_0_0_0.dat]: Volume [/data/Upgrade/v_upgrade_node0001_data] has insufficient space."},{"package_name":"VFunctions","install_status":"Exception: Could not write to [/data/Upgrade/v_upgrade_node0001_data/Sort_Temp_part_0_45035996273704971_3_0_0_0.dat]: Volume [/data/Upgrade/v_upgrade_node0001_data] has insufficient space."},{"package_name":"approximate","install_status":"Exception: Could not write to [/data/Upgrade/v_upgrade_node0001_data/Sort_Temp_part_0_45035996273704972_3_0_0_0.dat]: Volume [/data/Upgrade/v_upgrade_node0001_data] has insufficient space."},{"package_name":"flextable","install_status":"Exception: Could not write to [/data/Upgrade/v_upgrade_node0001_data/Sort_Temp_part_0_45035996273704973_3_0_0_0.dat]: Volume [/data/Upgrade/v_upgrade_node0001_data] has insufficient space."},{"package_name":"kafka","install_status":"Exception: Could not write to [/data/Upgrade/v_upgrade_node0001_data/Sort_Temp_part_0_45035996273704974_3_0_0_0.dat]: Volume [/data/Upgrade/v_upgrade_node0001_data] has insufficient space."},{"package_name":"logsearch","install_status":"Exception: Could not write to [/data/Upgrade/v_upgrade_node0001_data/Sort_Temp_part_0_45035996273704975_3_0_0_0.dat]: Volume [/data/Upgrade/v_upgrade_node0001_data] has insufficient space."},{"package_name":"place","install_status":"Exception: Could not write to [/data/Upgrade/v_upgrade_node0001_data/Sort_Temp_part_0_45035996273704976_3_0_0_0.dat]: Volume [/data/Upgrade/v_upgrade_node0001_data] has insufficient space."},{"package_name":"txtindex","install_status":"Exception: Could not write to [/data/Upgrade/v_upgrade_node0001_data/Sort_Temp_part_0_45035996273704977_3_0_0_0.dat]: Volume [/data/Upgrade/v_upgrade_node0001_data] has insufficient space."},{"package_name":"voltagesecure","install_status":"Exception: Could not write to [/data/Upgrade/v_upgrade_node0001_data/Sort_Temp_part_0_45035996273704978_3_0_0_0.dat]: Volume [/data/Upgrade/v_upgrade_node0001_data] has insufficient space."}]}}
  • in AT implementation of install packages:
verticadb-operator verticadb-operator-manager-6c4bfdcfb4-br6hp manager 2024-02-22T14:04:33.994Z INFO    controllers.VerticaDB   ExecInPod stream    {"verticadb": "kuttl-test-aware-wallaby/v-upgrade-vertica", "reconcile-uuid": "3e304ea9-c2a7-4a02-aceb-2ad82318751c", "pod": "kuttl-test-aware-wallaby/v-upgrade-vertica-pri-0", "err": "command terminated with exit code 1", "stdout": "Checking whether package approximate is already installed...\nInstalling package approximate...\n...Success!\nChecking whether package logsearch is already installed...\nInstalling package logsearch...\n...Success!\nChecking whether package DelimitedExport is already installed...\nInstalling package DelimitedExport...\nFailed to install package DelimitedExport\nChecking whether package VFunctions is already installed...\nInstalling package VFunctions...\nFailed to install package VFunctions\nChecking whether package flextable is already installed...\nInstalling package flextable...\n...Success!\nChecking whether package kafka is already installed...\nInstalling package kafka...\n...Success!\nChecking whether package voltagesecure is already installed...\nInstalling package voltagesecure...\n...Success!\nChecking whether package ParquetExport is already installed...\nInstalling package ParquetExport...\n...Success!\nChecking whether package OrcExport is already installed...\nInstalling package OrcExport...\n...Success!\nChecking whether package MachineLearning is already installed...\nInstalling package MachineLearning...\n...Success!\nChecking whether package ComplexTypes is already installed...\nInstalling package ComplexTypes...\n...Success!\nChecking whether package place is already installed...\nInstalling package place...\nFailed to install package place\nChecking whether package txtindex is already installed...\nInstalling package txtindex...\nFailed to install package txtindex\nChecking whether package JsonExport is already installed...\nInstalling package JsonExport...\nFailed to install package JsonExport\n", "stderr": ""}

To address this issue, we are removing the test steps that download the 23.4 image and upgrade from the v12 image to 23.4. Note that this means the e2e tests will no longer cover the AT implementation of install packages.
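
For reference, the per-package failures in the VC log above can be pulled out of the JSON response mechanically. The following is a minimal sketch, not the operator's actual code: the field names `packages`, `package_name` and `install_status` come from the log, while the type names, the `failedPackages` helper, and the rule "a status starting with `Exception` is a failure" are assumptions made for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// packageStatus mirrors one entry of the "packages" array in the log above.
// The Go type name is hypothetical; the JSON keys are taken from the log.
type packageStatus struct {
	Name   string `json:"package_name"`
	Status string `json:"install_status"`
}

// installResponse mirrors the "responseObj" payload in the log above.
type installResponse struct {
	Packages []packageStatus `json:"packages"`
}

// failedPackages returns the names of packages whose install_status starts
// with "Exception". Assumption: this prefix marks a failure; the log only
// shows failing entries, so the success format is not known from this thread.
func failedPackages(raw []byte) ([]string, error) {
	var resp installResponse
	if err := json.Unmarshal(raw, &resp); err != nil {
		return nil, fmt.Errorf("decode install response: %w", err)
	}
	var failed []string
	for _, p := range resp.Packages {
		if strings.HasPrefix(p.Status, "Exception") {
			failed = append(failed, p.Name)
		}
	}
	return failed, nil
}

func main() {
	// Shortened sample payload. The second status value is a made-up
	// placeholder for a non-failing entry.
	raw := []byte(`{"packages":[
		{"package_name":"kafka","install_status":"Exception: Volume has insufficient space."},
		{"package_name":"flextable","install_status":"placeholder-success"}]}`)

	failed, err := failedPackages(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println("failed packages:", failed)
}
```

Applied to the VC response above, a check like this would report all 14 packages as failed, whereas the AT output shows a mix of successes and failures.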

@spilchen
Collaborator

I saw that the new e2e test still fails because there isn't enough disk space. So, I started to look at the disk space usage of a database.

  1. New database created without package install: communal is 8K, each node is 11MB.
  2. New database created with packages installed: communal is 162MB, each node is 335MB.
  3. After upgrading the database from 2.: communal is 323MB, each node is 394MB, although I did see the per-node usage spike to 576MB before it came back down.

Here is my suggestion for an attempt at fixing it:

  • Change the e2e tests to use the initPolicy of CreateSkipPackageInstall and remove the package verification steps. This will keep the disk requirements low.
  • Keep running the tests serially.
  • Add 2 new tests to the same leg that verify package install through upgrade: one for online upgrade and one for offline upgrade. Unlike the existing tests, these will use only a single node. We probably only need to do 1 upgrade here to verify that things are working.
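
To make the disk argument concrete, here is a rough back-of-the-envelope sketch using the figures measured above. It assumes, purely for illustration, that communal storage and every node's local storage live on the same disk, and that the communal figure after upgrade (323MB) is also its peak; the function names and the example budget are hypothetical.

```go
package main

import "fmt"

// Figures reported in the comment above, in MB.
const (
	communalAfterUpgradeMB = 323 // communal storage after the upgrade
	perNodePeakMB          = 576 // observed per-node spike during the upgrade
)

// peakUsageMB estimates the worst-case disk usage of one database with the
// given number of nodes while it is being upgraded with packages installed.
func peakUsageMB(nodes int) int {
	return communalAfterUpgradeMB + nodes*perNodePeakMB
}

// fits reports whether the given number of databases, each with the given
// number of nodes, could be upgraded at the same time within budgetMB.
func fits(budgetMB, databases, nodes int) bool {
	return databases*peakUsageMB(nodes) <= budgetMB
}

func main() {
	for _, nodes := range []int{1, 3} {
		fmt.Printf("%d node(s): about %d MB at peak\n", nodes, peakUsageMB(nodes))
	}
	// Hypothetical 2000 MB budget, only to show how the check is used.
	fmt.Println("one 3-node database fits in 2000 MB:", fits(2000, 1, 3))
}
```

Under these assumptions, a single-node test needs well under half the space of a three-node one, which is the motivation for restricting the new package tests to a single node and running them serially.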

@jizhuoyu
Collaborator Author

I tested one commit in which the install packages steps were still kept in the 2 original upgrade tests. The results of all 4 tests are as follows:

  1. offline install and upgrade once: 23.4->24.1 failed
  2. online install and upgrade once: 23.4->24.1 failed
  3. online install and upgrade 3 times: 12.0.4->23.4 succeeded, 23.4->24.1 failed, 24.1->latest never got a chance to run
  4. offline install and upgrade 3 times: package install verification right after the first createdb for 12.0.4 failed, so none of the 3 upgrades got a chance to run

With the latest commits, we essentially have 2 tests that only upgrade (3 times) and 2 tests that upgrade and install (once, from 23.4->24.1). We passed the former 2 tests and failed the latter 2.

We might pass if the latter 2 tests upgraded and installed from 12.0.4->23.4 rather than 23.4->24.1. However, that would mean we are testing install for admintools only.

@spilchen
Collaborator

Thanks for trying these experiments. It doesn't look like we'll be able to automate your tests because of the disk space constraint. Manual verification will have to do for now. Can you remove those two new tests you added? We can add back the parallelism to leg 8 as well. Can we also get the other tests in e2e leg 8 back to what they were before? I think you removed one of the upgrade versions.

@jizhuoyu
Collaborator Author

jizhuoyu commented Feb 27, 2024

As discussed, leg 8 now has 2 tests remaining, one for online upgrade and one for offline upgrade. Each upgrades 3 times (12.0.4 to 23.4 to 24.1 to latest) and uses CreateSkipPackageInstall as the initPolicy. I added several steps that wait for condition=UpgradeInProgress=False, since I think it's good to confirm that we actually pass the last step of an upgrade. I also added comments in setup-vdb.yaml that state why we are skipping package install. @spilchen

Collaborator

@spilchen spilchen left a comment

This looks good. Thanks for doing so many revisions.

@jizhuoyu jizhuoyu merged commit ac8a8ca into main Feb 27, 2024
30 checks passed
@jizhuoyu jizhuoyu deleted the zji/install-packages-api branch February 27, 2024 22:14