[Migrations V2] interactive migrations #100685

Closed
Bamieh opened this issue May 26, 2021 · 9 comments
Labels
enhancement (New value added to drive a business result)
Feature:Saved Objects
project:ResilientSavedObjectMigrations (Reduce Kibana upgrade failures by making saved object migrations more resilient)
Team:Core (Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc)

Comments

@Bamieh
Member

Bamieh commented May 26, 2021

It would be great if we could create a dedicated CLI command (bin/kibana-migrations) to run migrations before running bin/kibana start.

A dedicated CLI command with an -i interactive mode would allow users to:

  1. Choose whether to drop the failing objects.
  2. Roll back to the previous index on migration failure.
  3. Specify which SO types to migrate or skip.
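
As a rough sketch of the third point, the choices an interactive mode collects could be resolved against the registered saved object types like this. Every name here (`InteractiveChoices`, `FailureAction`, `resolveTypesToMigrate`) is hypothetical and does not exist in Kibana; it only illustrates the idea.

```typescript
// Hypothetical shapes for illustration only; none of these exist in Kibana.
type FailureAction = 'drop' | 'rollback' | 'abort';

interface InteractiveChoices {
  // Types to migrate; undefined means "all registered types".
  only?: string[];
  // Types to leave untouched.
  skip?: string[];
  // What the user chose to do when an object fails to migrate.
  onFailure: FailureAction;
}

// Resolve the user's choices against the registered saved object types.
function resolveTypesToMigrate(registered: string[], choices: InteractiveChoices): string[] {
  const known = new Set(registered);
  const requested = [...(choices.only ?? []), ...(choices.skip ?? [])];
  const unknown = requested.filter((type) => !known.has(type));
  if (unknown.length > 0) {
    // Fail early rather than silently ignoring a mistyped type name.
    throw new Error(`Unknown saved object types: ${unknown.join(', ')}`);
  }
  const skip = new Set(choices.skip ?? []);
  const candidates = choices.only ?? registered;
  return candidates.filter((type) => !skip.has(type));
}
```

For example, with registered types `dashboard`, `visualization` and `index-pattern`, skipping `visualization` leaves `dashboard` and `index-pattern` to migrate.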
@Bamieh Bamieh added Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc enhancement New value added to drive a business result labels May 26, 2021
@elasticmachine
Contributor

Pinging @elastic/kibana-core (Team:Core)

@pgayvallet
Contributor

I remember this was something we initially wanted to have for migv2.

Rudolf spent some time checking how this could be achieved. However, to run the migration, we effectively need to discover and run at least the setup phase of the plugins, to let them register their types, meaning that a significant part of core needs to be run.

Without duplicating a lot of things, this led to a lot of if migrationDryRun checks in core's code base, e.g. to directly exit after the migration, similar to the if devCliChild checks we removed recently when extracting the cliDevMode out of core (and we didn't even try to add an interactive mode here).

Also, note that although such a CLI command would definitely be a great addition, it would not be usable on Cloud, as customers wouldn't be able to run it there, which is a significant con.

@pgayvallet
Contributor

FWIW, found the issue: #55404 and the PR #58470

@Bamieh
Member Author

Bamieh commented Jun 2, 2021

Also, note that although such a CLI command would definitely be a great addition, it would not be usable on Cloud, as customers wouldn't be able to run it there, which is a significant con.

I agree. Also, dealing with terminals is not the best user experience.


After giving it a second thought, I believe there is a way to achieve this by creating an interactive migrations UI when running Kibana under a certain flag (or automatically once an upgrade is detected).

  1. Run Kibana under a specific flag. This fully starts Kibana but prompts for migrations rather than other pages (after the login page, basically similar to the "add data / explore on my own" page).
  2. Core would create a new empty .kibana index and point the alias to the new index. This means that the upgraded Kibana under this flag will temporarily look like a fresh installation.
  3. Users would use the interactive migrations to migrate their saved objects via the UI.
  4. If migrations fail or get cancelled, we'd delete the new index and repoint the alias to the previous one.
  5. On success, we can restart Kibana or just unblock the routes / other pages.

An interactive UI would allow users to run dry-run migrations, pause, decide what to do with failed/quarantined objects, or even select which registries to skip if allowed by the plugin.
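
A minimal sketch of the index/alias handling in steps 2 to 4, using an in-memory stand-in for the cluster. `FakeCluster`, `runInteractiveMigration` and the index names are all assumptions made for illustration; this is not the Elasticsearch client API nor how core implements migrations.

```typescript
// In-memory stand-in for the cluster; NOT the Elasticsearch client API.
class FakeCluster {
  indices = new Map<string, object[]>();
  aliases = new Map<string, string>();
}

type Outcome = 'success' | 'failed' | 'cancelled';

// Hypothetical flow: swap the alias to a fresh index, let the user-driven
// migration run, and restore the previous index unless it succeeds.
function runInteractiveMigration(
  cluster: FakeCluster,
  alias: string,
  newIndex: string,
  migrate: (source: object[]) => { outcome: Outcome; docs: object[] }
): Outcome {
  const previousIndex = cluster.aliases.get(alias);
  if (previousIndex === undefined) {
    throw new Error(`alias ${alias} does not exist`);
  }

  // Step 2: create an empty index and point the alias at it.
  cluster.indices.set(newIndex, []);
  cluster.aliases.set(alias, newIndex);

  // Step 3: the user-driven migration reads from the previous index.
  const { outcome, docs } = migrate(cluster.indices.get(previousIndex) ?? []);

  if (outcome === 'success') {
    // Step 5: keep the new index; the previous one is left untouched.
    cluster.indices.set(newIndex, docs);
    return outcome;
  }

  // Step 4: on failure or cancellation, delete the new index and repoint the alias.
  cluster.indices.delete(newIndex);
  cluster.aliases.set(alias, previousIndex);
  return outcome;
}
```

Because the previous index is never modified in this sketch, a failed or cancelled run leaves the cluster exactly as it was before the flag was used.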

@pgayvallet
Contributor

This fully starts Kibana but prompts for migrations rather than other pages

This feels like a feature that could maybe benefit from the setup mode: #99318

Although setup mode is not planned to run all the plugins during the preliminary setup stage, so maybe not.

@Bamieh
Member Author

Bamieh commented Jun 3, 2021

I've added a discussion item to our sync to see if it's worth investigating this effort more. Yeah, it seems setup mode might be a good starting point to do this.

@joshdover joshdover added the project:ResilientSavedObjectMigrations Reduce Kibana upgrade failures by making saved object migrations more resilient label Jun 16, 2021
@pgayvallet pgayvallet changed the title from "[Migrations V2] dedicated CLI command for migrations" to "[Migrations V2] interactive migrations" Jun 29, 2021
@kobelb
Contributor

kobelb commented Jul 14, 2021

We'll want to approach an interactive UI for upgrades cautiously. Users and automation have been conditioned to believe that once Kibana starts up, the upgrade is complete. If Kibana must now start up and then an administrator must log in to Kibana and use a UI to complete the migration, we change this dynamic. This could lead to users having new instances of Kibana running that aren't fully upgraded because migrations haven't completed, preventing normal users from using Kibana until an admin comes around and clicks some buttons.

@rudolf
Contributor

rudolf commented Sep 17, 2021

Our users (and support) are currently following a process somewhat like:

  1. Upgrade Kibana
  2. Migration fails causing downtime
  3. Fix root cause of migration failure while Kibana remains down
  4. Retry upgrade
  5. Kibana is available again

A better process would be:

  1. Upgrade Kibana
  2. Migration fails causing downtime
  3. Roll back to the previous version so that there's minimal downtime/business disruption
  4. Take your time to understand and fix the root cause before trying again

In the current process, having a UI that could speed up fixing issues would help reduce downtime (in some scenarios, see below). However, the "better process" reduces downtime in all possible failure scenarios.

A UI would only be able to resolve the following kinds of problems:

  1. corrupt saved objects
  2. unknown saved objects
  3. a bug in a plugin's migration transformation function that only affects a small number of saved objects

These are the same problems we're able to detect using dry run migrations (#55404). If a dry run fails, we won't add any write blocks to the existing indices, so all that a user has to do to roll back is to stop the new Kibana and start up the old Kibana again.

Assuming our logs and documentation have clear instructions, users should be able to resolve all the causes of the migration failure by fixing or deleting problematic docs.
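
To make the dry run idea concrete, here is a minimal sketch that applies transforms to copies of the documents and only reports what would fail, grouped into the three kinds of problems listed above. All names (`SavedObjectDoc`, `dryRunMigration`, the failure kinds) are hypothetical, and treating "missing attributes" as the definition of a corrupt object is an assumption made purely for the example.

```typescript
// Hypothetical document and transform shapes, for illustration only.
interface SavedObjectDoc {
  id: string;
  type: string;
  attributes?: Record<string, unknown>;
}
type Transform = (doc: SavedObjectDoc) => SavedObjectDoc;

type FailureKind = 'corrupt' | 'unknown_type' | 'transform_error';
interface DryRunFailure {
  id: string;
  kind: FailureKind;
  reason: string;
}

// Runs every transform against a copy of each doc and collects failures.
// Nothing is written back, so the source documents stay untouched.
function dryRunMigration(docs: SavedObjectDoc[], transforms: Map<string, Transform>): DryRunFailure[] {
  const failures: DryRunFailure[] = [];
  for (const doc of docs) {
    // Assumption: a doc without attributes counts as corrupt.
    if (doc.attributes === undefined) {
      failures.push({ id: doc.id, kind: 'corrupt', reason: 'missing attributes' });
      continue;
    }
    const transform = transforms.get(doc.type);
    if (transform === undefined) {
      failures.push({ id: doc.id, kind: 'unknown_type', reason: `no type registered for ${doc.type}` });
      continue;
    }
    try {
      // Transform a copy so a buggy transform cannot mutate the original.
      transform({ ...doc, attributes: { ...doc.attributes } });
    } catch (e) {
      const reason = e instanceof Error ? e.message : String(e);
      failures.push({ id: doc.id, kind: 'transform_error', reason });
    }
  }
  return failures;
}
```

An empty result would mean the real migration is expected to succeed for these documents; a non-empty one gives the list of docs to fix or delete while the old Kibana keeps running.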

So although a UI would make it a lot easier to take corrective action I don't think it will have a big impact on the downtime users experience. Given the technical complexity of making this work I don't think we'll get enough gains from this feature. I think we should first focus our efforts on streamlining rollbacks with dry runs.

@kobelb
Contributor

kobelb commented Sep 20, 2021

So although a UI would make it a lot easier to take corrective action I don't think it will have a big impact on the downtime users experience. Given the technical complexity of making this work I don't think we'll get enough gains from this feature. I think we should first focus our efforts on streamlining rollbacks with dry runs.

Agreed! It also sets us up better for a continuous delivery world where we are always updating Kibana automatically. If we allow partial upgrades, then a lot of users are going to be blocked from using Kibana until some "administrator" comes around and finishes the upgrade.

@rudolf rudolf closed this as completed Feb 8, 2022
6 participants