Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use PostgresAdmin directly for DB backups #160

Merged
merged 16 commits into from
May 18, 2021

Conversation

NickLaMuro
Copy link
Member

@NickLaMuro NickLaMuro commented Apr 5, 2021

Tested using https://github.com/NickLaMuro/manageiq-db_backup-tests to validate the changes.

There are a bunch of "deletes" that will follow this change, but this should not only be merged, but also released as a new version before those changes are merged. Otherwise the nightly builds of the appliance console will break.

TODO

  • Need to test against SMB/NFS
    • should work just like local, but currently have to rework connecting to the SMB/NFS shares prior to the tests running (as part of the test suite), and that currently doesn't exist simpler than expected, and there are just a few kinks to work out. Will post a update to the manageiq-db_backup-tests when it is available.
    • Update tests for rakeless DB admin NickLaMuro/manageiq-db_backup-tests#1
  • Determine if we want to keep file splitting
    • Probably not, but worth mentioning that this is a feature that will no longer be supported Answer: Nope, burn it.
  • Fix specs

Links

raise message
end

MiqRegion.replication_type = :none
Copy link
Member Author

@NickLaMuro NickLaMuro Apr 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Fryguy and/or @jrafanie So most of the methods in "green" here are pulled from lib/evm_database_ops.rb in ManageIQ/manageiq:

https://github.com/ManageIQ/manageiq/blob/4abf8c6ddd9f1985c1264ca407837423695b1dfd/lib/evm_database_ops.rb#L183-L216

and I was able to convert most of what was there previously (model or raw ActiveRecord) to straight Pg calls, but this one seems specific. After taking a look myself, I wasn't able to determine why this line specifically was added, as it seems that it will always be effectively a "no-op" when the method involved is called:

https://github.com/ManageIQ/manageiq/blob/61c77793f44a2453c0908997d4a8c5f79607faa3/app/models/miq_region.rb#L156-L165

But something tells me I am missing something. Can you enlighten me on what I am missing?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

replication_type describes whether or not this database is the global side of replication or the remote side of replication. From a DB backup/restore point of view though I can't see why that would be important.

Copy link
Member

@Fryguy Fryguy Apr 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I think what's happening here is that it's temporarily disabling replication. Setting replication_type to none would essentially unregister the replication provider.

EDIT: But I agree - it's almost like the code is missing the "uninstall" bit? Maybe @carbonin remembers?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was a bug where restore couldn't drop the database because the replication backend was holding a connection open I think. There could have very well been more to it than that, but it should all bit in git history/bugzilla somewhere.

Also 👋

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

@NickLaMuro NickLaMuro Apr 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@carbonin Okay, I think I see where my confusion was.

We pass into .replication_type= the desired_type, but the if current_type statements are getting the values from .replication_type, which will do some destroy/configure actions if needed, and I assume the methods called in .replication_type query against Postgres/PgLogical/stuffIVaguelyKnowAbout

Looks like I might need to look into what those do further, and see if we need to somehow replicate that logic over in the console.

Thanks for the response (and 👋 )

(oh, and yes, I did find that commit previously, but was misinterpretting the contents of the method. I now see the error in my ways)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh hai @carbonin !! 👋

result[0]["count"] > 0
end

private_class_method def self.disable_replication(dbname)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this will handle all of the necessary concerns that were brought up previously:

https://github.com/ManageIQ/manageiq-appliance_console/pull/160/files/650c62b4dfc42ba519df50c8a8553b5aee848060#diff-4f5a9d143c19dfcadcdb0467ca432e5c0eba904c896862e8af14277978ec1479

I didn't delete the MiqRegion, as is currently done in PglogicalSubscription, but since this is a restore, that data will be blown away anyway, so we only need the parts that will be used prevent an error due to the HA code preventing a restore (which I think this is all of it... but I will double check).

I condensed a lot of what was in PglogicalSubscription to be a single method, so much of what is used here is spread out in the class between .find, .find_all, .delete, .safe_delete, and others. I also removed some of the column id conversion as I didn't think it was necessary, and just referenced it as sub_id for the purposes of the .drop_subscription portion of the code below.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I did test this on a local appliance using:

NickLaMuro/manageiq-db_backup-tests#1

And it worked just fine, so at least from a non-HA appliance setup, these new changes don't seem to break anything. Don't really know how to set up a HA appliance to test the scenarios that were present in https://bugzilla.redhat.com/show_bug.cgi?id=1444993 so that is still a unknown, but the base case at least still works.

@NickLaMuro NickLaMuro force-pushed the rakeless-db-admin branch 3 times, most recently from 38f7280 to c02d242 Compare April 14, 2021 14:58
@NickLaMuro NickLaMuro closed this Apr 14, 2021
@NickLaMuro NickLaMuro reopened this Apr 14, 2021
- smb
- s3
- ftp
- swift
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😍

with_pg_connection do |conn|
pglogical = PG::LogicalReplication::Client.new(conn)

if pglogical.subscriber?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it makes sense, I'm ok with pushing down any of this logic into the pg/logical_replication gem. This could also be done in a follow-up

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just because of the delay I have had trying to get this PR, I am going to punt on doing this now and handle in a follow up.

So we don't loose track of it, I will open up an issue that can be triaged later, and we can determine if it makes sense then to push this into the gem.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@NickLaMuro
Copy link
Member Author

So I have determined that there is a failure with the specs here that is based on the order that which they are run in... but have failed to track down what it is...

The [FIXUP] commits are currently some of those fixes, but I think one or two more will be necessary for me to track down this bug in the tests. Probably test specific related at this point, and just based on some of the changes I introduced into PostgresAdmin in this PR (which was setting up changes for #161 ).

@NickLaMuro NickLaMuro force-pushed the rakeless-db-admin branch 5 times, most recently from 49a6508 to c6a991c Compare April 27, 2021 19:51
Instead of calling a rake task in core, call `PostgresAdmin` which is
now part of this repository.  This takes many of the checks that exist
in `lib/evm_database_ops.rb` (core) and puts them in this class,
tweaking them slightly so they don't require things like `ActiveRecord`
to function, or classes that only exist in core
`(VmdbDatabaseConnection` for example)

In addition, this also doesn't include any of the `.with_file_storage`
calls as that will be phased out.  `PostgresAdmin` just needs a local
file to write to, and since `MiqFileStorage` used a FIFO to write data
to external sources (s3, swift, FTP), there is no change required there,
and this can just accept a local file and pass that to `PostgresAdmin`.
@NickLaMuro
Copy link
Member Author

Finally... it passed...

Now I just have to get the PR rebased... but that is a problem for "Monday Nick" to deal with...

The prepare logic should live in the PostgresAdmin class as it is
something that could be used for not only the console, but also CLI.

Also, fixed a few things that wouldn't have worked (integration testing
of `restore` hasn't been done with this code yet):

- Removed the MiqRegion.replication_type= call (not used, I don't think)
- Remove connection_pool disconnect (not relevant, since we aren't using
  ActiveRecord with this code base)
- Fixed .connection_count to handle things differently per backup type
  and dbname (this existed previously with VmdbDatabaseConnection, but
  was not properly ported over)

Everything else should be the same logically speaking, just updated for
working within the PostgresAdmin class itself.
Use PG::LogicalReplication::Client to remove subscriptions or
publications on a db prior to doing a database reload.

Doesn't drop the `MiqRegion`, as was previously done as part of
PglogicalSubscription in core:

  https://github.com/ManageIQ/manageiq/blob/8ca93da126aada556685db5d05aaf5e9a0666900/app/models/pglogical_subscription.rb#L70

But since this is a database reload, and the Region will be removed from
the DB anyway, this should not matter, and the necessary code required
to perform that delete will be pointless to include in this gem.
If an error occurs while connecting (for example, if you have the wrong
user), then this will fail with just that error, instead of an error
that exists in the ensure as well.
During a restore, when calling `PostgresAdmin.with_pg_connection`, use
the one defined in `PostgresRunner` since it includes connection
information for the secondary DB, instead of the one running default one
running on the user's machine.
Adds a helper method to PostgresRunner (.hard_reset), which is used to
reset the database to a fresh state.

Required for local tests only since the DB backup was taken off of an
appliance, so when that is restored first, it changes the default DB
roles to that `.initial_setup` create and is assumed by the class.
For the `.restore` specs, ensure the env is setup properly so the specs
can run properly.

With the new methods that check the existing database connections in
`PostgresAdmin` directly, this allows said connection to work properly.

For CI, we don't want to do this, since it doesn't actually setup the
ENV (bad method name, my bad...), but runs the restore in a task all by
itself.
Doesn't seem to be working currently (at least locally), and supporting
this isn't terribly necessary now that we won't be streaming files in
Ruby.

Might delete later...
Previously, the logic to pre-determine the backup type required
knowledge of the file storage type that PostgresAdmin did not have.  In
the case where the file storage data was being streamed (S3, FTP, etc.)
and was only read from the source once, you had to call out to the
`PostgresAdmin.pg_dump_file?` and `PostgresAdmin.base_backup_file?`
calls manually to determine what type file was going to be restored so
the proper code could be executed, and `PostgresAdmin` didn't have to
worry if the file data was being streamed or was just read off of a
disk/mount.

Since .validate_backup_file_type (not show in diff) is now part of
.restore, this extra case logic is no longer necessary, as `backup_type`
will always be set if not passed in.

As such, the code and spec for it has been updated to reflect that.
When not in a RSpec context (example: when we are doing database
operations in a separate process in CI), ensure that we have configured
a `ManageIQ::ApplianceConsole.logger` so things don't break in CI.
Using `>>` requires being at the proper elevated permission at the time
the command is executed, and since that isn't possible with these
scripts, it is simpler to use sed for this, and append a `sudo` to the
front.
This seems to cause issues, and postgres on Travis is setup to use
`trust` instead of `peer`.  Since the tests generally work with the
default, continue using that instead of using what is brought over from
a `.restore`.
@miq-bot
Copy link
Member

miq-bot commented May 17, 2021

Checked commits NickLaMuro/manageiq-appliance_console@b57d6f4~...bcb52d7 with ruby 2.6.3, rubocop 1.13.0, haml-lint 0.35.0, and yamllint
8 files checked, 4 offenses detected

lib/manageiq/appliance_console/postgres_admin.rb

spec/postgres_runner_helper.rb

@NickLaMuro NickLaMuro changed the title [WIP] Use PostgresAdmin directly for DB backups Use PostgresAdmin directly for DB backups May 17, 2021
@miq-bot miq-bot removed the wip label May 17, 2021
@NickLaMuro
Copy link
Member Author

@Fryguy I think this is finally ready for your review. Tests should be passing, and I have rebased/cleaned up the commits.

end

private_class_method def self.disable_replication(dbname)
require 'pg/logical_replication'
Copy link
Member

@Fryguy Fryguy May 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have to include this gem in the gemspec now? nvm - I searched the page and didn't find it and then suddenly I saw it 🤷‍♂️

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, if you searched for it using the pg/..., then yeah, it wouldn't show up. But yes, it is required, but I think the specs wouldn't have passed if I didn't have it. 😉

context "with a pg_dump file from a pipe" do
let(:dbname) { "pg_dump_restore_of_simple_db_from_pipe" }
let(:fifo_path) { Pathname.new(Dir::Tmpname.create("") {}) }
# context "with a pg_dump file from a pipe" do
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this anymore?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably not... it was failing, but there is no real reason that it should though.

But it isn't really needed, so it can be tossed. Will do that in a follow up.

Comment on lines +68 to +69
CiPostgresRunner.stop
CiPostgresRunner.start
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Super-minor, but you could add hard_reset method to CiPostgresRunner for consistency with PostgresRunner.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I can do that in a followup.

@Fryguy Fryguy merged commit c62dfe8 into ManageIQ:master May 18, 2021
@Fryguy
Copy link
Member

Fryguy commented May 18, 2021

Merged...everything else can be follow-up, if needed.

@Fryguy Fryguy self-assigned this May 18, 2021
bdunne added a commit that referenced this pull request Aug 4, 2021
Enhancements:
- Add database dump/backup/restore to CLI - #161
- Use PostgresAdmin directly for DB backups - #160

Bugs:
- Chown database.yml as manageiq:manageiq - #165
- Create /opt/kafka/config/keystore directory - #170
- Add missing parse_errors() method  - #169
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants