
Addition of a ghost backup and potentially ghost restore command. #468

Closed
1 of 2 tasks
braunsonm opened this issue Aug 30, 2017 · 17 comments

Comments

@braunsonm

braunsonm commented Aug 30, 2017

This issue is a

  • Bug Report
  • Feature Request

Summary

A ghost backup command which gathers a dump of the MySQL database as well as a tar of the content folder would be a great addition. Users could add it to a cron task and back up the files to whatever third-party service they wish.

I discussed this a bit on Slack with Austin and we think it would be a useful addition.

@acburdine
Member

Braindump:
These commands would be useful - might also be worth adding some extension hooks to define where exactly to send the backups. The default implementation in core would be just saving it to the local filesystem, other extensions could potentially save to s3, dropbox, etc.

@acburdine acburdine changed the title from ghost backup to Addition of a ghost backup and potentially ghost restore command. Aug 30, 2017
@vikaspotluri123
Member

vikaspotluri123 commented Aug 30, 2017

@acburdine can you assign me to this please 😄

@ErisDS
Member

ErisDS commented Dec 1, 2017

cc @sebgie & @kirrg001

I know there's a lot of active work going on here, which I'm super excited about! I wanted to flag up that we had a chat about this yesterday in a wider meeting about backups where we ended up talking about whether Ghost should have a backup mechanism.

It made me realise that there are a couple of different things that ghost backup could possibly do.

Where we are:

The original spec on this issue is for a mysqldump + a tar of the content folder. This sounds a lot like what we were discussing yesterday. Meanwhile, it seems the implementation is doing an export of the database to JSON and then a zip of the content folder. These small differences are, I think, quite important.

Also in both the spec & the implementation, the job of ghost backup is to create the backup file - whatever that is, and then it would be the job of someone adding this to a cron to determine where the backup goes.


Where I'd like to be:

I think it is correct that ghost backup should only prepare the backup, and that this should then be usable when configuring a cron to run the command & move the backup file to a third party. However, I think this scenario should be fully tested before shipping the feature, to make sure that it is compatible with being used that way (e.g. with AWS S3 and/or DO block storage) - as this is definitely the most likely use case. I'd also like to end up with how to do this clearly documented.

In terms of what the backup does, I think it would be wrong to overlook using mysqldump instead of, or as well as, doing a JSON backup. It may be that this needs to be coded so it's a possible future extension?

I'm also not 100% on the method currently being used to generate the JSON export. Currently the exporter is being required, which seems odd when we have a backup mechanism in Ghost which could either be required or used over the API. There is some code duplication I'd like to avoid, and it's fine IMO if that needs changes to Ghost itself.

In terms of the file format, I'm not sure that using .zip instead of .tar is a trivial difference. I'm not 100% on what the differences are, but there's got to be a reason why (a) the original request was for .tar, and (b) my gut feeling is that .tar is normal 🤔 I have a feeling there's a difference in terms of permissions, and also tooling - I believe tar is available on most systems by default and so we could shell out? This needs a bit of research and an explicit decision.


Bottom line here:

  • there are a couple of details that need to be clearly "designed" and decided upon, i.e. what actually happens as part of the backup and what the resulting file name & extension is.
  • we need to consider how the Ghost-CLI and Ghost code interacts, most importantly, it's definitely OK to consider making upstream changes to Ghost for this

Sorry for the wall of text. I am appreciative of all the work that is already going on here to make this a reality!

@sebgie
Contributor

sebgie commented Dec 1, 2017

In terms of the file-format, I'm not sure that using .zip instead of .tar is a trivial difference.

From: https://itsfoss.com/tar-vs-zip-vs-gz/

All that being said, in the Unix-like world, I would still favor tar archive type because the zip file format does not support all the Unix file system metadata reliably. For some concrete explanations of that last statement, you must know the ZIP file format only defines a small set of mandatory file attributes to store for each entry: filename, modification date, permissions. Beyond those basic attributes, an archiver may store additional metadata in the so-called extra field of the ZIP header. But, as extra fields are implementation-defined, there are no guarantees even for compliant archivers to store or retrieve the same set of metadata.

  • .tar == uncompressed archive file
  • .zip == (usually) compressed archive file
  • .gz == file (archive or not) compressed using gzip

We are probably looking for .tar.gz?

@vikaspotluri123
Member

vikaspotluri123 commented Dec 2, 2017

Alright, I'm free! 😄

Something I'd like to reiterate (although based on what I've seen over the past however-long I've been active with ghost, I don't think it's necessary) is that I'm super open to change. I'm no expert in ghost, and I'm fairly new here, so I might not be following your standards 100% 😆

Sidenote: Work on the PR related to this started around 3 months ago, so I don't fully remember the reasoning for doing some of the things I did.

  • How content is archived

    • I have no issue swapping the zip archive to tar (+gz) if needed. I think the reason I ended up using zip in the PR is because it's the only truly universal archive that I am aware of - it's natively supported in Windows, Linux and OSX. However, just because something is universally available doesn't mean it's the best tool for the job. Based on @sebgie's comment, I think archiving with tar and compressing with gzip is a good idea.
    • On that note, I think the minimal mandatory attributes required to zip a file might actually be a contributing factor to an issue I noticed - outlined in the dependency (adm-zip) issue tracker. I ended up (with the guidance of @acburdine) implementing a directory walk which works, but I prefer native functionality.
    • I don't think we should delegate archiving functionality to the OS, because it's an extra requirement (see Git isn't documented as a prerequisite #524), especially for people who run Ghost on Windows in production. That being said, npm maintains tar and they claim it's fast (although fast is a relative term).
    • From what I can tell, the current implementation does everything in-memory, until the archive is filled and the save function is explicitly called. Given Ghost's current lack of media management (not hating or anything 😋) and the fact that some people use sqlite, the content folder has the potential to get really big. This might not be a problem for many users, but it could definitely lead to memory issues for others. Thus, no matter what we end up doing, something needs to be done about it - either now, or in the future.
  • Backing up the database

    • The current implementation basically does the same thing as exporting from labs. When I first started implementing the command, I wasn't aware of any spec, so I did what I thought was best. My goal was to write as little redundant code as possible - Ghost core already did something related to exporting key data, so I figured it would be better to piggyback off of that code than for me to write something that has greater potential to introduce bugs. Also, it conveniently selects the proper database to use (mysql vs sqlite3) which was an added bonus.

    • I'm also not 100% on the method currently being used to generate the JSON export. Currently the exporter is being required, which seems odd when we have a backup mechanism in Ghost which could either be required or used over the API. There is some code duplication I'd like to avoid, and it's fine IMO if that needs changes to Ghost itself.

      I basically followed the code the export API route implements and found the exporter, which I then required in the backup command. I wasn't aware there were other backup capabilities in Ghost. My goal was to require the backup mechanism 😂 With regards to code duplication, I tried to keep it minimal - the actual export call is pretty basic. Of course, I haven't been involved with Ghost for too long, so I could definitely be doing it a weird way :)

  • Backup location

    • @ErisDS I'm not fully sure I understand what you were saying about ghost backup only preparing the backup. For context, as of now it supports the --output argument to specify the directory to save the zip file archive to. @acburdine mentioned the possibility of adding extensibility to this command in the future, to do things like backup to D.O block storage, AWS s3, or the next big thing. Do you think this is too much?
  • Bottom Line

    • Taking a step back and looking at the drawing board is perfectly fine with me!

Sorry for the wall of text [...]

oops 😁

@kirrg001
Contributor

kirrg001 commented Dec 4, 2017

There are two disadvantages I can see to using Ghost's exporter.

  1. Currently Ghost's exporter does not export clients and tokens, see Export clients & client_trusted_domains Ghost#8719. That means it needs to be documented for the backup command. Furthermore, importing a backup as JSON via the admin client regenerates the passwords - you won't be able to log in to your blog, so it's required to have mail set up. If the CLI takes over importing the content, this can be skipped with importPersistUser.

  2. In general, requiring a JS file from Ghost's file tree ties Ghost and the CLI strongly together, while outsourcing the exporter to an NPM package means we need to take care of compatibility changes with Ghost. The only other option I can see here is to make Ghost able to export the JSON frequently using a configuration, so that the CLI is only responsible for zipping the files and providing the option to run a cron to upload the data somewhere. Just a thought.

It may be that this needs to be coded so it's a possible future extension?

Having mysqldump as the default backup strategy is a little tricky, because if you are using a remote database this won't work, but there is a JS implementation available. Not sure if this package works well.

I think the dump strategy is in general more user friendly, because (a) you won't lose any data (clients, tokens, ...) and (b) the CLI could offer a command to insert a given dump.

We are probably looking for .tar.gz?

Yeah that would make sense, because of the gzip compression. The CLI requires Ubuntu 16.04 - tar is always preinstalled.

the fact that some people use sqlite, the content folder has the potential to get really big

Yeah that is true. I think it makes sense to offer the option to exclude the backup of images.

@vikaspotluri123
Member

I'm not too familiar w/ database exports, so I can't say much - all I can say is I just realized w/ sqlite3, there's no need to export the database, since we can just add the db file to the archive

Yeah that would make sense, because of the gzip compression. The CLI requires Ubuntu 16.04 - tar is always preinstalled.

I know the same assumption was made for git (#524 - the user said they were running Ubuntu 16.04) - however, tar is used by many more applications than git. If we end up using shell tar, do you think it should be documented somewhere?

Yeah that is true. I think it makes sense to offer the option to exclude the backup of images.

What about giving them the option to skip / force everything? Yargs already supports the --[no-]do-something flag which we could use to determine exactly what to back up.
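To illustrate the --[no-]flag idea, here's a hand-rolled sketch of the negation behavior (yargs supports this natively for boolean options, so the real command wouldn't need custom parsing; the flag names are just examples):

```javascript
// Hand-rolled sketch of the --[no-]flag pattern; yargs does this natively
// for boolean options, so this is illustrative only.
function parseBackupFlags(argv, defaults) {
    const opts = { ...defaults };
    for (const arg of argv) {
        const m = /^--(no-)?([a-z][a-z-]*)$/.exec(arg);
        if (m && m[2] in opts) {
            opts[m[2]] = !m[1]; // `--no-images` → images: false; `--images` → images: true
        }
    }
    return opts;
}

// Back up everything except images:
console.log(parseBackupFlags(['--no-images'], { images: true, database: true, logs: true }));
// → { images: false, database: true, logs: true }
```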

@kirrg001 kirrg001 added the later label Jan 12, 2018
@vikaspotluri123
Member

I might be beating a dead horse, but I just found out tar is coming to Windows, so we can definitely get away with using shell tar

@Bronek

Bronek commented Jan 30, 2018

Hi there, just wanted to +1 for this feature :) Thank you for your hard work!

@vikaspotluri123
Member

@kirrg001 @acburdine @ErisDS I wrote this spec a while ago, thoughts?

The goal of the backup command is to create an archive of all non-standard ghost data (user-created content) in a way that allows the data to be imported into a fresh installation at a later time.

The backup command has five main content areas that need to be archived:

  • Images
  • Logs
  • Database
  • Configuration (config.*.json, .ghost-cli)
  • Extensions

All of which will be saved in a .tar.gz archive (tar is the archive mechanism, gz is the compression mechanism)

The initial implementation of the command will rely on the following preconditions:

  • tar is installed on the system (this will be added as a dependency in the docs)
  • enough memory (RAM) is available
  • enough disk space is available
  • the command does not rely on the state of the instance
  • mysqldump is available if needed

Images and logs will be copied directly into the /content folder of the archive.

SQLite databases will be copied verbatim into the archive. MySQL databases will be exported using mysqldump. Both will be stored in the /content folder of the archive.

Configuration files will be copied into the root of the archive.

Extensions will be hooked via the backup method and will return a JSON object of tasks that need to be executed (for example, the nginx extension will return whether or not nginx was set up, and whether SSL was set up), which can be processed in a future restore hook.
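A sketch of what such a hook's return value could look like — the class, method, and object shape here are hypothetical, since the actual extension API would still need to be designed:

```javascript
// Hypothetical backup hook on an extension; the shape and names are
// illustrative only, not an actual Ghost-CLI extension API.
class NginxExtension {
    backup() {
        // Describe what was configured so a future restore hook can replay it.
        return {
            extension: 'nginx',
            tasks: [
                { name: 'setup-nginx', configured: true },
                { name: 'setup-ssl', configured: true, provider: 'letsencrypt' }
            ]
        };
    }
}

const manifest = new NginxExtension().backup();
console.log(JSON.stringify(manifest, null, 2));
```

A restore command could then iterate over each extension's manifest and re-run the corresponding setup steps.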

@Securitybits-io

Would just like to 👍 this feature, to be able to automatically backup and restore the contents is key!

@dm17

dm17 commented Jul 22, 2019

I don't get why #605 was closed. Is there a ghost-cli import feature? If not, then how do you import an old database when launching a new ghost instance programmatically?
The process should be like:

  1. take new ghost image
  2. boot image
  3. import content

Right now one needs to register an account just to use their local instance's import tool :(

@kevinansfield
Member

@dm17 it was closed because it was a duplicate of this issue that you are commenting on. If you need export/import then you'll need to use Ghost's admin interface or the API (import/export will need session auth because they are not exposed to 3rd party integrations for security).

@dm17

dm17 commented Jul 23, 2019

@dm17 it was closed because it was a duplicate of this issue that you are commenting on. If you need export/import then you'll need to use Ghost's admin interface or the API (import/export will need session auth because they are not exposed to 3rd party integrations for security).

Can you add to the Ghost docs exactly which data importing/exporting json files in the Ghost admin interface will include/exclude? It says "posts," and also - based on testing - I was able to figure out it does not import API keys for integrations... But what else is missing? It is a lot of data to go through so good documentation would save a lot of time.

acburdine added a commit to acburdine/Ghost-CLI that referenced this issue Nov 6, 2019
refs TryGhost#468
- add `--from-export` argument to install command
- add import command to import a ghost export file into an existing
instance
- add import parsing/loading code
acburdine added a commit to acburdine/Ghost-CLI that referenced this issue Nov 7, 2019
refs TryGhost#468
- add export command & export tasks
@dsecareanu

Migrating my Ghost blogs from one server to another these days, I've noticed that there are quite a lot of .json files in the content/data folder which contain the website content. Are these saved when someone does the export in the admin interface, or are they generated programmatically?

This could be an option to explore for the backup utility, i.e., every time a new post/page is added (or modified), or periodically (probably a better option), the site content is exported into this .json file.

Also, the update command should probably perform a backup in the background, since it already does work to restore the files, but it is unclear what it does in terms of website content/database.

The other elements (not mentioned above by @vikaspotluri123) that need to be backed up:

  • content/data/redirects.json
  • content/settings/routes.yaml

Also, based on my recent experience, it is highly important to check available storage (disk space) before backing up.

And another suggestion: backups could be set up in config.production.json (i.e. a backup on/off toggle; backup interval daily, weekly, monthly, etc.).
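For instance, a config block along these lines — every key here is hypothetical; no such option exists in Ghost today:

```json
{
  "backup": {
    "enabled": true,
    "interval": "daily",
    "output": "/var/backups/ghost",
    "excludeImages": false
  }
}
```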

@github-actions

github-actions bot commented May 3, 2021

Our bot has automatically marked this issue as stale because there has not been any activity here in some time. The issue will be closed soon if there are no further updates, however we ask that you do not post comments to keep the issue open if you are not actively working on a PR. We keep the issue list minimal so we can keep focus on the most pressing issues. Closed issues can always be reopened if a new contributor is found. Thank you for understanding 🙂

@yamsellem

@kevinansfield could you please add a pointer on how to import/export with the API? I can't find any documentation on the matter. Plus, the ghost-cli has an import/export option, but it's not documented either.

@vikaspotluri123 is the backup done / postponed to a future release / dropped?

Thanks a lot to both of you.

ErisDS added a commit to ErisDS/Ghost-CLI that referenced this issue May 19, 2022
refs: TryGhost/Toolbox#334
refs: TryGhost#468

- Only show the warning about upgrading if there's a major upgrade to do
ErisDS added a commit to ErisDS/Ghost-CLI that referenced this issue May 19, 2022
refs: TryGhost/Toolbox#334
refs: TryGhost#468

- this ensures that we export a members.csv file if the endpoint exists
ErisDS added a commit to ErisDS/Ghost-CLI that referenced this issue May 19, 2022
refs: TryGhost/Toolbox#334
refs: TryGhost#468

- the old streams wiring didn't handle the 404 error
- my previous attempt at changing the stream code hung on success instead
- this is more modern code, but works on node 12 for both the success and failure case
ErisDS added a commit to ErisDS/Ghost-CLI that referenced this issue May 19, 2022
refs: TryGhost/Toolbox#334
refs: TryGhost#468

- if the backup command is run multiple times, and the latter runs fail to export, the backup files would be old
- this ensures that the backup files all belong together
- also renames backup to content
ErisDS added a commit that referenced this issue May 19, 2022
refs: TryGhost/Toolbox#334
refs: #468

- we have to handle files differently depending on whether we're working locally or on a server
- first pass worked locally, second worked on ubuntu, this one should work on both
ErisDS added a commit that referenced this issue May 19, 2022
refs: TryGhost/Toolbox#334
refs: #468

- zip is meant to be installed on ubuntu 18 but for some reason on DO it is not
- use our JS lib instead, as that will work on any platform, although it may have issues with scaling
ErisDS added a commit that referenced this issue May 19, 2022
refs: TryGhost/Toolbox#334
refs: #468

- in Ghost CLI casper is always a symlink
- we don't need to back it up, it will always be present and updated