S3 storage backend is not sync-friendly #60

Closed
aldanor opened this issue May 2, 2018 · 2 comments
aldanor commented May 2, 2018

This took me a good while to figure out: I was moving my S3-backed Cronicle db from one S3 storage provider to another, and decided to sync the whole thing with rclone. Everything was seemingly fine, except that the UI was stuck in the "waiting for a master" state. After a fair bit of digging and debugging, I found that the sync was only partially complete, and looked like this in the destination bucket:

...
servers
server_groups
...

... and like this in the source bucket:

...
servers
servers/0
server_groups
server_groups/0

The core problem is that there's both a "folder" servers and a "file" servers (which apparently serves as a sort of metadata header for the "folder"). As soon as the sync tool discovers the "file", it no longer considers it a "folder" and doesn't dig any further, hence ignoring everything under servers/*. I think this happened with s3cmd as well in the past, not just rclone, but I hadn't given it any thought back then, so it might need additional checking. You could say it's an rclone problem (which it is, partially), but given that I couldn't find a single other person running into this issue with either rclone or s3cmd, it looks like people just don't name their S3 objects this way, so the CLI tools don't bother handling it either.
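To make it concrete, here's a tiny Node.js sketch (illustrative only; the bucket name is made up, and this is not rclone's actual code) showing how the same name surfaces both as an object and as a prefix when you list with a delimiter, which is exactly what confuses directory-oriented sync tools:

// Illustrative only -- not rclone's actual logic. Bucket name is hypothetical.
const AWS = require('aws-sdk');
const s3 = new AWS.S3();

s3.listObjectsV2({ Bucket: 'my-cronicle-bucket', Delimiter: '/' }, (err, data) => {
  if (err) throw err;
  // data.Contents will include { Key: 'servers' }           -- the metadata "file"
  // data.CommonPrefixes will include { Prefix: 'servers/' } -- the "folder"
  // The same name is both an object and a prefix; a tool that treats
  // 'servers' as a plain file will never descend into 'servers/'.
  console.log(data.Contents.map(obj => obj.Key));
  console.log(data.CommonPrefixes.map(pre => pre.Prefix));
});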

Another side of the same problem is trying to sync the S3 tree to a local disk: since you can't have a file and a folder with the same name, what would you expect it to do? (This is most likely the reason the sync tools behave the way they do and ignore servers/0.)
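For illustration (hypothetical local paths, again not any tool's real code), here is what that local-disk collision looks like in a few lines of Node.js:

// Sketch only: mapping Cronicle-style keys onto a local directory tree.
// 'servers' has to become a regular file, but 'servers/0' needs 'servers'
// to be a directory -- the filesystem can't satisfy both.
const fs = require('fs');
const path = require('path');

const keys = ['servers', 'servers/0', 'server_groups', 'server_groups/0'];

for (const key of keys) {
  const dest = path.join('/tmp/sync-test', key);
  try {
    fs.mkdirSync(path.dirname(dest), { recursive: true });
    fs.writeFileSync(dest, '{}'); // stand-in for the downloaded object
    console.log('wrote', dest);
  } catch (err) {
    console.log('cannot write', dest, '->', err.code); // EEXIST / ENOTDIR
  }
}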

I understand that this might be too much of a change, and even if a workaround were implemented (like an optional suffix/prefix for the metadata object so it doesn't collide with the contents), some thought would have to be given to compatibility and migration. I'd be happy to discuss this, though, if it helps.

If changing this convention happens to be an absolute no-no, maybe there should at least be a note somewhere saying "don't ever try to sync your Cronicle S3 installation with any of the standard S3 command-line tools like rclone; it will fail miserably (and, worst of all, quietly)".

Thanks!

jhuckaby commented May 4, 2018

Oh man, I am so sorry you ran into this issue, and that you had to spend so much time debugging it. Thank you for getting to the bottom of it, though. I had no idea that my storage key scheme would affect S3 in this horrible way. It really sucks that one key can't safely be a prefix of another. That blows.

I'm actually working on a brand new version of my pixl-server-storage module (which Cronicle uses for all of its storage, both local disk and S3), which has a new feature that lets you fully customize the keys by way of a "template" string. The original idea was to prepend some hash directories for S3 performance reasons, but you could also use this to add a suffix to all keys, which I think would effectively work around this issue. Here is a snippet from my updated docs:

S3 Key Template

Note that Amazon recommends adding a hash prefix to all your S3 keys, for performance reasons. To that end, if you specify a keyTemplate property, and it contains any hash marks (#), they will be dynamically replaced with characters from an MD5 hash of the key. So for example:

"keyTemplate": "##/##/[key]"

This would replace the 4 hash marks with the first 4 characters from the key's MD5, followed by the full key itself, e.g. a5/47/users/jhuckaby. Note that this all happens behind the scenes and transparently, so you never have to specify the prefix or hash characters when fetching keys.

Besides hash marks, the special macro [key] will be substituted with the full key, and [md5] will be substituted with a full MD5 hash of the key. These can be used anywhere in the template string.

Hi, I'm back. So for this particular S3 issue you found, you can basically force a suffix onto the end of every key, by setting the template config property to something like this:

"keyTemplate": "[key]/foo"

The [key] macro would be substituted with the actual key, and then /foo added onto the end. Of course, /foo has no meaning, and it could be anything, but it would fix the S3 directory clobbering issue, because then the listed keys would end up like this:

...
servers/foo
servers/0/foo
server_groups/foo
server_groups/0/foo

In this case, no key ends up being a prefix of another.
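In case it helps, the template expansion described above amounts to roughly this (an illustrative sketch only, not the actual pixl-server-storage implementation):

// Rough sketch of the keyTemplate expansion -- illustrative, not the real module code.
const crypto = require('crypto');

function applyKeyTemplate(template, key) {
  const md5 = crypto.createHash('md5').update(key).digest('hex');
  let idx = 0;
  return template
    .replace(/#/g, () => md5[idx++ % md5.length]) // each '#' takes the next MD5 char
    .replace(/\[key\]/g, key)                     // full original key
    .replace(/\[md5\]/g, md5);                    // full MD5 hash of the key
}

console.log(applyKeyTemplate('##/##/[key]', 'users/jhuckaby')); // e.g. "a5/47/users/jhuckaby"
console.log(applyKeyTemplate('[key]/foo', 'servers/0'));        // "servers/0/foo"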

The other piece needed here is a storage upgrade script, which I still have to write. This would accept two different storage configurations and transfer all data from one implementation to the other. That would allow you to "upgrade" to the new key template, for example. It would be Cronicle-specific.

I don't have an official timeline on my new storage module release (it's version 2.0.0 and has a ton of new features and breaking changes, so I'm taking it nice and slow), but it should be soon, like maybe 2 months or less.

Thanks again for catching this, and reporting the issue.

@jhuckaby jhuckaby self-assigned this May 4, 2018
@jhuckaby jhuckaby added the bug label May 4, 2018
jhuckaby commented

Hey @aldanor,

This should be all fixed now, as of Cronicle v0.8.4. You can now include a special fileExtensions S3 property which adds .json file extensions to all JSON S3 keys, effectively avoiding the directory / file clashing issue. Please see the docs here:

https://github.com/jhuckaby/pixl-server-storage#s3-file-extensions
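
For reference, it goes in the S3 section of your storage config, something like this (bucket name, credentials, and region are placeholders; see the docs above for the exact layout):

"Storage": {
	"engine": "S3",
	"AWS": {
		"accessKeyId": "YOUR_ACCESS_KEY",
		"secretAccessKey": "YOUR_SECRET_KEY",
		"region": "us-east-1"
	},
	"S3": {
		"fileExtensions": true,
		"params": { "Bucket": "YOUR_CRONICLE_BUCKET" }
	}
}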

Note that since you already have a working Cronicle installation, you can't just enable the property after the fact. You will need to "migrate" your storage to a new location. That can simply be a new S3 key prefix, a new S3 bucket, or a new AWS region. Full docs on migrating are here:

https://github.com/jhuckaby/Cronicle#storage-migration-tool

Thanks again for reporting this issue!

- Joe
