Deploy Ceph and Rook for disk storage #29
@synergistics @Lchu613: You're now on the critical path, so:

Since I haven't heard from either of you, this is getting deferred to DC6.

As per separate conversations, I'm taking this.
Ceph appears not to support encryption over the wire at all, which means we need one of: (1) an overlay network, (2) patches to Ceph, or (3) encrypting everything before it goes into Ceph.

The best option appears to be (3), which means that the Ceph cluster itself will be deployed as authenticated but unencrypted, and consumers will have to provide the encryption layer.
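As an illustration of what that consumer-side layer could look like (a sketch only, not our actual tooling; the pool, image, and device names are hypothetical), one option is stacking LUKS/dm-crypt on top of a mapped RBD device, so only ciphertext ever reaches the OSDs:

```sh
# Map an RBD image (assumes the image and a keyring already exist):
rbd map mypool/myimage --id example

# Set up LUKS on the mapped device, then open and format it;
# everything Ceph stores beneath this layer is ciphertext.
cryptsetup luksFormat /dev/rbd0
cryptsetup open /dev/rbd0 secure-volume
mkfs.ext4 /dev/mapper/secure-volume
```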
For key management, it seems like this will involve a limited set of predefined static keys, which need to be generated in the keysystem and distributed by it. According to analysis of ceph-authtool, the output format is reasonably easy to generate: each key is 28 bytes, base64-encoded, split into a handful of fixed-width fields. It might be easier to generate this format ourselves rather than try to invoke ceph-authtool.
For documentation purposes: are those time fields little- or big-endian?

Little-endian. Other information: having multiple keys in the same file is just a concatenation operation.
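To make that concrete, here is a minimal Python sketch of a generator for this layout. The field breakdown (2-byte key type, 4+4-byte creation time, 2-byte length, 16-byte AES secret, all little-endian) is my reconstruction from the discussion above, so verify it against real ceph-authtool output before depending on it:

```python
import base64
import os
import struct
import time

# Assumed key-type constant: 1 = AES in Ceph's key format (reconstruction).
CEPH_CRYPTO_AES = 1
AES_KEY_LEN = 16  # a 16-byte AES-128 secret

def generate_cephx_key():
    """Generate a base64 key in the 28-byte format described above.

    Assumed layout (all integers little-endian, as confirmed above):
      2 bytes   key type (1 = AES)
      4 bytes   creation time, seconds
      4 bytes   creation time, nanoseconds
      2 bytes   secret length (16)
      16 bytes  secret
    """
    now = time.time()
    sec = int(now)
    nsec = int((now - sec) * 1e9)
    secret = os.urandom(AES_KEY_LEN)
    blob = struct.pack("<HIIH", CEPH_CRYPTO_AES, sec, nsec, AES_KEY_LEN) + secret
    assert len(blob) == 28
    return base64.b64encode(blob).decode("ascii")

def keyring_entry(entity):
    """Render one keyring stanza; a keyring file is just stanzas concatenated."""
    return "[%s]\n\tkey = %s\n" % (entity, generate_cephx_key())

if __name__ == "__main__":
    print(keyring_entry("client.admin"))
```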
As it turns out, it is essentially impossible to make Ceph's authentication secure, due to how certain keys are replicated over non-confidential channels. We're looking into using WireGuard to solve this, which would act as a higher-quality and easier-to-use version of IPsec for our purposes. If we do this, we have a few additional authentication-related tasks:

Any suggestions as to how this could be simplified without losing any of the security properties?
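For reference, a minimal per-node WireGuard configuration would look something like the sketch below. The interface addresses, endpoint, and key placeholders are all hypothetical; the keys would be generated and distributed by the keysystem, with one [Peer] stanza per other Ceph node:

```ini
[Interface]
# This node's private key, issued by the keysystem (placeholder).
PrivateKey = <node-private-key>
Address = 10.100.0.1/24
ListenPort = 51820

[Peer]
# One stanza like this per other node in the Ceph cluster.
PublicKey = <peer-public-key>
AllowedIPs = 10.100.0.2/32
Endpoint = node2.cluster.example:51820
```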
This is getting deferred to DC7 for the sake of focusing on improving developer engagement.
A bunch of progress towards this in #274. Current blocking point: the Debian installer complains about not having a filesystem assigned to the second disk; I need to figure out which setting to use to auto-skip that warning (a guess at the relevant preseed settings is sketched below). Next up: ensure that a second partition is actually added, and try deploying the Ceph spec to a cluster.
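My current guess, untested, is that the relevant knobs are the partman confirmation questions; the exact template names here are an assumption and need to be verified against the installer:

```
# Untested guess: suppress partman's confirmation prompts, including the
# "partition has no filesystem assigned" warning, during automated installs.
d-i partman-basicmethods/method_only boolean false
d-i partman/confirm boolean true
d-i partman/confirm_nooverwrite boolean true
```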
This was blocked at #274. I seem to remember that it almost got to the point of successfully bringing up the core of a Ceph cluster, but didn't quite get there.
I'm back to working on this. I'm considering using https://rook.io/ as the Ceph operator, which should make things easier -- though it'll take a decent bit of evaluation to confirm that Rook is reasonable for our uses. The recent knowledge we gained through dealing with #337 should help us figure out how to manage disks correctly in the Debian installer.
Ceph requires gcc 7 now, which means that we're blocked on upgrading the build chroot to Buster (#428), because Stretch doesn't have gcc-7. |
We also ran into trouble making a filegroup out of the entirety of Ceph, since there were simply too many files, which meant we couldn't use rules_foreign_cc. bbaren suggests that this is a bug in Bazel, and that we should report it, because it should definitely allow huge filegroups -- but that means we'd have to be on the latest Bazel (#444). |
Yeah, so... the problem with Ceph wasn't that it had ~60,000 files. The problem was that it had cyclic symlinks in the tarball! See bazelbuild/bazel#10783. Since we've upgraded to Buster, we have the right version of GCC, and we should be able to make Ceph build under rules_foreign_cc now.
I ran into bazel-contrib/rules_foreign_cc#239, which is theoretically solved by bazel-contrib/rules_foreign_cc#362, but can also be bypassed with this simple patch:
It appears that this allows Ceph to be built with CMake, but our current build times are on the order of three hours, because rules_foreign_cc doesn't inherently support parallelization (bazel-contrib/rules_foreign_cc#329), which means single-CPU builds.
[Overview last updated 2020-02-13]
One massive part of our cluster, so far untouched, is the ability to allocate disk space from a storage cluster created from block storage directly attached to our physical nodes.
We plan to do this with Ceph and most likely Rook.
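For a sense of what the Rook side involves, a cluster is declared as a Kubernetes resource roughly like the sketch below; the namespace, image version, and device filter are placeholders, not our actual configuration:

```yaml
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  cephVersion:
    image: ceph/ceph:v14.2.8   # placeholder version
  dataDirHostPath: /var/lib/rook
  mon:
    count: 3
  storage:
    useAllNodes: true
    useAllDevices: false
    deviceFilter: sdb          # the second disk discussed above; placeholder name
```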