
Deploy Ceph and Rook for disk storage #29

Open · 11 tasks · celskeggs opened this issue Jul 1, 2017 · 16 comments


@celskeggs
Member

celskeggs commented Jul 1, 2017

[Overview last updated 2020-02-13]

One massive part of our cluster, so far untouched, is the ability to allocate disk space from a storage cluster created from block storage directly attached to our physical nodes.

We plan to do this with Ceph and most likely Rook.

  • Repackage Ceph and any dependencies and build them with Bazel.
  • Adjust our partitioning scheme to account for providing Ceph partitions.
  • Deploy Rook.
  • Add integration tests for Rook.
  • If Rook doesn't provide an appropriate encryption layer around Ceph, provide one using WireGuard.
  • Confirm that we can use Ceph administration tools via Rook as necessary.
  • Confirm that the object gateway works.
  • Confirm that block storage works.
  • Verify that Ceph can handle unexpected changes in disk topology (i.e. "whoops, this node just exploded").
  • Plan out how Ceph can be regularly and automatically backed up.
  • Ensure that Ceph is fully integrated with our systems for cluster setup, expansion, and repair.
@celskeggs celskeggs added the macro label Jul 1, 2017
@celskeggs celskeggs modified the milestone: Dev Cluster 5 Jul 13, 2017
@Lchu613 Lchu613 self-assigned this Sep 9, 2017
@synergistics synergistics self-assigned this Sep 9, 2017
@celskeggs
Member Author

@synergistics @Lchu613: You're now on the critical path, so:

  • What's the current status of this?
  • Can this be done by the end of DC5 (Oct 8)? If not, DC6 (Oct 22)?

@celskeggs
Member Author

Since I haven't heard from either of you, this is getting deferred to DC6.

@celskeggs
Member Author

As per separate conversations, I'm taking this.

@celskeggs celskeggs assigned celskeggs and unassigned synergistics and Lchu613 Oct 13, 2017
@celskeggs
Member Author

Ceph appears not to support encryption over the wire at all, which means we either need an overlay network, patches to Ceph, or to encrypt everything going into it.

  1. Overlay networks will prevent use of block storage and CephFS outside of specially configured systems, which is bad.
  2. Patching Ceph seems infeasible, given how much effort upstream has already put into trying to implement this.
  3. Encrypting everything going into Ceph will require using dm-crypt block overlays and object gateway server-side encryption, and will prevent use of CephFS.
  4. Switching away from Ceph appears to be blocked by a lack of good alternatives.

The best option appears to be entry 3 on the above list, which means that the Ceph cluster itself will be deployed as authenticated but unencrypted, and consumers will have to provide the encryption layer.
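
To make option 3 concrete for block storage, here is a rough sketch of layering LUKS/dm-crypt on top of a mapped RBD image, driven from Python. The pool, image, and key-file names are invented for illustration, and this is not existing tooling in this repo:

import subprocess

def map_encrypted_rbd(pool, image, keyfile, name):
    # Map the RBD image to a local block device; `rbd map` prints the device path.
    device = subprocess.run(
        ["rbd", "map", "%s/%s" % (pool, image)],
        check=True, capture_output=True, text=True,
    ).stdout.strip()
    # Open a LUKS layer on top of it, so that only ciphertext ever reaches the OSDs.
    subprocess.run(
        ["cryptsetup", "open", "--type", "luks", "--key-file", keyfile, device, name],
        check=True,
    )
    return "/dev/mapper/%s" % name

# One-time setup (hypothetical names), before the first call:
#   rbd create mypool/secure-volume --size 10240
#   cryptsetup luksFormat /dev/rbd/mypool/secure-volume --key-file /etc/ceph/secure-volume.key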

@celskeggs
Member Author

For key management, it seems like this will involve a limited set of predefined static keys, which need to be generated in the keysystem and distributed by it.

Based on an analysis of ceph-authtool, its output format is reasonably easy to generate:

[client.admin]
	key = AQBRVwVavY4SLhAANoj1BVGSSN/VbYrV9PFTxA==

The base64-encoded value decodes to 28 bytes, laid out as (in hex): 0100ssssssssnnnnnnnn1000kkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk

Here, s is the creation time in seconds, n is the nanoseconds component, and k is a randomly-generated 16-byte key.

It might be easier to generate this format ourselves rather than try to invoke ceph-authtool.

@cryslith
Member

For documentation purposes: are those time fields little- or big-endian?

@celskeggs
Member Author

celskeggs commented Nov 12, 2017

Little-endian.

Other information: having multiple keys in the same file is just a concatenation operation.
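
Putting the pieces together, here is a minimal sketch of how the keysystem could generate such an entry itself (the 1000 field is assumed to be a little-endian key length of 16; this is an illustration, not existing keysystem code):

import base64
import os
import struct
import time

def generate_ceph_key():
    # 28 bytes: 01 00, then seconds and nanoseconds as little-endian 32-bit integers,
    # then 10 00 (assumed to be the key length, 16), then 16 random key bytes.
    seconds, nanoseconds = divmod(time.time_ns(), 1_000_000_000)
    secret = os.urandom(16)
    raw = struct.pack("<HIIH", 1, seconds, nanoseconds, len(secret)) + secret
    return base64.b64encode(raw).decode("ascii")

def keyring_entry(entity, key):
    return "[%s]\n\tkey = %s\n" % (entity, key)

# Multiple keys in one file are just concatenated entries, e.g.:
#   keyring_entry("client.admin", generate_ceph_key()) + keyring_entry("mon.", generate_ceph_key())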

@celskeggs
Member Author

celskeggs commented Nov 13, 2017

As it turns out, it is essentially impossible to make Ceph's authentication secure, due to how certain keys are replicated over non-confidential channels.

We're looking into using WireGuard to solve this, which would act like a higher-quality and easier-to-use version of IPsec for our purposes.

If we do this, we have a few additional authentication-related tasks:

  • An integration between rkt and WireGuard, so that a container can be assigned only a single network interface, provided by WireGuard. This prevents Ceph from being able to talk to the world outside of the overlay network.
  • As previously required, an integration with the keysystem to support generation of Ceph keys for the administration key and the monitor key.
  • A mechanism to handle Ceph OSD key distribution, which might just be the built-in mechanisms plus some glue code and some work to make sure the keys are persisted properly.
  • A system wrapping WireGuard that can manage the configuration for each endpoint, probably including distribution of public keys between endpoints, probably based on PKI handled by the keysystem (a sketch of such a config is below).
  • An interstitial container within the overlay network that can run commands against the Ceph cluster on behalf of the admins.

Any suggestions as to how this could be simplified without losing any of the security properties?
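
For the configuration-management bullet above, here is a hypothetical sketch of the per-endpoint WireGuard configuration such a wrapper might render, in the standard wg-quick format. The names, addresses, and keys are invented; the real inputs would come from the keysystem's PKI:

def render_wireguard_config(private_key, address, listen_port, peers):
    # Render a wg-quick style configuration for one endpoint of the overlay network.
    lines = [
        "[Interface]",
        "PrivateKey = %s" % private_key,
        "Address = %s" % address,
        "ListenPort = %d" % listen_port,
    ]
    for peer in peers:
        lines += [
            "",
            "[Peer]",
            "PublicKey = %s" % peer["public_key"],
            "AllowedIPs = %s" % peer["allowed_ips"],
            "Endpoint = %s" % peer["endpoint"],
        ]
    return "\n".join(lines) + "\n"

# Example with invented values:
#   render_wireguard_config("<private-key>", "10.100.0.2/24", 51820,
#       [{"public_key": "<peer-key>", "allowed_ips": "10.100.0.3/32",
#         "endpoint": "node3.example.com:51820"}])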

@celskeggs
Member Author

This is getting deferred to DC7 for the sake of focusing on improving developer engagement.

@celskeggs
Member Author

A bunch of progress towards this in #274.

Current blocking point: the Debian installer complains about not having a filesystem assigned to the second disk; I need to figure out which setting to use to auto-skip that warning.

Next up: ensure that a second partition is actually added, and try deploying the Ceph spec to a cluster.

@celskeggs
Member Author

This was blocked at #274. I seem to remember that it almost got to the point of successfully bringing up the core of a Ceph cluster, but didn't quite get there.

@celskeggs
Member Author

I'm back to working on this. I'm considering using https://rook.io/ as the Ceph operator, which should make things easier -- though it'll take a decent bit of evaluation to confirm that Rook is reasonable for our uses. The recent knowledge we gained through dealing with #337 should help us figure out how to manage disks correctly in the Debian installer.

@celskeggs
Member Author

celskeggs commented Dec 15, 2019

Ceph now requires gcc-7, which means we're blocked on upgrading the build chroot to Buster (#428), because Stretch doesn't have gcc-7.

@celskeggs
Member Author

celskeggs commented Dec 17, 2019

We also ran into trouble making a filegroup out of the entirety of Ceph, since there were simply too many files, which meant we couldn't use rules_foreign_cc. bbaren suggests that this is a bug in Bazel, and that we should report it, because it should definitely allow huge filegroups -- but that means we'd have to be on the latest Bazel (#444).

@celskeggs
Member Author

celskeggs commented Feb 14, 2020

Yeah, so... the problem with Ceph wasn't that it had ~60,000 files. The problem was that it had cyclic symlinks in the tarball! See bazelbuild/bazel#10783.

Since we've upgraded to Buster, we have the right version of GCC, so we should be able to make Ceph build under rules_foreign_cc now.
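
For anyone hitting the same thing when repackaging a tarball, here is a rough sketch (not existing tooling in this repo) of scanning an extracted tree for symlinks that point back at one of their own ancestors, which is the kind of cycle involved here:

import os

def find_cyclic_symlinks(root):
    cycles = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                continue
            target = os.path.realpath(path)
            # A symlink whose target is an ancestor of the link itself forms a cycle.
            if os.path.commonpath([target, os.path.realpath(dirpath)]) == target:
                cycles.append((path, target))
    return cycles

# Usage (hypothetical path): find_cyclic_symlinks("path/to/extracted-ceph")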

@celskeggs celskeggs changed the title from "Set up Ceph" to "Deploy Ceph and Rook for disk storage" Feb 14, 2020
@celskeggs
Member Author

celskeggs commented Feb 15, 2020

I ran into bazel-contrib/rules_foreign_cc#239, which is theoretically solved by bazel-contrib/rules_foreign_cc#362, but can also be bypassed with this simple patch:

diff --git tools/build_defs/cmake_script.bzl tools/build_defs/cmake_script.bzl
index e1b0c13..393368a 100644
--- tools/build_defs/cmake_script.bzl
+++ tools/build_defs/cmake_script.bzl
@@ -65,7 +65,7 @@ def create_cmake_script(
     if not params.cache.get("CMAKE_RANLIB"):
         params.cache.update({"CMAKE_RANLIB": ""})
 
-    set_env_vars = " ".join([key + "=\"" + params.env[key] + "\"" for key in params.env])
+    set_env_vars = " ".join([key + "='" + params.env[key] + "'" for key in params.env])
     str_cmake_cache_entries = " ".join(["-D" + key + "=\"" + params.cache[key] + "\"" for key in params.cache])
     cmake_call = " ".join([
         set_env_vars,

It appears that this allows Ceph to be built with CMake, but our current build times are on the order of three hours, because rules_foreign_cc doesn't inherently support parallelization (bazel-contrib/rules_foreign_cc#329), which means single-CPU builds.
