Membership Module
The membership module was introduced on 08/01/2023 to replace worker registration on the master. It provides the capability to either:
- use a static file to provide a preset list of workers for an Alluxio cluster
- use an etcd cluster as a distributed membership coordinator
MembershipManager is the module interface for the different implementations of membership management. There are currently 3 implementations:
- NOOP - NoOpMembershipManager: falls back to the old way of using the master for worker registration; it is still kept for regression/testing purposes.
- STATIC - StaticMembershipManager: uses a static config file (default: $ALLUXIO_HOME/conf/workers) to configure the list of worker hostnames that form the Alluxio cluster. It does not provide membership capabilities such as tracking members joining or leaving, or member liveness. It is merely a simple quickstart way to spin up a DORA Alluxio cluster.
- ETCD - EtcdMembershipManager: uses a pre-configured standalone etcd cluster to manage worker membership. On first startup, a worker registers itself with etcd and then keeps its liveness in etcd throughout its process lifetime. Through the EtcdMembershipManager module, either a client or a worker can get information about:
a. What are the currently registered workers?
b. What are the currently alive workers?
For NOOP, there is nothing to configure: no MembershipManager module is leveraged at all.
For STATIC, use a static file that follows the format of conf/workers (see https://docs.alluxio.io/os/user/stable/en/deploy/Running-Alluxio-On-a-Cluster.html?q=conf%2Fworkers#basic-setup), with the hostname of every worker on its own line. Then configure alluxio-site.properties with:
alluxio.worker.membership.manager.type=STATIC
alluxio.worker.static.config.file=<absolute_path_to_static_config_workerlist_file>
or just
alluxio.worker.membership.manager.type=STATIC
in which case conf/workers is used. For example, to configure an Alluxio cluster with 2 workers, conf/workers contains:
# List of Worker started on each of the machines listed below.
ec2-1-111-11-111.compute-1.amazonaws.com
ec2-2-222-22-222.compute-2.amazonaws.com
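To illustrate the file format described above, here is a minimal Python sketch that reads a conf/workers-style file into a list of hostnames. It is not Alluxio's own parser: the helper name `parse_worker_hosts` is hypothetical, and it assumes that blank lines and lines starting with `#` are ignored.

```python
from typing import List


def parse_worker_hosts(text: str) -> List[str]:
    """Return the worker hostnames found in conf/workers-style text."""
    hosts = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Assumption: blank lines and '#' comment lines carry no hostname.
        if not line or line.startswith("#"):
            continue
        hosts.append(line)
    return hosts


sample = """\
# List of Worker started on each of the machines listed below.
ec2-1-111-11-111.compute-1.amazonaws.com
ec2-2-222-22-222.compute-2.amazonaws.com
"""

hosts = parse_worker_hosts(sample)
print(hosts)
```

A check like this can be handy before starting a cluster, for example to confirm that the file lists the expected number of workers.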
Depending on the deployment environment (bare metal or K8s), users can set up the etcd cluster and the Alluxio cluster individually, or install both in one step through helm install with Alluxio's k8s operator.
To set up an etcd cluster, refer to the etcd documentation: https://etcd.io/docs/v3.4/op-guide/clustering/. Use etcd V3, as V2 is not supported; there is currently no requirement for a specific V3 version.
For example, say we have a 3-node etcd setup:
Name | Address | Hostname |
---|---|---|
infra0 | 10.0.1.10 | infra0.example.com |
infra1 | 10.0.1.11 | infra1.example.com |
infra2 | 10.0.1.12 | infra2.example.com |
Configure alluxio-site.properties:
alluxio.worker.membership.manager.type=ETCD
alluxio.etcd.endpoints=http://infra0.example.com:2379,http://infra1.example.com:2379,http://infra2.example.com:2379
[NOTICE] As the etcd membership module relies on etcd's high availability to provide the membership service, include ALL the etcd cluster nodes in the configuration (or at least all the initial ones, if new nodes have been bootstrapped into etcd later) so that the module can redirect connections to the etcd leader automatically.
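Since every etcd node should appear in alluxio.etcd.endpoints, the value can be generated from a host list rather than typed by hand. The following is a minimal Python sketch; the helper `build_etcd_endpoints` is hypothetical, and the `http` scheme and port 2379 are taken from the example above rather than from any Alluxio default.

```python
from typing import List


def build_etcd_endpoints(hostnames: List[str], port: int = 2379,
                         scheme: str = "http") -> str:
    """Join etcd hosts into a comma-separated endpoint list."""
    if not hostnames:
        raise ValueError("at least one etcd hostname is required")
    return ",".join(f"{scheme}://{host}:{port}" for host in hostnames)


# Hostnames from the 3-node example table.
etcd_hosts = [
    "infra0.example.com",
    "infra1.example.com",
    "infra2.example.com",
]

endpoints_line = "alluxio.etcd.endpoints=" + build_etcd_endpoints(etcd_hosts)
print(endpoints_line)
```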
After spinning up the Alluxio workers, use bin/alluxio info nodes to check the status of worker registration:
WorkerId Address Status
6e715648b6f308cd8c90df531c76a028 127.0.0.1:29999 ONLINE
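For scripted health checks, the output above can be parsed into records. The following Python sketch assumes the output is exactly three whitespace-separated columns with a `WorkerId Address Status` header, as in the sample; the real command output may differ between versions, and the names `WorkerNode` and `parse_info_nodes` are hypothetical.

```python
from typing import List, NamedTuple


class WorkerNode(NamedTuple):
    worker_id: str
    address: str
    status: str


def parse_info_nodes(output: str) -> List[WorkerNode]:
    """Parse 'bin/alluxio info nodes'-style text into WorkerNode records."""
    nodes = []
    for line in output.splitlines():
        fields = line.split()
        # Skip blank or malformed lines and the header row.
        if len(fields) != 3 or fields[0] == "WorkerId":
            continue
        nodes.append(WorkerNode(*fields))
    return nodes


sample_output = """\
WorkerId Address Status
6e715648b6f308cd8c90df531c76a028 127.0.0.1:29999 ONLINE
"""

nodes = parse_info_nodes(sample_output)
# Workers whose status is anything other than ONLINE.
not_online = [node for node in nodes if node.status != "ONLINE"]
print(nodes)
print(not_online)
```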
If your etcd cluster has authentication enabled, you need to create a user whose role grants full readwrite permission on all keys with prefix '/'. The official etcd guidance is here: https://etcd.io/docs/v3.2/op-guide/authentication/. Below is a simple guide to setting up a user and role for the Alluxio etcd membership module:
# Before enabling etcd authentication, user 'root' must exist.
# Check the user list of your etcd cluster with: $ etcdctl user list
# Skip this step if you already have user 'root'.
# Enter a password at the prompt (this example uses 'root').
$ etcdctl user add root
# Enable authentication with root user
$ etcdctl --user root:root auth enable
Authentication Enabled
# Create a role, grant permission on prefix '/'
$ etcdctl --user root:root role add alluxioreadwrite
$ etcdctl --user root:root role grant-permission alluxioreadwrite --prefix=true readwrite /
# Create a user for alluxio, enter password on prompt.
$ etcdctl --user root:root user add alluxio
# Grant the user with the role.
$ etcdctl --user root:root user grant-role alluxio alluxioreadwrite
# Check that the newly created user 'alluxio' can access keys under prefix '/'
$ etcdctl --user alluxio:<password> get --prefix /
Set the username and password in alluxio-site.properties:
alluxio.etcd.username=alluxio
alluxio.etcd.password=<password_for_alluxio>
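As a sanity check before starting workers, the ETCD-related settings can be validated with a short script. This Python sketch uses a simplified `key=value` parser (it does not implement the full Java properties syntax), and the rule that username and password must be set together is an assumption made for illustration, not documented Alluxio behavior. Both function names are hypothetical.

```python
from typing import Dict, List


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple key=value lines, skipping blanks and '#' comments."""
    props = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def check_etcd_membership(props: Dict[str, str]) -> List[str]:
    """Return a list of problems found in ETCD membership settings."""
    problems = []
    if props.get("alluxio.worker.membership.manager.type") != "ETCD":
        problems.append("alluxio.worker.membership.manager.type is not ETCD")
    if not props.get("alluxio.etcd.endpoints"):
        problems.append("alluxio.etcd.endpoints is missing")
    has_user = bool(props.get("alluxio.etcd.username"))
    has_password = bool(props.get("alluxio.etcd.password"))
    # Assumption: credentials are only meaningful as a pair.
    if has_user != has_password:
        problems.append(
            "alluxio.etcd.username and alluxio.etcd.password must be set together"
        )
    return problems


site_properties = """\
alluxio.worker.membership.manager.type=ETCD
alluxio.etcd.endpoints=http://infra0.example.com:2379
alluxio.etcd.username=alluxio
"""

problems = check_etcd_membership(parse_properties(site_properties))
print(problems)
```

In this sample the password line is deliberately left out, so the check reports the incomplete credential pair.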
With the k8s operator, we can use helm to spin up a DORA Alluxio cluster along with etcd cluster pod(s). (For prerequisites, refer to https://docs.google.com/document/d/1iiDZDNBTJWQ1WAJ-31aKDo9pL1DeTrvrvYUdd-YrTpI/edit#heading=h.1rc792noj716)
To pull the etcd dependency for the helm repo, run:
helm dependency update
To configure Alluxio with a single-pod etcd cluster, enable the etcd component in k8s-operator/deploy/charts/alluxio/config.yaml:
image: <docker_username>/<image-name>
imageTag: <tag>
dataset:
  path: <ufs path>
  credentials: # s3 as example. Leave it empty if not needed.
    aws.accessKeyId: xxxxxxxxxx
    aws.secretKey: xxxxxxxxxxxxxxx
etcd:
  enabled: true
Then, under k8s-operator/deploy/charts/alluxio/, run:
$ helm install <cluster name> -f config.yaml .
Afterwards, kubectl get pods gives:
[root@ip-172-31-24-66 alluxio]# kubectl get pods
NAME READY STATUS RESTARTS AGE
dora0802-alluxio-master-0 0/1 Running 0 3s
dora0802-alluxio-worker-6577bc9-s6njq 0/1 Running 0 3s
dora0802-etcd-0 0/1 Running 0 3s
- To spin up a 3-node etcd cluster
Simply add the replicaCount field to indicate the number of etcd instances:
etcd:
  enabled: true
  replicaCount: 3
The deployment now has a 3-pod etcd cluster:
NAME READY STATUS RESTARTS AGE
dora0802-1-alluxio-master-0 1/1 Running 0 111m
dora0802-1-alluxio-worker-5fc8bd885-jk6pn 1/1 Running 0 111m
dora0802-1-etcd-0 1/1 Running 0 111m
dora0802-1-etcd-1 1/1 Running 0 111m
dora0802-1-etcd-2 1/1 Running 0 111m
To use etcdctl in a k8s environment, spin up an etcd client pod via:
$kubectl run lucyetcd-client --restart='Never' --image docker.io/bitnami/etcd:3.5.9-debian-11-r24 --env ETCDCTL_ENDPOINTS="dora0802-1-etcd:2379" --namespace default --command -- sleep infinity
For a detailed introduction to how Registration/ServiceDiscovery is done with etcd, see: https://github.com/Alluxio/alluxio/wiki/Etcd-backed-membership
SERVICE_REGISTRY uses etcd for worker registration but records only active workers (instead of recording workers permanently), meaning that key-value pairs exist only under /ServiceDiscovery.
Configure alluxio-site.properties:
alluxio.worker.membership.manager.type=SERVICE_REGISTRY
alluxio.etcd.endpoints=http://infra0.example.com:2379,http://infra1.example.com:2379,http://infra2.example.com:2379