[FEATURES] Add the `PodAffinity` to the `Dataset` CRD #3496

dashanji · 2023-10-19T03:05:21Z

What feature you'd like to add:

Add a PodAffinity field to the Dataset spec:

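type DatasetSpec struct {
	// ... existing fields ...

	// PodAffinity (proposed) constrains where the Dataset's pods, such as the
	// FUSE pods, can be scheduled, based on the labels of pods already running there.
	PodAffinity *corev1.PodAffinity `json:"podAffinity,omitempty"`

	// ... existing fields ...
}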
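To illustrate what the proposed field would do, here is a minimal Python sketch. It assumes the controller simply copies the Dataset's podAffinity into the generated FUSE pod's affinity.podAffinity and leaves any other affinity (such as nodeAffinity) untouched. The helper name apply_dataset_pod_affinity and the container name thin-fuse are hypothetical, and manifests are modeled as plain dicts rather than Kubernetes API objects.

```python
import copy
from typing import Any, Dict, Optional


def apply_dataset_pod_affinity(
    fuse_pod_spec: Dict[str, Any], dataset_spec: Dict[str, Any]
) -> Dict[str, Any]:
    """Return a copy of fuse_pod_spec with the Dataset's podAffinity applied.

    Hypothetical helper: the proposed `podAffinity` field is copied verbatim
    into the pod's `affinity.podAffinity`; other affinity settings are kept.
    """
    pod_affinity: Optional[Dict[str, Any]] = dataset_spec.get("podAffinity")
    result = copy.deepcopy(fuse_pod_spec)  # never mutate the caller's spec
    if not pod_affinity:
        return result
    affinity = result.setdefault("affinity", {})
    affinity["podAffinity"] = copy.deepcopy(pod_affinity)
    return result


if __name__ == "__main__":
    dataset_spec = {
        "podAffinity": {
            "requiredDuringSchedulingIgnoredDuringExecution": [
                {
                    "labelSelector": {
                        "matchExpressions": [
                            {
                                "key": "app.kubernetes.io/instance",
                                "operator": "In",
                                "values": ["vineyard-system-vineyardd-sample"],
                            }
                        ]
                    },
                    "topologyKey": "kubernetes.io/hostname",
                }
            ]
        }
    }
    # Hypothetical FUSE pod spec; the container name is made up.
    fuse_pod_spec = {"containers": [{"name": "thin-fuse"}]}
    print(apply_dataset_pod_affinity(fuse_pod_spec, dataset_spec)["affinity"])
```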

Why is this feature needed:

When a dataset is not cached on a particular node but is instead served by a particular Pod, a PodAffinity field is a better fit than the existing nodeAffinity.

BTW, I can submit a PR to address it if the feature is approved.


TrafalgarZZZ · 2023-10-19T08:13:13Z

@dashanji Thanks for the feature request! I think that's a great feature, but I'm not sure what you mean by "not cached by Node, but cached by Pod". Would you mind explaining it in more detail? An example would help make sure we're on the same page about the feature.

dashanji · 2023-10-20T12:47:35Z

Hi @TrafalgarZZZ, I'm working on an integration and want to use Fluid as the launcher for our storage system, Vineyard.

Vineyard is a storage engine that uses a socket as its access interface, so every application using Vineyard must run on the same node as the Vineyard instance in order to connect to that socket. Inspired by this example, we can leverage Fluid's Dataset and ThinRuntime to implement socket mounting. The main steps are as follows.

Here I used kind to create a k8s cluster with 1 master node and 3 worker nodes.

Deploy Fluid.
Deploy a Vineyard Deployment containing only one replica.
Create the configuration script configure-vineyard-socket.py.

import json

# Fluid provides the runtime configuration (target path, mounts, ...) in this file.
with open("/etc/fluid/config.json", "r") as f:
    obj = json.load(f)

print(json.dumps(obj))

# Shell loop that bind-mounts the Vineyard socket directory onto the target path
# whenever the socket is not visible there yet.
script = """
mkdir -p "$targetPath"
while true; do
    if [ ! -S "$targetPath/vineyard.sock" ]; then
        mount --bind "$socketPath" "$targetPath"
    fi
    sleep 10
done
"""

socket_path = obj['mounts'][0]['mountPoint']
# Strip the "local://" scheme so that only the host path remains.
if socket_path.startswith("local://"):
    socket_path = socket_path[len("local://"):]

with open("mount-vineyard-socket.sh", "w") as f:
    # The shebang must be the first line of the generated script.
    f.write("#!/bin/sh\nset -ex\n")
    f.write("targetPath=\"%s\"\n" % obj['targetPath'])
    f.write("socketPath=\"%s\"\n" % socket_path)
    f.write(script)
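To check what the script generates without running it inside the FUSE container, the same logic can be pulled into a pure function. This is a sketch: render_mount_script is a hypothetical helper, and the sample config contains only the two fields the script reads (targetPath and mounts[0].mountPoint), with a made-up targetPath value; the real /etc/fluid/config.json may contain more fields.

```python
import json

LOCAL_SCHEME = "local://"

# Same loop as in configure-vineyard-socket.py, with the variables quoted.
MOUNT_LOOP = """mkdir -p "$targetPath"
while true; do
    if [ ! -S "$targetPath/vineyard.sock" ]; then
        mount --bind "$socketPath" "$targetPath"
    fi
    sleep 10
done
"""


def render_mount_script(config: dict) -> str:
    """Build the contents of mount-vineyard-socket.sh from the parsed config.json."""
    socket_path = config["mounts"][0]["mountPoint"]
    # Strip the "local://" scheme so that only the host path remains.
    if socket_path.startswith(LOCAL_SCHEME):
        socket_path = socket_path[len(LOCAL_SCHEME):]
    header = "#!/bin/sh\nset -ex\n"  # shebang on the very first line
    variables = 'targetPath="%s"\nsocketPath="%s"\n' % (config["targetPath"], socket_path)
    return header + variables + MOUNT_LOOP


if __name__ == "__main__":
    # Illustrative config: only the fields the script reads; targetPath is made up.
    sample = json.loads(
        '{"targetPath": "/mnt/example-target",'
        ' "mounts": [{"mountPoint": "local:///var/run/vineyard-kubernetes/vineyard-system/vineyardd-sample"}]}'
    )
    print(render_mount_script(sample))
```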

Create the following ThinRuntimeProfile.

apiVersion: data.fluid.io/v1alpha1
kind: ThinRuntimeProfile
metadata:
  name: vineyard-profile
spec:
  fileSystemType: fuse
  volumes:
  - name: vineyard-socket
    hostPath:
      # This path should be the same as the vineyard socket path in the vineyard deployment
      path: /var/run/vineyard-kubernetes/vineyard-system/vineyardd-sample
      type: DirectoryOrCreate
  fuse:
    image: configure-vineyard-socket
    imageTag: latest
    imagePullPolicy: IfNotPresent
    volumeMounts:
    - name: vineyard-socket
      mountPath: /var/run/vineyard-kubernetes/vineyard-system/vineyardd-sample
    command:
    - sh
    - -c
    - "python3 /configure-vineyard-socket.py && chmod u+x ./mount-vineyard-socket.sh && ./mount-vineyard-socket.sh"
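Both the hostPath volume in this profile and the `[ -S ... ]` check in the generated script rely on Vineyard's socket being a file on the node's filesystem: a client reaches the server only by connecting to that path, which is why the path has to be visible inside the pod. The sketch below uses a throwaway UNIX socket server as a stand-in for vineyardd (it does not speak Vineyard's protocol) plus a Python equivalent of `[ -S path ]`; the function names are hypothetical.

```python
import os
import socket
import stat
import tempfile


def is_unix_socket(path: str) -> bool:
    """Python equivalent of the shell test `[ -S path ]`."""
    try:
        return stat.S_ISSOCK(os.stat(path).st_mode)
    except OSError:
        # Missing or unreadable paths are "not a socket", as with `[ -S ]`.
        return False


def round_trip_through_socket_file(payload: bytes) -> bytes:
    """Start a stand-in server on a socket file and reach it only via that path."""
    sock_dir = tempfile.mkdtemp()
    sock_path = os.path.join(sock_dir, "vineyard.sock")
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        server.bind(sock_path)  # creates the socket file on the filesystem
        server.listen(1)
        if not is_unix_socket(sock_path):
            raise RuntimeError("socket file was not created")
        # The client needs nothing but the path: any process that can see this
        # file (e.g. through a hostPath mount on the same node) can connect.
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
            client.connect(sock_path)
            conn, _ = server.accept()
            with conn:
                client.sendall(payload)
                return conn.recv(len(payload))
    finally:
        server.close()
        if os.path.exists(sock_path):
            os.unlink(sock_path)
        os.rmdir(sock_dir)


if __name__ == "__main__":
    print(round_trip_through_socket_file(b"ping"))
```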

Create the Dataset with a podAffinity so that the generated vineyard FUSE pod is scheduled onto the same node as the previously deployed Vineyard Pod.

apiVersion: data.fluid.io/v1alpha1
kind: Dataset
metadata:
  name: vineyard
spec:
  mounts:
  # This directory should be the same as the vineyard socket directory in the vineyard deployment
  - mountPoint: local:///var/run/vineyard-kubernetes/vineyard-system/vineyardd-sample
    name: vineyard
  accessModes:
  - ReadWriteMany
  ################## Added #################
  podAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        ###### label_that_match_vineyard_pod ######
        - key: app.kubernetes.io/instance
          operator: In
          values:
          - vineyard-system-vineyardd-sample
      topologyKey: kubernetes.io/hostname
  #########################################
---
apiVersion: data.fluid.io/v1alpha1
kind: ThinRuntime
metadata:
  name: vineyard
spec:
  profileName: vineyard-profile
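The scheduling rule this Dataset relies on can be sketched in a few lines. This is a simplified model of a single requiredDuringSchedulingIgnoredDuringExecution podAffinity term, not the kube-scheduler implementation: it only handles matchExpressions with In, NotIn, Exists and DoesNotExist, and it ignores namespaces, matchLabels and the case where the incoming pod matches its own term. A node is eligible when it shares the topologyKey value (here kubernetes.io/hostname, i.e. the node itself) with a node already running a matching pod. The placement of the Vineyard pod on kind-worker2 is an assumed example.

```python
from typing import Dict, List


def matches_expressions(labels: Dict[str, str], expressions: List[dict]) -> bool:
    """Simplified labelSelector.matchExpressions check; all expressions must hold."""
    for expr in expressions:
        key, op = expr["key"], expr["operator"]
        values = expr.get("values", [])
        if op == "In":
            ok = labels.get(key) in values
        elif op == "NotIn":
            ok = labels.get(key) not in values
        elif op == "Exists":
            ok = key in labels
        elif op == "DoesNotExist":
            ok = key not in labels
        else:
            raise ValueError("operator not handled in this sketch: %s" % op)
        if not ok:
            return False
    return True


def pod_affinity_eligible_nodes(
    nodes: Dict[str, Dict[str, str]], pods: List[dict], term: dict
) -> List[str]:
    """Nodes that satisfy one required podAffinity term.

    nodes: node name -> node labels
    pods:  running pods as {"node": <node name>, "labels": {...}}
    """
    topology_key = term["topologyKey"]
    expressions = term["labelSelector"].get("matchExpressions", [])
    # Topology domains (here: hostnames) that already host a matching pod.
    domains = {
        nodes[pod["node"]].get(topology_key)
        for pod in pods
        if matches_expressions(pod["labels"], expressions)
    }
    domains.discard(None)
    return sorted(n for n, labels in nodes.items() if labels.get(topology_key) in domains)


if __name__ == "__main__":
    nodes = {n: {"kubernetes.io/hostname": n} for n in ("kind-worker", "kind-worker2", "kind-worker3")}
    # Assumed placement: the single Vineyard replica runs on kind-worker2.
    pods = [{"node": "kind-worker2", "labels": {"app.kubernetes.io/instance": "vineyard-system-vineyardd-sample"}}]
    term = {
        "labelSelector": {"matchExpressions": [{
            "key": "app.kubernetes.io/instance",
            "operator": "In",
            "values": ["vineyard-system-vineyardd-sample"],
        }]},
        "topologyKey": "kubernetes.io/hostname",
    }
    print(pod_affinity_eligible_nodes(nodes, pods, term))  # ['kind-worker2']
```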

> I think that's a great feature, but I am not sure what did u mean "not cached by Node, but cached by Pod"?

In the example above, the dataset depends on a specific Pod (the Vineyard Pod) rather than on a specific node.

However, I ran into a problem in practice: with the following nodeAffinity, the FUSE pod is not scheduled onto the specified node (kind-worker3).

apiVersion: data.fluid.io/v1alpha1
kind: Dataset
metadata:
  name: vineyard
spec:
  mounts:
  # This directory should be the same as the vineyard socket directory in the vineyard deployment
  - mountPoint: local:///var/run/vineyard-kubernetes/vineyard-system/vineyardd-sample
    options:
      vineyard-socket-directory: /var/run/vineyard-kubernetes/vineyard-system/vineyardd-sample
    name: vineyard
  accessModes:
  - ReadWriteMany
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
            - kind-worker3
---
apiVersion: data.fluid.io/v1alpha1
kind: ThinRuntime
metadata:
  name: vineyard
spec:
  profileName: vineyard-profile

The test Pod that mounts the vineyard PVC:

apiVersion: v1
kind: Pod
metadata:
  name: vineyard-test
  labels:
    fuse.serverful.fluid.io/inject: "true"
    fluid.io/dataset.vineyard.sched: required
spec:
  containers:
    - name: nginx
      image: nginx
      volumeMounts:
        - mountPath: /data
          name: vineyard-data
  volumes:
    - name: vineyard-data
      persistentVolumeClaim:
        claimName: vineyard
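For comparison, the nodeAffinity in the second Dataset selects nodes by node labels instead of pod labels. A simplified sketch of how required nodeSelectorTerms are evaluated (terms are ORed, expressions within a term are ANDed, an empty term matches nothing; matchFields and the Gt/Lt operators are left out) confirms that this rule on its own admits only kind-worker3. It only checks the rule itself, not how Fluid applies the Dataset's nodeAffinity to the FUSE pod; the function names are hypothetical.

```python
from typing import Dict, List


def _expression_matches(labels: Dict[str, str], expr: dict) -> bool:
    key, op = expr["key"], expr["operator"]
    values = expr.get("values", [])
    if op == "In":
        return labels.get(key) in values
    if op == "NotIn":
        return labels.get(key) not in values
    if op == "Exists":
        return key in labels
    if op == "DoesNotExist":
        return key not in labels
    raise ValueError("operator not handled in this sketch: %s" % op)


def node_matches_required(node_labels: Dict[str, str], node_affinity: dict) -> bool:
    """True if the node satisfies at least one required nodeSelectorTerm."""
    required = node_affinity.get("required")
    if required is None:
        return True  # no hard constraint at all
    for term in required.get("nodeSelectorTerms", []):
        exprs = term.get("matchExpressions", [])
        # An empty term matches no nodes; expressions inside a term are ANDed.
        if exprs and all(_expression_matches(node_labels, e) for e in exprs):
            return True
    return False


def node_affinity_eligible_nodes(nodes: Dict[str, Dict[str, str]], node_affinity: dict) -> List[str]:
    return sorted(n for n, labels in nodes.items() if node_matches_required(labels, node_affinity))


if __name__ == "__main__":
    nodes = {n: {"kubernetes.io/hostname": n} for n in ("kind-worker", "kind-worker2", "kind-worker3")}
    # Same shape as the Dataset's spec.nodeAffinity above.
    dataset_node_affinity = {"required": {"nodeSelectorTerms": [{"matchExpressions": [
        {"key": "kubernetes.io/hostname", "operator": "In", "values": ["kind-worker3"]},
    ]}]}}
    print(node_affinity_eligible_nodes(nodes, dataset_node_affinity))  # ['kind-worker3']
```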

Is there something I missed? Looking forward to your reply. Thanks.

dashanji · 2024-02-05T07:15:08Z

Closed via #3528.

dashanji added the features label Oct 19, 2023

dashanji closed this as completed Feb 5, 2024
