graphd not spread evenly across zones #425

Closed
jinyingsunny opened this issue Jan 25, 2024 · 2 comments

Labels: affects/master (this bug affects the master version) · process/done · severity/major · type/bug (something is unexpected)

Comments


jinyingsunny commented Jan 25, 2024

As the title says: graphd pods are not spread evenly across zones.
Cluster config before scaling out: [3 metad, 3 storaged, 9 graphd]; graphd was then scaled out from 9 to 28 replicas.
[screenshot attached]
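For reference, the scale-out was presumably applied by bumping spec.graphd.replicas on the NebulaCluster object; a minimal sketch (the object name nebula2 and the nebula namespace are taken from the operator log below):

# kubectl -n nebula patch nebulacluster nebula2 --type merge \
    -p '{"spec": {"graphd": {"replicas": 28}}}'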

Scale-out log from the operator:

I0125 02:58:59.593453       1 workload.go:122] workload StatefulSet nebula/nebula2-graphd updated successfully
E0125 02:58:59.597458       1 nebula_cluster_control.go:171] reconcile console failed: waiting for graphd cluster [nebula/nebula2-graphd] ready
I0125 02:58:59.614309       1 nebulacluster.go:129] NebulaCluster [nebula/nebula2] status updated successfully
I0125 02:58:59.614331       1 nebula_cluster_controller.go:184] NebulaCluster [nebula/nebula2] reconcile details: waiting for graphd cluster [nebula/nebula2-graphd] ready
I0125 02:58:59.614336       1 nebula_cluster_controller.go:184] NebulaCluster [nebula/nebula2] reconcile details: waiting for nebulacluster ready
I0125 02:58:59.614340       1 nebula_cluster_controller.go:157] Finished reconciling NebulaCluster [nebula/nebula2] (149.240646ms), result: {false 10s}
I0125 02:58:59.614438       1 nebula_cluster_controller.go:174] Start to reconcile NebulaCluster
I0125 02:58:59.806120       1 graphd_cluster.go:310] graphd pod [nebula/nebula2-graphd-9] scheduled on node sunny in zone us-east-2b
I0125 02:58:59.809390       1 graphd_cluster.go:310] graphd pod [nebula/nebula2-graphd-10] scheduled on node liuxue in zone us-east-2a
I0125 02:58:59.812622       1 graphd_cluster.go:310] graphd pod [nebula/nebula2-graphd-11] scheduled on node k8s-node2 in zone us-east-2c
I0125 02:58:59.815446       1 graphd_cluster.go:310] graphd pod [nebula/nebula2-graphd-12] scheduled on node k8s-node1 in zone us-east-2b
I0125 02:58:59.818606       1 graphd_cluster.go:310] graphd pod [nebula/nebula2-graphd-13] scheduled on node liuxue in zone us-east-2a
I0125 02:58:59.821397       1 graphd_cluster.go:310] graphd pod [nebula/nebula2-graphd-14] scheduled on node k8s-node2 in zone us-east-2c
I0125 02:58:59.824473       1 graphd_cluster.go:310] graphd pod [nebula/nebula2-graphd-15] scheduled on node sunny in zone us-east-2b
I0125 02:58:59.827788       1 graphd_cluster.go:310] graphd pod [nebula/nebula2-graphd-16] scheduled on node liuxue in zone us-east-2a
I0125 02:58:59.831215       1 graphd_cluster.go:310] graphd pod [nebula/nebula2-graphd-17] scheduled on node k8s-node2 in zone us-east-2c
I0125 02:58:59.834867       1 graphd_cluster.go:310] graphd pod [nebula/nebula2-graphd-18] scheduled on node k8s-node1 in zone us-east-2b
I0125 02:58:59.837658       1 graphd_cluster.go:310] graphd pod [nebula/nebula2-graphd-19] scheduled on node k8s-master in zone us-east-2a
I0125 02:58:59.840796       1 graphd_cluster.go:310] graphd pod [nebula/nebula2-graphd-20] scheduled on node sunny in zone us-east-2b
I0125 02:58:59.843901       1 graphd_cluster.go:310] graphd pod [nebula/nebula2-graphd-21] scheduled on node liuxue in zone us-east-2a
I0125 02:58:59.846766       1 graphd_cluster.go:310] graphd pod [nebula/nebula2-graphd-22] scheduled on node k8s-node2 in zone us-east-2c
I0125 02:58:59.851170       1 graphd_cluster.go:310] graphd pod [nebula/nebula2-graphd-23] scheduled on node k8s-node1 in zone us-east-2b
I0125 02:58:59.854348       1 graphd_cluster.go:310] graphd pod [nebula/nebula2-graphd-24] scheduled on node k8s-node2 in zone us-east-2c
I0125 02:58:59.866979       1 graphd_cluster.go:310] graphd pod [nebula/nebula2-graphd-25] scheduled on node k8s-master in zone us-east-2a
I0125 02:58:59.915542       1 graphd_cluster.go:310] graphd pod [nebula/nebula2-graphd-26] scheduled on node sunny in zone us-east-2b
I0125 02:58:59.979432       1 graphd_cluster.go:310] graphd pod [nebula/nebula2-graphd-27] scheduled on node k8s-master in zone us-east-2a
I0125 02:59:00.045983       1 cm.go:98] configMap [nebula/nebula2-graphd-zone] updated successfully
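(Tallying the lines above, the 19 newly scheduled pods landed 7 in us-east-2a, 7 in us-east-2b, and only 5 in us-east-2c.)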

Cluster status after scaling out:

# kubectl -n nebula get pod
NAME                                READY   STATUS    RESTARTS   AGE
nebula2-console                     1/1     Running   0          19h
nebula2-exporter-5d5d6f5455-7r842   1/1     Running   0          20h
nebula2-graphd-0                    1/1     Running   0          14h
nebula2-graphd-1                    1/1     Running   0          14h
nebula2-graphd-10                   1/1     Running   0          25m
nebula2-graphd-11                   1/1     Running   0          25m
nebula2-graphd-12                   1/1     Running   0          25m
nebula2-graphd-13                   1/1     Running   0          25m
nebula2-graphd-14                   1/1     Running   0          25m
nebula2-graphd-15                   1/1     Running   0          25m
nebula2-graphd-16                   1/1     Running   0          25m
nebula2-graphd-17                   1/1     Running   0          25m
nebula2-graphd-18                   1/1     Running   0          25m
nebula2-graphd-19                   1/1     Running   0          25m
nebula2-graphd-2                    1/1     Running   0          14h
nebula2-graphd-20                   1/1     Running   0          25m
nebula2-graphd-21                   1/1     Running   0          25m
nebula2-graphd-22                   1/1     Running   0          25m
nebula2-graphd-23                   1/1     Running   0          25m
nebula2-graphd-24                   1/1     Running   0          25m
nebula2-graphd-25                   1/1     Running   0          25m
nebula2-graphd-26                   1/1     Running   0          25m
nebula2-graphd-27                   1/1     Running   0          25m
nebula2-graphd-3                    1/1     Running   0          14h
nebula2-graphd-4                    1/1     Running   0          14h
nebula2-graphd-5                    1/1     Running   0          14h
nebula2-graphd-6                    1/1     Running   0          14h
nebula2-graphd-7                    1/1     Running   0          14h
nebula2-graphd-8                    1/1     Running   0          14h
nebula2-graphd-9                    1/1     Running   0          25m
nebula2-metad-0                     1/1     Running   0          14h
nebula2-metad-1                     1/1     Running   0          14h
nebula2-metad-2                     1/1     Running   0          14h
nebula2-storaged-0                  1/1     Running   0          14h
nebula2-storaged-1                  1/1     Running   0          14h
nebula2-storaged-2                  1/1     Running   0          14h
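To double-check how the graphd pods ended up distributed across zones, one way (a sketch using only standard kubectl output) is to map each pod to its node and then look up each node's zone label:

# kubectl -n nebula get pod -o wide | grep graphd
# kubectl get nodes -L topology.kubernetes.io/zone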

My YAML file:

apiVersion: apps.nebula-graph.io/v1alpha1
kind: NebulaCluster
metadata:
  name: nebula
  namespace: nebula
spec:
  console:
    image: vesoft/nebula-console
    version: v3.6.0
  agent:
    image: reg.vesoft-inc.com/cloud-dev/nebula-agent
    resources: {}
    version: latest
  enablePVReclaim: true
  exporter:
    httpPort: 9100
    image: vesoft/nebula-stats-exporter
    maxRequests: 20
    replicas: 1
    version: latest
  failoverPeriod: 5m0s
  graphd:
    config:
      stderrthreshold: "0"
    image: reg.vesoft-inc.com/rc/nebula-graphd-ent
    replicas: 9
    resources:
      limits:
        cpu: "1"
        memory: 1Gi
      requests:
        cpu: 200m
        memory: 500Mi
    version: v3.5-snap-ent
  imagePullPolicy: Always
  imagePullSecrets:
  - name: image-pull-secret
  metad:
    config:
      stderrthreshold: "1"
      zone_list: us-east-2a,us-east-2b,us-east-2c
      timestamp_in_logfile_name: "false"
      #validate_session_timestamp: "false"
      v: "3"
      license_manager_url: nebula-license-manager.nebula-license-manager.svc.cluster.local:9119
    dataVolumeClaim:
      resources:
        requests:
          storage: 2Gi
      storageClassName: local-path
    image: reg.vesoft-inc.com/rc/nebula-metad-ent
    replicas: 1
    resources:
      limits:
        cpu: "1"
        memory: 1Gi
      requests:
        cpu: 300m
        memory: 500Mi
    version: v3.5-snap-ent
  reference:
    name: statefulsets.apps
    version: v1
  schedulerName: default-scheduler
  sslCerts:
    caCert: root.crt
    caSecret: ca-cert
    clientCACert: ca.crt
    clientCert: tls.crt
    clientKey: tls.key
    clientSecret: client-cert
    insecureSkipVerify: true
    serverCert: tls.crt
    serverKey: tls.key
    serverSecret: server-cert
  storaged:
    config:
      stderrthreshold: "2"
    dataVolumeClaims:
    - resources:
        requests:
          storage: 2Gi
      storageClassName: local-path
    enableAutoBalance: true
    image: reg.vesoft-inc.com/vesoft-ent/nebula-storaged-ent
    #image: reg.vesoft-inc.com/rc/nebula-storaged-ent
    replicas: 3
    resources:
      limits:
        cpu: "1"
        memory: 1Gi
      requests:
        cpu: 300m
        memory: 500Mi
    version: v3.5-snap-ent
  topologySpreadConstraints:
  - topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: ScheduleAnyway
  imagePullSecrets:
  - name: image-nebula-ent-sc-secret
  nodeSelector:
    nebula: cloud

The 5 nodes are labeled across the 3 different zones, which you can see by searching for topology.kubernetes.io/zone=us-east-2 in the output below:

# kubectl get nodes --show-labels
NAME         STATUS   ROLES           AGE    VERSION   LABELS
k8s-master   Ready    control-plane   193d   v1.27.3   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-master,kubernetes.io/os=linux,kubernetes.io/zone=us-east-2a,nebula=cloud,node-role.kubernetes.io/control-plane=,node.kubernetes.io/exclude-from-external-load-balancers=,topology.kubernetes.io/zone=us-east-2a
k8s-node1    Ready    <none>          191d   v1.27.3   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-node1,kubernetes.io/os=linux,kubernetes.io/zone=us-east-2b,nebula=cloud,topology.kubernetes.io/zone=us-east-2b
k8s-node2    Ready    <none>          193d   v1.27.3   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-node2,kubernetes.io/os=linux,kubernetes.io/zone=us-east-2c,nebula=cloud,topology.kubernetes.io/zone=us-east-2c
liuxue       Ready    <none>          69d    v1.27.3   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=liuxue,kubernetes.io/os=linux,kubernetes.io/zone=us-east-2a,nebula=cloud,topology.kubernetes.io/zone=us-east-2a
sunny        Ready    <none>          68d    v1.27.3   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=sunny,kubernetes.io/os=linux,kubernetes.io/zone=us-east-2b,nebula=cloud,topology.kubernetes.io/zone=us-east-2b
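For example, all nodes in a single zone can be filtered by that label (repeat per zone):

# kubectl get nodes -l topology.kubernetes.io/zone=us-east-2a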

Your Environments (required)

operator: snap-1.30
kubectl version: v1.27.3

Expected behavior

After scaling out, graphd pods should be spread evenly across the different zones (e.g. 28 replicas over 3 zones should land roughly 10/9/9).

jinyingsunny added the severity/major, type/bug, and affects/master labels on Jan 25, 2024
jinyingsunny (Author) commented:

Retried 3 times; the uneven distribution occurred all 3 times. graphd was scaled out 9->28, 9->29, and 9->21. From graphd-18 to graphd-20, the pods were always placed in us-east-2b, us-east-2a, us-east-2b:

I0125 05:37:31.977733       1 graphd_cluster.go:310] graphd pod [nebula/nebula2-graphd-18] scheduled on node k8s-node1 in zone us-east-2b
I0125 05:37:31.980248       1 graphd_cluster.go:310] graphd pod [nebula/nebula2-graphd-19] scheduled on node k8s-master in zone us-east-2a
I0125 05:37:32.041398       1 graphd_cluster.go:310] graphd pod [nebula/nebula2-graphd-20] scheduled on node sunny in zone us-east-2b


jinyingsunny commented Feb 19, 2024

Not reproduced on operator snap-1.35.
The problem found during the earlier verification was caused by an incorrect NebulaCluster configuration: when the topology constraint cannot be satisfied, whenUnsatisfiable should be DoNotSchedule rather than the ScheduleAnyway setting below, which schedules the pods anyway. In addition, the developers reported a bug in the operator code as well, which has since been fixed.

  topologySpreadConstraints:
  - topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: ScheduleAnyway
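Based on that, the corrected constraint in the NebulaCluster spec keeps the same topology key with whenUnsatisfiable switched to DoNotSchedule:

  topologySpreadConstraints:
  - topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule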

Separately, when resources are insufficient after scaling out, the scale-out completes once more resources are added; this scenario can also be covered by scaling down other services to free up resources.

github-actions bot added the process/fixed label on Feb 19, 2024
jinyingsunny added the process/done label and removed the process/fixed label on Feb 19, 2024