
feat: update helm chart for k8 RP [DET-3542] #882

Merged: 4 commits, Jul 20, 2020

Conversation

aaron276h
Contributor

Description

Added several missing configurations, made the k8 RP configurable, and added CPU and memory requests for the master.

Test Plan

Tested manually by deploying on a k8 cluster. Automated testing will be added as part of M4.

Contributor

@sidneyw sidneyw left a comment


Just a few questions

{{- end }}
{{ end }}

{{- if .Values.telemetry }}
Contributor


non-blocking: is there no way to do an `and` comparison in a single `if` statement? It's not a big deal, but it might look cleaner.

Contributor Author


Unfortunately, Helm doesn't seem to have an `and` flow-control operator.
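For illustration, the two shapes being discussed, side by side. Helm templates are Go text/template, where `and` exists as a function rather than an infix operator; the value names below (`telemetry`, `telemetry.enabled`) are illustrative, not necessarily this chart's schema. Note that at the time, template `and` evaluated all of its arguments, so the nested form also guards against a nil outer value.

```yaml
# Nested form, as in the diff:
{{- if .Values.telemetry }}
{{- if .Values.telemetry.enabled }}
telemetry:
  enabled: true
{{- end }}
{{- end }}

# Single-condition form using the `and` template function:
{{- if and .Values.telemetry .Values.telemetry.enabled }}
telemetry:
  enabled: true
{{- end }}
```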

# scheduling multi-GPU tasks. Multi-GPU (distributed training) tasks will be scheduled as
# slotsPerTask / slotsPerNode separate pods (task sizes that are not divisible by slotsPerNode
# are never scheduled).
slotsPerNode:
Contributor


question: what happens if the cluster has different instance types? Is there another way we are making sure each node has the same number of GPUs?

Contributor Author


Right now we don't do anything to account for it. One thing users could do is set it to the smallest denominator; then we will just have the overhead of running multiple pods per node. I am planning to document this as part of M5, and will also expand the doc-string here with this info.
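To make the scheduling arithmetic concrete, a hedged values.yaml sketch (the numbers are illustrative, not recommendations): with a cluster mixing 4-GPU and 8-GPU nodes, setting `slotsPerNode` to the smallest per-node GPU count keeps every pod schedulable on every node type, at the cost of extra pods.

```yaml
# Illustrative: cluster mixes 4-GPU and 8-GPU nodes; pick the smallest.
slotsPerNode: 4

# A 16-slot distributed training task then runs as
#   slotsPerTask / slotsPerNode = 16 / 4 = 4 pods,
# and an 8-GPU node simply hosts two of those pods.
# A 6-slot task (6 is not divisible by 4) would never be scheduled.
```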

masterCpuRequest: "4"
masterMemRequest: "8Gi"
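A sketch of how such values would typically surface in the master Deployment template. The template structure here is an assumption for illustration, not necessarily the chart's actual layout:

```yaml
# Hypothetical excerpt of the master Deployment template consuming these values.
resources:
  requests:
    cpu: {{ .Values.masterCpuRequest }}      # e.g. "4" cores
    memory: {{ .Values.masterMemRequest }}   # e.g. "8Gi"
```

Kubernetes requests reserve capacity for scheduling; without a matching `limits` block, the master can still burst above these values.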

## Configure the task container defaults. Tasks include trials, commands, tensorboards and more.
Contributor


non-blocking: is "more" just notebooks?

Contributor Author


There are also shells; I'll just write them all out.

## random non-privileged ports, respectively.
taskContainerDefaults:
shmSizeBytes: 4294967296
# networkMode: bridge
Contributor


question: is bridge the only valid value for networkMode?

Contributor Author


There is also "host" networking mode, but obviously that isn't advisable in k8s.
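For illustration, the two Docker network modes mentioned, in values.yaml form. The defaults shown come from the snippet above; the `host` line is illustrative and, as noted, not advisable on k8s:

```yaml
taskContainerDefaults:
  shmSizeBytes: 4294967296   # 4 GiB shared memory for task containers
  networkMode: bridge        # default: isolated bridge network per container
  # networkMode: host        # shares the node's network namespace; avoid on k8s
```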

@sidneyw sidneyw assigned aaron276h and unassigned sidneyw Jul 20, 2020
@aaron276h aaron276h assigned sidneyw and unassigned aaron276h Jul 20, 2020
@sidneyw sidneyw assigned aaron276h and unassigned sidneyw Jul 20, 2020
@aaron276h aaron276h merged commit 56df7d2 into determined-ai:master Jul 20, 2020
eecsliu pushed a commit to eecsliu/determined that referenced this pull request Jun 23, 2023
@dannysauer dannysauer added this to the 0.12.12 milestone Feb 6, 2024
eecsliu pushed a commit to determined-ai/determined-release-testing that referenced this pull request Apr 22, 2024