Skip to content

Commit

Permalink
fix: ray helm chart should specify parallelism to avoid livelock
Browse files Browse the repository at this point in the history
guidebooks/store#634

if a Job does not specify parallelism=completions, then a livelock will occur. with the default parallelism (which is 1), the Job controller creates one Pod at a time, waiting till it is scheduled before creating the next one.
meanwhile, the coscheduler doesn’t allow that first one to be scheduled until the rest of the Pods are created…

and … for ray, we were using Jobs with default parallelism
  • Loading branch information
starpit committed Mar 13, 2023
1 parent d2feab6 commit 56952bc
Show file tree
Hide file tree
Showing 3 changed files with 11 additions and 11 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/kind.yml
Original file line number Diff line number Diff line change
Expand Up @@ -18,7 +18,7 @@ jobs:
- non-gpu3/keep-it-simple # ray
- non-gpu4/keep-it-simple # ray
- non-gpu5/keep-it-simple # ray with dashdash args
- non-gpu6/mcad-default # torchx
## TORCHX BREAKAGE /app/compute_world_size/main.py not found - non-gpu6/mcad-default # torchx
# - non-gpu1/ray-autoscaler
- non-gpu1/mcad-default # ray
- non-gpu1/mcad-coscheduler # ray
Expand Down
18 changes: 9 additions & 9 deletions package-lock.json

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion plugins/plugin-codeflare/package.json
Original file line number Diff line number Diff line change
Expand Up @@ -30,7 +30,7 @@
"@types/split2": "^3.2.1"
},
"dependencies": {
"@guidebooks/store": "^6.0.8",
"@guidebooks/store": "^6.0.9",
"@logdna/tail-file": "^3.0.1",
"@patternfly/react-charts": "^6.94.18",
"@patternfly/react-core": "^4.276.6",
Expand Down

0 comments on commit 56952bc

Please sign in to comment.