-
Notifications
You must be signed in to change notification settings - Fork 2.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add process subcommands for multiple nodes #17887
Conversation
Automated checks report:
Some checks failed. Please fix the reported issues and reply 'alluxio-bot, check this please' to re-run checks. |
46e0987
to
4daa26d
Compare
Automated checks report:
All checks passed! |
subProcess := subProcess | ||
err := subProcess.Start(cmd) | ||
if err != nil { | ||
log.Logger.Errorf("Error: %s", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in the current behavior, if a group of processes fails to start (ex. one or more masters do not successfully start), does it continue to try to start the remaining processes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this behavior, if one behavior fail, it will continue to start the remaining processes, but give error messages. This is the same to the alluxio-start.sh
version. If it needs immediate termination, I can change Errorf
to Fatalf
, which leads to immediate termination after single process failure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the hard part to get it to work functionally is done, but there is still clean up work to make sure the code to scalable and extensible for others to understand and improve upon.
if err != nil { | ||
log.Logger.Errorf("Error reading worker hostnames at %s", mastersDir) | ||
return nil, err | ||
var cliPath = filepath.Join(env.Env.EnvVar.GetString(env.ConfAlluxioHome.EnvVar), "bin", "alluxio") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cliPath
is dependent on env.ConfAlluxioHome.EnvVar
to be properly set, but the environment variables are set in the beginning of the command via
rootCmd.PersistentPreRunE = func(cmd *cobra.Command, args []string) error {
if flagDebugLog {
log.Logger.SetLevel(logrus.DebugLevel)
}
if err := env.InitAlluxioEnv(filepath.Clean(flagRootPath), flagIsDeployed); err != nil {
return stacktrace.Propagate(err, "error defining alluxio environment")
}
return nil
}
in launch.go
so this will currently return an empty string, but it will happen to work if you are running the cli from the home directory. so you should store bin/alluxio
as a constant string, but each time you want to get the cli path, you'll have to define filepath.Join(env.Env.EnvVar.GetString(env.ConfAlluxioHome.EnvVar), binAlluxioPath)
// get list of masters and workers | ||
var nodes []string | ||
if onMasters { | ||
if len(masterList) == 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
another way is to move the hostname handling outside the runCommand
:
func (p *FooProcess) Start(cmd *env.StartProcessCommand) error {
command := addStartFlags("process start foo", cmd)
// For master-only situations
nodes := getNodes(filepath.Join(env.Env.EnvVar.GetString(env.ConfAlluxioConfDir.EnvVar), "masters"))
// For masters & workers
nodes = append(nodes, getNodes(filepath.Join(env.Env.EnvVar.GetString(env.ConfAlluxioConfDir.EnvVar), "workers"))...)
runCommand(command, nodes)
}
in this case, it is no need to place the file parsing in a multi-threading function since it's not very time-consuming. so we can parse the file, then call the function with the whole node list.
avoid using the bool as a switch of two different functions and use the actual different parameters directly.
|
||
if len(errors) != 0 { | ||
log.Logger.Warningf("Run command %s failed, number of failures: %v", command, len(errors)) | ||
return stacktrace.Propagate(errors[0], "First error: ") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'm not sure if we need to log every error here or dump it to some file for debugging. but i'm agree with the current structure: handling the error list, and return a digest error for the caller.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this looks a lot better and easier to understand when compared to the initial review. a few more minor comments but otherwise it looks good
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for all the new strings.Join()
, join with space " "
instead of empty string ""
?
alluxio-bot, merge this please |
merge failed: |
alluxio-bot, merge this please |
Add `process` commands with multiple nodes to golang CLI as part of #17522 `bin/alluxio-start.sh masters` -> `bin/cli.sh process start masters` `bin/alluxio-start.sh job_masters` -> `bin/cli.sh process start job_masters` `bin/alluxio-start.sh workers` -> `bin/cli.sh process start workers` `bin/alluxio-start.sh job_workers` -> `bin/cli.sh process start job_workers` `bin/alluxio-start.sh proxies` -> `bin/cli.sh process start proxies` `bin/alluxio-start.sh all` -> `bin/cli.sh process start all` `bin/alluxio-start.sh local` -> `bin/cli.sh process start local` `bin/alluxio-stop.sh masters` -> `bin/cli.sh process stop masters` `bin/alluxio-stop.sh job_masters` -> `bin/cli.sh process stop job_masters` `bin/alluxio-stop.sh workers` -> `bin/cli.sh process stop workers` `bin/alluxio-stop.sh job_workers` -> `bin/cli.sh process stop job_workers` `bin/alluxio-stop.sh proxies` -> `bin/cli.sh process stop proxies` `bin/alluxio-stop.sh all` -> `bin/cli.sh process stop all` `bin/alluxio-stop.sh local` -> `bin/cli.sh process stop local` For command `process start/stop` on multiple nodes, using crypto's `ssh` package to create an SSH session, connect to masters, workers or all nodes, then send according subcommand on single nodes to these nodes. pr-link: #17887 change-id: cid-0a660de35945738c065dae2f008a61d48f6b3be9
Add `process` commands with multiple nodes to golang CLI as part of Alluxio#17522 `bin/alluxio-start.sh masters` -> `bin/cli.sh process start masters` `bin/alluxio-start.sh job_masters` -> `bin/cli.sh process start job_masters` `bin/alluxio-start.sh workers` -> `bin/cli.sh process start workers` `bin/alluxio-start.sh job_workers` -> `bin/cli.sh process start job_workers` `bin/alluxio-start.sh proxies` -> `bin/cli.sh process start proxies` `bin/alluxio-start.sh all` -> `bin/cli.sh process start all` `bin/alluxio-start.sh local` -> `bin/cli.sh process start local` `bin/alluxio-stop.sh masters` -> `bin/cli.sh process stop masters` `bin/alluxio-stop.sh job_masters` -> `bin/cli.sh process stop job_masters` `bin/alluxio-stop.sh workers` -> `bin/cli.sh process stop workers` `bin/alluxio-stop.sh job_workers` -> `bin/cli.sh process stop job_workers` `bin/alluxio-stop.sh proxies` -> `bin/cli.sh process stop proxies` `bin/alluxio-stop.sh all` -> `bin/cli.sh process stop all` `bin/alluxio-stop.sh local` -> `bin/cli.sh process stop local` For command `process start/stop` on multiple nodes, using crypto's `ssh` package to create an SSH session, connect to masters, workers or all nodes, then send according subcommand on single nodes to these nodes. pr-link: Alluxio#17887 change-id: cid-0a660de35945738c065dae2f008a61d48f6b3be9
- Add cli/ folder containing golang code - Add script to compile code to executable in build/cli/build-cli.sh - Add entrypoint script for development in bin/cli.sh - Add build profile to build the CLI as part of tha maven build via `-PgoCli` - Add `conf` service, consisting of - `bin/cli.sh conf get` as an example command, equivalent to `bin/alluxio getConf` - `bin/cli.sh conf log --name fully.qualified.class.path`, equivalent to `bin/alluxio fsadmin logLevel --logName fully.qualified.class.path` See #17522 for more background info pr-link: #17532 dev docs for defining conventions for the new alluxio cli in golang pr-link: #17530 - Add commands to start/stop individual processes, ex. `bin/cli.sh process start master` - Define process interface and registry - process specific java opts are defined with the process and are dynamically added to env - Add journal format command as part of starting master process - Mount options for worker are not ported because they will be deprecated in the near future pr-link: #17561 Adds `info` subcommand, which contains: - `cache`: worker capacity information, calls the existing `bin/alluxio fsadmin report capacity` command - `collect`: collects cluster information into a single tarball, calls the existing `bin/alluxio collectInfo` command - `master`: master quorum information, calls the existing `bin/alluxio fs masterInfo` command - `report`: cluster information, calls the existing `bin/alluxio fsadmin report` command, excluding the `capacity` category pr-link: #17566 Add journal commands to CLI as part of #17522 pr-link: #17569 Add quorum/HA related commands as part of #17522 pr-link: #17570 Add fs commands to golang CLI as part of #17522 unlike other CLI commands, utilize arguments to maintain the existing structure of filesystem commands (ex. cp, du, ls, mv, rm, etc) pr-link: #17580 Add generate commands to golang CLI as part of #17522 bin/alluxio docGen -> bin/cli generate docs bin/alluxio bootstrapConf -> bin/cli generate template to read ALLUXIO_CONF_DIR, also conducted a minor change on env variables. pr-link: #17790 Add exec commands to golang CLI as part of #17522 `bin/alluxio runClass` -> `bin/cli exec class` `bin/alluxio runTests` -> `bin/cli exec testRun` `bin/alluxio runUfsTests` -> `bin/cli exec testUfs` `bin/alluxio runUfsIOTest` -> `bin/cli exec testUfsIO` `bin/alluxio runHdfsMountTests` -> `bin/cli exec testHdfsMount` `bin/alluxio runHmsTests` -> `bin/cli exec testHms` `bin/alluxio runJournalCrashTest` -> `bin/cli exec testJournalCrash` For command `exec test`, using different `--name` flags can lead to huge difference in other flags and options. To manage flags and options, currently using different commands for different test types. pr-link: #17797 Prepare golang cli for tarball build and replace bin/alluxio allow java 8 or 11 add cli binaries to other tarball profiles Add `process` commands with multiple nodes to golang CLI as part of #17522 `bin/alluxio-start.sh masters` -> `bin/cli.sh process start masters` `bin/alluxio-start.sh job_masters` -> `bin/cli.sh process start job_masters` `bin/alluxio-start.sh workers` -> `bin/cli.sh process start workers` `bin/alluxio-start.sh job_workers` -> `bin/cli.sh process start job_workers` `bin/alluxio-start.sh proxies` -> `bin/cli.sh process start proxies` `bin/alluxio-start.sh all` -> `bin/cli.sh process start all` `bin/alluxio-start.sh local` -> `bin/cli.sh process start local` `bin/alluxio-stop.sh masters` -> `bin/cli.sh process stop masters` `bin/alluxio-stop.sh job_masters` -> `bin/cli.sh process stop job_masters` `bin/alluxio-stop.sh workers` -> `bin/cli.sh process stop workers` `bin/alluxio-stop.sh job_workers` -> `bin/cli.sh process stop job_workers` `bin/alluxio-stop.sh proxies` -> `bin/cli.sh process stop proxies` `bin/alluxio-stop.sh all` -> `bin/cli.sh process stop all` `bin/alluxio-stop.sh local` -> `bin/cli.sh process stop local` For command `process start/stop` on multiple nodes, using crypto's `ssh` package to create an SSH session, connect to masters, workers or all nodes, then send according subcommand on single nodes to these nodes. pr-link: #17887 Add job commands to golang CLI as part of #17522 `bin/alluxio-bash job cancel id` -> `bin/alluxio job cancel --id` `bin/alluxio-bash job leader` -> `bin/alluxio job leader` `bin/alluxio-bash job ls` -> `bin/alluxio job list` `bin/alluxio-bash job getCmdStatus jobControlId` -> `bin/alluxio job cmdStatus --id` `bin/alluxio-bash job stat [-v] id` -> `bin/alluxio job jobStatus [-v] --id` `bin/alluxio-bash fs distributedCp src dst` -> `bin/alluxio job submit --type cp --src --dst` `bin/alluxio-bash fs distributedMv src dst` -> `bin/alluxio job submit --type mv --src --dst` `bin/alluxio-bash fs load path --submit` -> `bin/alluxio job load --path` pr-link: #17931 Clean up multiprocess code - refactor commands launching the same process on multiple hosts into a single struct - simplify the logic to ssh into different hosts with more straightforward handling of error channel - move names.go under its own package to avoid import cycles pr-link: #17947 Add an `info version` subcommand to golang CLI as part of #17522 `bin/alluxio-bash version` -> `bin/alluxio info version` pr-link: #17943 update calls to old bash scripts to refer to alluxio-bash Add `cache` commands to golang CLI as part of #17522 `bin/alluxio-bash formatWorker` -> `bin/cli cache format` `bin/alluxio-bash fs freeWorker` -> `bin/cli cache free --worker` `bin/alluxio-bash fs free` -> `bin/cli cache free [-f] --path` `bin/alluxio-bash fs persist` -> `bin/cli cache persist --path` `bin/alluxio-bash fs setTtl` -> `bin/cli cache ttl --set --duration --action` `bin/alluxio-bash fs unsetTtl` -> `bin/cli cache ttl --unset` pr-link: #17937 backcompat support for start/stop scripts skip mount args in start script backcompat remove unsupported cmds
Add
process
commands with multiple nodes to golang CLI as part of #17522bin/alluxio-start.sh masters
->bin/cli.sh process start masters
bin/alluxio-start.sh job_masters
->bin/cli.sh process start job_masters
bin/alluxio-start.sh workers
->bin/cli.sh process start workers
bin/alluxio-start.sh job_workers
->bin/cli.sh process start job_workers
bin/alluxio-start.sh proxies
->bin/cli.sh process start proxies
bin/alluxio-start.sh all
->bin/cli.sh process start all
bin/alluxio-start.sh local
->bin/cli.sh process start local
bin/alluxio-stop.sh masters
->bin/cli.sh process stop masters
bin/alluxio-stop.sh job_masters
->bin/cli.sh process stop job_masters
bin/alluxio-stop.sh workers
->bin/cli.sh process stop workers
bin/alluxio-stop.sh job_workers
->bin/cli.sh process stop job_workers
bin/alluxio-stop.sh proxies
->bin/cli.sh process stop proxies
bin/alluxio-stop.sh all
->bin/cli.sh process stop all
bin/alluxio-stop.sh local
->bin/cli.sh process stop local
For command
process start/stop
on multiple nodes, using crypto'sssh
package to create an SSH session, connect to masters, workers or all nodes, then send according subcommand on single nodes to these nodes.