Add process subcommands for multiple nodes #17887

Merged
merged 31 commits into from
Aug 8, 2023

Conversation

twalluxio
Contributor

Add process commands with multiple nodes to golang CLI as part of #17522

bin/alluxio-start.sh masters -> bin/cli.sh process start masters
bin/alluxio-start.sh job_masters -> bin/cli.sh process start job_masters
bin/alluxio-start.sh workers -> bin/cli.sh process start workers
bin/alluxio-start.sh job_workers -> bin/cli.sh process start job_workers
bin/alluxio-start.sh proxies -> bin/cli.sh process start proxies
bin/alluxio-start.sh all -> bin/cli.sh process start all
bin/alluxio-start.sh local -> bin/cli.sh process start local

bin/alluxio-stop.sh masters -> bin/cli.sh process stop masters
bin/alluxio-stop.sh job_masters -> bin/cli.sh process stop job_masters
bin/alluxio-stop.sh workers -> bin/cli.sh process stop workers
bin/alluxio-stop.sh job_workers -> bin/cli.sh process stop job_workers
bin/alluxio-stop.sh proxies -> bin/cli.sh process stop proxies
bin/alluxio-stop.sh all -> bin/cli.sh process stop all
bin/alluxio-stop.sh local -> bin/cli.sh process stop local

For process start/stop on multiple nodes, the command uses the ssh package from Go's crypto module to open an SSH session to the masters, the workers, or all nodes, and then runs the corresponding single-node subcommand on each of them.
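
A minimal sketch of this fan-out pattern, not the PR's actual code. The SSH call is replaced by an injected function so the example runs without a network; the names `runner`, `runOnNodes`, and `errRefused` are hypothetical and exist only for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errRefused is a made-up failure used by the fake runner below.
var errRefused = errors.New("connection refused")

// runner is a hypothetical stand-in for "open an SSH session to host and run command".
// It is injected so that the sketch needs no SSH library and no remote hosts.
type runner func(host, command string) error

// runOnNodes runs command on every host concurrently and returns all failures.
func runOnNodes(hosts []string, command string, run runner) []error {
	var wg sync.WaitGroup
	// Buffered to len(hosts) so that no goroutine ever blocks on send.
	errCh := make(chan error, len(hosts))
	for _, host := range hosts {
		wg.Add(1)
		go func(h string) {
			defer wg.Done()
			if err := run(h, command); err != nil {
				errCh <- fmt.Errorf("host %s: %w", h, err)
			}
		}(host)
	}
	wg.Wait()
	close(errCh)

	var errs []error
	for err := range errCh {
		errs = append(errs, err)
	}
	return errs
}

func main() {
	// Fake runner: one host is unreachable, the others succeed.
	fake := func(host, command string) error {
		if host == "master-2" {
			return errRefused
		}
		fmt.Printf("[%s] ran %q\n", host, command)
		return nil
	}
	hosts := []string{"master-1", "master-2", "master-3"}
	errs := runOnNodes(hosts, "bin/cli.sh process start master", fake)
	fmt.Printf("%d of %d hosts failed\n", len(errs), len(hosts))
}
```

Injecting the runner keeps the concurrency and error collection testable on their own; a real implementation would put the SSH dial and session handling behind that function.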

@alluxio-bot
Contributor

Automated checks report:

  • PR title follows the conventions: FAIL
    • The title of the PR does not pass all the checks. Please fix the following issues:
      • Supported title prefixes are: [WIP], [SMALLFIX], [DOCFIX]
  • Commits associated with Github account: PASS

Some checks failed. Please fix the reported issues and reply 'alluxio-bot, check this please' to re-run checks.

@twalluxio twalluxio changed the title [In progress] Add process subcommands for multiple nodes Add process subcommands for multiple nodes Aug 4, 2023
@alluxio-bot
Contributor

Automated checks report:

  • PR title follows the conventions: PASS
  • Commits associated with Github account: PASS

All checks passed!

@twalluxio twalluxio marked this pull request as ready for review August 4, 2023 07:34
cli/src/alluxio.org/cli/processes/all.go
subProcess := subProcess
err := subProcess.Start(cmd)
if err != nil {
	log.Logger.Errorf("Error: %s", err)
Contributor

in the current behavior, if one process in a group fails to start (e.g. one or more masters do not start successfully), does it continue trying to start the remaining processes?

Contributor Author

With the current behavior, if one process fails to start, the command logs an error message and continues starting the remaining processes. This matches the alluxio-start.sh version. If immediate termination is needed, I can change Errorf to Fatalf, which terminates as soon as a single process fails.
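
The two behaviors discussed here can be sketched as follows. This is an illustration only: `starter`, `startAll`, and `errStart` are hypothetical names, and the fail-fast branch returns early instead of exiting the whole program the way a Fatalf-style logger call typically would.

```go
package main

import (
	"errors"
	"fmt"
)

// errStart is a made-up failure for the demo.
var errStart = errors.New("master failed to start")

// starter is a hypothetical stand-in for a process's Start method.
type starter func() error

// startAll tries each starter in order.
// With failFast=false it reports each failure and keeps going (the Errorf-style behavior).
// With failFast=true it stops at the first failure.
// It returns how many starters were attempted and the failures seen.
func startAll(starters []starter, failFast bool) (attempted int, errs []error) {
	for _, start := range starters {
		attempted++
		if err := start(); err != nil {
			fmt.Printf("Error: %s\n", err)
			errs = append(errs, err)
			if failFast {
				return attempted, errs
			}
		}
	}
	return attempted, errs
}

func main() {
	ok := func() error { return nil }
	bad := func() error { return errStart }

	n, errs := startAll([]starter{ok, bad, ok}, false)
	fmt.Printf("continue:  attempted=%d failures=%d\n", n, len(errs))

	n, errs = startAll([]starter{ok, bad, ok}, true)
	fmt.Printf("fail fast: attempted=%d failures=%d\n", n, len(errs))
}
```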

Contributor

@Xenorith Xenorith left a comment

the hard part of getting it to work functionally is done, but there is still cleanup work to make sure the code is scalable and extensible, so that others can understand and improve upon it.

cli/src/alluxio.org/cli/processes/all.go
if err != nil {
log.Logger.Errorf("Error reading worker hostnames at %s", mastersDir)
return nil, err
var cliPath = filepath.Join(env.Env.EnvVar.GetString(env.ConfAlluxioHome.EnvVar), "bin", "alluxio")
Contributor

cliPath depends on env.ConfAlluxioHome.EnvVar being properly set, but the environment variables are only set at the beginning of the command, via

	rootCmd.PersistentPreRunE = func(cmd *cobra.Command, args []string) error {
		if flagDebugLog {
			log.Logger.SetLevel(logrus.DebugLevel)
		}
		if err := env.InitAlluxioEnv(filepath.Clean(flagRootPath), flagIsDeployed); err != nil {
			return stacktrace.Propagate(err, "error defining alluxio environment")
		}
		return nil
	}

in launch.go

so the lookup currently returns an empty string, and it only happens to work if you run the cli from the home directory. you should store bin/alluxio as a constant string, and each time you need the cli path, compute filepath.Join(env.Env.EnvVar.GetString(env.ConfAlluxioHome.EnvVar), binAlluxioPath)
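
A small sketch of the difference, under the assumption that the path is held in a package-level variable. Here `os.Getenv` stands in for the CLI's env lookup, and `SKETCH_ALLUXIO_HOME`, `eagerCliPath`, `cliPath`, and `setHome` are hypothetical names; only `binAlluxioPath` comes from the comment above.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// homeEnvKey is a hypothetical variable name used only by this sketch.
const homeEnvKey = "SKETCH_ALLUXIO_HOME"

// binAlluxioPath is the constant, relative part of the path.
const binAlluxioPath = "bin/alluxio"

// eagerCliPath is evaluated during package initialization,
// before main (or any pre-run hook) has had a chance to set the home directory.
var eagerCliPath = filepath.Join(os.Getenv(homeEnvKey), binAlluxioPath)

// cliPath is evaluated on every call, so it sees values set during startup.
func cliPath() string {
	return filepath.Join(os.Getenv(homeEnvKey), binAlluxioPath)
}

// setHome plays the role of the startup hook that defines the environment.
func setHome(dir string) {
	if err := os.Setenv(homeEnvKey, dir); err != nil {
		panic(err)
	}
}

func main() {
	setHome("/opt/alluxio")
	// filepath.Join ignores empty elements, so the eager value stays relative.
	fmt.Println("eager:", eagerCliPath)
	fmt.Println("lazy: ", cliPath())
}
```

The eager value is a relative path, which is why it only works when the process is launched from the home directory.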

cli/src/alluxio.org/cli/processes/common.go
// get list of masters and workers
var nodes []string
if onMasters {
	if len(masterList) == 0 {
Contributor

another way is to move the hostname handling outside the runCommand:

func (p *FooProcess) Start(cmd *env.StartProcessCommand) error {
  command := addStartFlags("process start foo", cmd)
  // For master-only situations
  nodes := getNodes(filepath.Join(env.Env.EnvVar.GetString(env.ConfAlluxioConfDir.EnvVar), "masters"))
  // For masters & workers
  nodes = append(nodes, getNodes(filepath.Join(env.Env.EnvVar.GetString(env.ConfAlluxioConfDir.EnvVar), "workers"))...)
  // Propagate the result so that Start satisfies its error return
  return runCommand(command, nodes)
}

in this case, there is no need to do the file parsing inside the multi-threaded function, since it is not time-consuming: we can parse the files first, then call the function with the whole node list.
this also avoids using a bool to switch between two different behaviors; pass the actual differing parameters directly.
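
A sketch of what parsing the host lists up front could look like. The file format is an assumption (one hostname per line, blank lines and `#` comments skipped), and `parseNodes` and `nodesFrom` are hypothetical helpers, not the `getNodes` function from the PR.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// parseNodes reads one hostname per line,
// skipping blank lines and lines that start with '#'.
func parseNodes(r io.Reader) ([]string, error) {
	var nodes []string
	scanner := bufio.NewScanner(r)
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		nodes = append(nodes, line)
	}
	return nodes, scanner.Err()
}

// nodesFrom parses several host lists in order and concatenates the results.
// It takes file contents as strings so the sketch needs no files on disk.
func nodesFrom(contents ...string) ([]string, error) {
	var all []string
	for _, c := range contents {
		nodes, err := parseNodes(strings.NewReader(c))
		if err != nil {
			return nil, err
		}
		all = append(all, nodes...)
	}
	return all, nil
}

func main() {
	masters := "# masters\nmaster-1\n"
	workers := "worker-1\n\nworker-2\n"
	nodes, err := nodesFrom(masters, workers)
	if err != nil {
		panic(err)
	}
	// The full list is built sequentially; only the per-node work needs goroutines.
	fmt.Println(nodes)
}
```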


if len(errors) != 0 {
	log.Logger.Warningf("Run command %s failed, number of failures: %v", command, len(errors))
	return stacktrace.Propagate(errors[0], "First error: ")
Contributor

I'm not sure whether we need to log every error here or dump them to a file for debugging, but I agree with the current structure: handle the error list and return a digest error to the caller.
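
One possible shape for such a digest error, using only the standard library rather than the stacktrace package seen in the snippet. `digest`, `errA`, and `errB` are hypothetical names for this sketch.

```go
package main

import (
	"errors"
	"fmt"
)

// errA and errB are made-up per-node failures for the demo.
var (
	errA = errors.New("host a: timeout")
	errB = errors.New("host b: connection refused")
)

// digest collapses a list of errors into a single error.
// It returns nil for an empty list; otherwise the result reports
// the number of failures and wraps the first one.
func digest(command string, errs []error) error {
	if len(errs) == 0 {
		return nil
	}
	return fmt.Errorf("run command %q failed on %d node(s), first error: %w",
		command, len(errs), errs[0])
}

func main() {
	err := digest("process start masters", []error{errA, errB})
	fmt.Println(err)
	// Wrapping with %w lets callers still test for the underlying cause.
	fmt.Println("wraps first error:", errors.Is(err, errA))
}
```

Whether to also log every individual error is left open, as in the comment above; the caller could do that before calling `digest`.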

Contributor

@Xenorith Xenorith left a comment

this looks a lot better and easier to understand when compared to the initial review. a few more minor comments but otherwise it looks good

@twalluxio twalluxio requested a review from Xenorith August 8, 2023 02:28
Contributor

@Xenorith Xenorith left a comment

for all the new strings.Join(), join with space " " instead of empty string ""?
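
To illustrate why the separator matters when a command line is assembled from parts (`buildCommand` is a hypothetical helper for this sketch):

```go
package main

import (
	"fmt"
	"strings"
)

// buildCommand joins command parts with a single space,
// so each part stays a separate argument on the command line.
func buildCommand(parts ...string) string {
	return strings.Join(parts, " ")
}

func main() {
	parts := []string{"process", "start", "master"}
	fmt.Printf("%q\n", strings.Join(parts, "")) // "processstartmaster"
	fmt.Printf("%q\n", buildCommand(parts...))  // "process start master"
}
```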

@Xenorith
Contributor

Xenorith commented Aug 8, 2023

alluxio-bot, merge this please

@alluxio-bot
Contributor

merge failed:
Merge refused because pull request does not have label start with type-

@Kai-Zhang Kai-Zhang added the type-feature This issue is a feature request label Aug 8, 2023
@Kai-Zhang
Contributor

alluxio-bot, merge this please

@alluxio-bot alluxio-bot merged commit 8a8b338 into Alluxio:golangCli Aug 8, 2023
11 checks passed
Xenorith pushed a commit that referenced this pull request Aug 11, 2023
			pr-link: #17887
			change-id: cid-0a660de35945738c065dae2f008a61d48f6b3be9
twalluxio added a commit to twalluxio/alluxio that referenced this pull request Aug 14, 2023
@twalluxio twalluxio deleted the golangCli-rsync3 branch August 14, 2023 04:05
Xenorith added a commit that referenced this pull request Aug 15, 2023
- Add cli/ folder containing golang code
- Add script to compile code to executable in build/cli/build-cli.sh
- Add entrypoint script for development in bin/cli.sh
- Add build profile to build the CLI as part of the maven build via `-PgoCli`
- Add `conf` service, consisting of
  - `bin/cli.sh conf get` as an example command, equivalent to `bin/alluxio getConf`
  - `bin/cli.sh conf log --name fully.qualified.class.path`, equivalent to `bin/alluxio fsadmin logLevel --logName fully.qualified.class.path`

See #17522 for more background info

			pr-link: #17532

dev docs for defining conventions for the new alluxio cli in golang

			pr-link: #17530

- Add commands to start/stop individual processes, ex. `bin/cli.sh process start master`
- Define process interface and registry
  - process specific java opts are defined with the process and are dynamically added to env
- Add journal format command as part of starting master process
- Mount options for worker are not ported because they will be deprecated in the near future

			pr-link: #17561

Adds `info` subcommand, which contains:
- `cache`: worker capacity information, calls the existing `bin/alluxio fsadmin report capacity` command
- `collect`: collects cluster information into a single tarball, calls the existing `bin/alluxio collectInfo` command
- `master`: master quorum information, calls the existing `bin/alluxio fs masterInfo` command
- `report`: cluster information, calls the existing `bin/alluxio fsadmin report` command, excluding the `capacity` category

			pr-link: #17566

Add journal commands to CLI as part of #17522
			pr-link: #17569

Add quorum/HA related commands as part of #17522
			pr-link: #17570

Add fs commands to golang CLI as part of #17522

unlike other CLI commands, utilize arguments to maintain the existing structure of filesystem commands (ex. cp, du, ls, mv, rm, etc)

			pr-link: #17580

Add generate commands to golang CLI as part of #17522

bin/alluxio docGen -> bin/cli generate docs
bin/alluxio bootstrapConf -> bin/cli generate template

Also made a minor change to the env variables in order to read ALLUXIO_CONF_DIR.

			pr-link: #17790

Add exec commands to golang CLI as part of #17522

`bin/alluxio runClass` -> `bin/cli exec class`
`bin/alluxio runTests` -> `bin/cli exec testRun`
`bin/alluxio runUfsTests` -> `bin/cli exec testUfs`
`bin/alluxio runUfsIOTest` -> `bin/cli exec testUfsIO`
`bin/alluxio runHdfsMountTests` -> `bin/cli exec testHdfsMount`
`bin/alluxio runHmsTests` -> `bin/cli exec testHms`
`bin/alluxio runJournalCrashTest` -> `bin/cli exec testJournalCrash`

For the `exec test` command, different `--name` values can require very different sets of other flags and options.
To keep flags and options manageable, each test type currently has its own command.
			pr-link: #17797

Prepare golang cli for tarball build and replace bin/alluxio

allow java 8 or 11

add cli binaries to other tarball profiles

Add `process` commands with multiple nodes to golang CLI as part of #17522

`bin/alluxio-start.sh masters` -> `bin/cli.sh process start masters`
`bin/alluxio-start.sh job_masters` -> `bin/cli.sh process start job_masters`
`bin/alluxio-start.sh workers` -> `bin/cli.sh process start workers`
`bin/alluxio-start.sh job_workers` -> `bin/cli.sh process start job_workers`
`bin/alluxio-start.sh proxies` -> `bin/cli.sh process start proxies`
`bin/alluxio-start.sh all` -> `bin/cli.sh process start all`
`bin/alluxio-start.sh local` -> `bin/cli.sh process start local`

`bin/alluxio-stop.sh masters` -> `bin/cli.sh process stop masters`
`bin/alluxio-stop.sh job_masters` -> `bin/cli.sh process stop job_masters`
`bin/alluxio-stop.sh workers` -> `bin/cli.sh process stop workers`
`bin/alluxio-stop.sh job_workers` -> `bin/cli.sh process stop job_workers`
`bin/alluxio-stop.sh proxies` -> `bin/cli.sh process stop proxies`
`bin/alluxio-stop.sh all` -> `bin/cli.sh process stop all`
`bin/alluxio-stop.sh local` -> `bin/cli.sh process stop local`

For `process start/stop` on multiple nodes, the command uses the `ssh` package from Go's crypto module to open an SSH session to the masters, the workers, or all nodes, and then runs the corresponding single-node subcommand on each of them.
			pr-link: #17887

Add job commands to golang CLI as part of #17522

`bin/alluxio-bash job cancel id` -> `bin/alluxio job cancel --id`
`bin/alluxio-bash job leader` -> `bin/alluxio job leader`
`bin/alluxio-bash job ls` -> `bin/alluxio job list`
`bin/alluxio-bash job getCmdStatus jobControlId` -> `bin/alluxio job cmdStatus --id`
`bin/alluxio-bash job stat [-v] id` -> `bin/alluxio job jobStatus [-v] --id`
`bin/alluxio-bash fs distributedCp src dst` -> `bin/alluxio job submit --type cp --src --dst`
`bin/alluxio-bash fs distributedMv src dst` -> `bin/alluxio job submit --type mv --src --dst`
`bin/alluxio-bash fs load path --submit` -> `bin/alluxio job load --path`

			pr-link: #17931

Clean up multiprocess code

- refactor commands launching the same process on multiple hosts into a single struct
- simplify the logic to ssh into different hosts with more straightforward handling of error channel
- move names.go under its own package to avoid import cycles
			pr-link: #17947

Add an `info version` subcommand to golang CLI as part of #17522

`bin/alluxio-bash version` -> `bin/alluxio info version`
			pr-link: #17943

update calls to old bash scripts to refer to alluxio-bash

Add `cache` commands to golang CLI as part of #17522
`bin/alluxio-bash formatWorker` -> `bin/cli cache format`
`bin/alluxio-bash fs freeWorker` -> `bin/cli cache free --worker`
`bin/alluxio-bash fs free` -> `bin/cli cache free [-f] --path`
`bin/alluxio-bash fs persist` -> `bin/cli cache persist --path`
`bin/alluxio-bash fs setTtl` -> `bin/cli cache ttl --set --duration --action`
`bin/alluxio-bash fs unsetTtl` -> `bin/cli cache ttl --unset`
			pr-link: #17937

backcompat support for start/stop scripts

skip mount args in start script backcompat

remove unsupported cmds