Multi-Machine Launching #255
Conversation
Left a couple comments, though it looks like it's very early in the design process.
> ## Justification
>
> Nodes can need to run on different hosts for a variety of reasons. Some possible
@mlanting "can need"? Would "may need" sound better?
"May need" sounds better to me
> ## Justification
>
> Nodes can need to run on different hosts for a variety of reasons. Some possible
> use cases:
@mlanting nit:

```diff
-use cases:
+use cases are:
```
Or "Some possible use cases include:".
Will is right, "include" sounds better.
> Nodes can need to run on different hosts for a variety of reasons. Some possible
> use cases:
>
> - A large robot could with hosts located physically near the hardware they are
@mlanting hmm, not sure what you mean here.
For example, on a vehicle that has many different types of sensors (cameras, lidars, radars, etc.), you may want to have a separate computer for processing the data from each one of those sets of sensors, and in order to simplify cable routing and network design, you may want to have those computers physically positioned next to them.
Alternately, depending on the chassis of your vehicle, you could have limitations on where it's actually possible to place computers, and so they could be scattered around by necessity.
I also don't understand, seems like a typo or something as "could with hosts" doesn't parse for me...
```diff
-- A large robot could with hosts located physically near the hardware they are
+- A large robot could require hosts located physically near the hardware they are
```
@wjwwood Is this sufficient to make the explanation clearer? We could go further by describing the physical layout more, possibly talk about doing edge processing for sensors when you have a robot that's more than a few meters in overall size.
If you guys are going for a specifically multi-robot/multi-agent approach, we should probably adopt language that suits the field.
```diff
-- A large robot could with hosts located physically near the hardware they are
+- A robot could request from neighboring static or dynamic agents control of their hardware, such as cameras, sensors, etc.
```
> - Calibration data, map files, training data, etc.
> - Need to keep track of which machine has the most recent version of such
>   resources
> - Security: we'll need to manage credentials across numerous machines both for SSH
@mlanting nit:

```diff
-- Security: we'll need to manage credentials across numerous machines both for SSH
+- Security credentials will have to be managed across numerous machines both for SSH
```
> ## Proposed Multi-Machine Launch Command Line Interface
>
> The multi-machine launching interface is controlled through the `launcher`
@mlanting meta: I wonder if `launcher` is the right naming, as opposed to `host`, `machine`, or even `system`.
It was the best thing I could think of at the time -- I am also not a big fan of it and am open to suggestions. :-)
I think it would make sense to provide the same interface for both single-machine and multi-machine launching and also make it obvious from the verb what the command does. `system` might work for that.
`launcher` is just a new command we decided to propose since the existing `launch` command doesn't have any verbs. Running something like `ros2 launcher launch` would do the same thing as `ros2 launch`, either by calling the existing launch, or by replacing it (in which case `ros2 launch` could just be an alias).
This is basically just a workaround for the fact that the existing launch command doesn't have verbs and we weren't sure it'd be a good idea to try to introduce verbs to a verbless command.
> $ ros2 launcher list
> ab1e0138-bb22-4ec9-a590-cf377de42d0f: 5 nodes, 2 hosts
> 50bda6fb-d451-4d53-8a2b-e8fcdce8170b: 2 nodes, 1 host
> 5d186778-1f50-4828-9425-64cc2ed1342c: 16 nodes, 3 hosts
@mlanting does each launch file provide enough information to dispatch each node to the right host? If so, how does this cope with a single-machine launch to be launched in multiple hosts, unaware of each other? If not, how are nodes or even "systems" as you propose here associated with each host?
This is just a theoretical design of how we would expect the CLI to look, so there isn't a definite answer for those questions (yet). I expect that the launch files will, similar to ROS1's launch files, contain information about all of the hosts involved and which nodes should be launched on which host.

> If so, how does this cope with a single-machine launch to be launched in multiple hosts, unaware of each other?

I'm not sure what this means, do you have an example?
My thought was that nodes which need to run on a certain machine can have that machine specified in the launch file, and those that do not have a machine specified would be sent to a host selected by the launch system (to allow for things like load balancing). The launch system would be informed of hosts by something like a `declare_launch_hosts` action.
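The idea above can be sketched in plain Python. This is a hypothetical illustration, not a real ros2launch API: `Node`, `assign_hosts`, the `declare_launch_hosts` action name, and the round-robin policy are all assumptions about how such a system might behave.

```python
# Hypothetical sketch of host assignment: nodes that pin a machine keep it,
# and unpinned nodes are spread round-robin over the hosts declared by
# something like the proposed "declare_launch_hosts" action.
# None of these names come from the real ros2 launch API.
from dataclasses import dataclass
from itertools import cycle
from typing import Dict, List, Optional

@dataclass
class Node:
    name: str
    machine: Optional[str] = None  # None means "let the launch system choose"

def assign_hosts(nodes: List[Node], declared_hosts: List[str]) -> Dict[str, str]:
    """Map node names to hosts; a trivial stand-in for load balancing."""
    rotation = cycle(declared_hosts)
    # `or` short-circuits, so the rotation only advances for unpinned nodes
    return {n.name: (n.machine or next(rotation)) for n in nodes}

nodes = [
    Node("camera_driver", machine="sensor-pc"),  # must run near its hardware
    Node("planner"),                             # anywhere
    Node("mapper"),                              # anywhere
]
print(assign_hosts(nodes, ["host-a", "host-b"]))
# {'camera_driver': 'sensor-pc', 'planner': 'host-a', 'mapper': 'host-b'}
```

A real implementation would presumably pick hosts based on load or capability rather than round-robin, but the split between pinned and system-assigned nodes is the point being illustrated.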
Are there any lessons or infrastructure we'd like to integrate/piggyback with? A long while ago we had a similar idea of integrating orchestration software to manage roslaunch for ROS1: ros/ros_comm#643 Rather than reinventing the wheel, perhaps we could open up an interface to make it simpler to plug roslaunch into kubernetes, swarm, nomad, docker, or other orchestration models outside containers?
I agree that it would be good to build this on top of an existing orchestration tool. It's not clear to me from the document if, in the same launch file, you would be able to run processes in different machines. That was possible in ROS 1 using the
Thanks for the contribution @mlanting. Here are my suggestions:

```
$ ros2 launch [package_name [launch_file_name] [launch_arguments]]
```

That or specifying a YAML file configuration with a
> Authors: {{ page.author }}
>
> ## Purpose
See my comment on the outline.
> ## Proposed Multi-Machine Launch Command Line Interface
>
> The multi-machine launching interface is controlled through the `launcher`
What's the reason behind adding a whole new verb as opposed to extending the previous launch command? (Again, see my comments for proposed alternatives.)
If possible, I think it'd be good to do this in a way that doesn't break backwards compatibility, so just running `ros2 launch pkg launch_file` should behave the same way as it always has. On the other hand, none of the current `ros2` verbs can both execute a command and have subcommands, so doing so here would be breaking with existing conventions.
The thought here is that the cleanest way to preserve backwards compatibility and preserve conventions is to add a new command that provides all of the necessary subcommands and have one of those subcommands provide the same functionality as the original `ros2 launch`. Do you think that either replacing the current `launch` command or having one command that both performs an action and has sub-commands would be preferable?
Docker has done something similar, like where `docker images` was extended to `docker image list`, or `docker ps` vs `docker container list`. So we could nest the original verb into the new syntax, while keeping the old syntax. Though I'm a fan of reworking the launch verb as it's already singular. Perhaps `ros2 launch` -> `ros2 launch start|stop|restart <pkg> <file>`?
Actually a little more:
Wouldn't it be better to extend ros2launch with the ability to read a list of package_name/launch_files and pass the attach/detach args if needed:

```
$ ros2 launch [package_name [launch_file_name] [launch_arguments]]
```
You may need more information than just a package and launch file name to `attach`/`term` a launched system; in fact, that will definitely not be enough if it is possible to run the same launch file more than once on a network. For example, imagine you've got a lab of developers, all of whom are on the same network and are independently launching/terminating nodes and don't want to mess up anybody else.
In fact, not needing to know the package/launch file could be a desirable feature; you might want to be able to launch a system from one machine but stop it from a different one.
Actually, it looks like it didn't make it into the revision of this file that got pushed here, but here's an example of some more verbose output from `list -v` that could be useful for introspecting running systems:

```
$ ros2 launcher list -v
ab1e0138-bb22-4ec9-a590-cf377de42d0f: 5 nodes, 2 hosts
    Launch host: 192.168.10.5
    Launch time: Fri Sep 13 15:39:45 CDT 2019
    Launch command: ros2 launcher launch package_foo bar.launch.py argument:=value
4c9dc7d1-e6c4-49cd-bc3d-c8aa2d5a95e0: 5 nodes, 2 hosts
    Launch host: 192.168.10.5
    Launch time: Fri Sep 13 16:39:45 CDT 2019
    Launch command: ros2 launcher launch package_foo bar.launch.py argument:=value
50bda6fb-d451-4d53-8a2b-e8fcdce8170b: 2 nodes, 1 host
    Launch host: 192.168.10.15
    Launch time: Fri Sep 13 12:39:45 CDT 2019
    Launch command: ros2 launcher launch demo_nodes_cpp talker_listener.launch.py
5d186778-1f50-4828-9425-64cc2ed1342c: 16 nodes, 3 hosts
    Launch host: 192.168.10.13
    Launch time: Fri Sep 12 10:39:45 CDT 2019
    Launch command: ros2 launcher launch package_foo bar2.launch.py
$
```
Why do you need a command to start/stop/restart launched things? Can we not just keep `ros2 launch` blocking as it is, and you simply ctrl-c the original command when you want to stop it?
If you want to list existing launch services, then we can add that as a `ros2 launch --list-launch-services` kind of thing. If you absolutely need to asynchronously interact with an already running launch service then the command name that would make more sense to me is `ros2 launchctrl`...
By the way, the architecture of the launch system is not at all clear at this point in the article, how many processes are there, how are things executed on remote machines, is it a blocking process or a daemon?
Without context on the architecture, this section about the command line tools doesn't make sense and is completely unmotivated. It just comes off as a bit arbitrary. I designed launch originally, and have discussed this feature with people extensively and even I couldn't really guess at why the tools were proposed like this. Don't take this as negative criticism, it's fine we'll iterate on it, but my point is that you need a lot more supporting rationale before you jump into how the command line tools will work.
The thought behind this is that since ROS2 has no rosmaster and can operate in a decentralized fashion, it makes sense for a multi-machine launch system to match that paradigm rather than having a central process around which everything can fail.
This is actually a minor annoyance I've dealt with a lot in ROS1. Let's say your robot is a vehicle that has multiple headless computers, you have a laptop that you use for interacting with the vehicle but does not have any critical software running on it, and you want to launch a ROS system across all of them. Your options are:

- `roslaunch` everything from your laptop, which creates an unnecessary single point of failure that can bring everything down
- `ssh` into a computer on the vehicle, start a terminal manager like `screen` or `tmux`, `roslaunch` everything, then detach and log out; when you want to stop everything, you then have to log in, attach, kill it, then log out again

The first case can unexpectedly cause big problems, such as one time that we had a demo running on a vehicle and a bump in the road caused the laptop to close, go to sleep, and then the whole system died. The second case is less problematic but adds several repetitive steps. You can abstract away the process by writing services and scripts to automate everything, but to me that feels like something the launch system should be able to do.
In addition, ROS2's decentralized design means it's possible to have machines that lose and regain connectivity to each other and continue functioning. This can happen accidentally as a result of network failures, but it could also be intentional in the case of a swarm of drones that move out of and back into range of each other. If the launch paradigm is to have a central launch process that shuts everything down when it's interrupted, how do you handle shutting down hosts that you've temporarily lost connection to? After they've reconnected to the network, is there even a way to do so?
Designing the launch system in this way addresses those problems; you can:
- Launch a ROS2 system from a machine that is not a critical part of the system
- Monitor or stop groups of nodes that were started on a different machine than the current one
- Monitor or stop nodes that disconnected from the network they were launched from
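As a rough illustration of the detach idea described above (a generic Python sketch, not the launch system's actual mechanism, and `launch_detached` is a made-up helper), a launcher could start its child processes in a new session so they outlive the invoking terminal:

```python
# Generic sketch of "detach": start a child in its own session so it keeps
# running after the launching process or terminal goes away -- the step that
# ssh + screen/tmux is otherwise used to do by hand. POSIX-oriented; this is
# an illustration, not how ros2 launch actually works.
import subprocess
import sys

def launch_detached(cmd):
    """Start cmd detached from our terminal; return its PID so a later
    'term'-style command could look it up and signal it."""
    proc = subprocess.Popen(
        cmd,
        start_new_session=True,        # new session, no controlling tty
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return proc.pid

pid = launch_detached([sys.executable, "-c", "import time; time.sleep(1)"])
print(f"detached process running with PID {pid}")
```

A real system would also need to record the PID (or a session UUID, as in the proposed `list` output) somewhere durable so that a later command on any machine could find and terminate the session.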
For what it's worth, we have some ideas and plans for the actual launch architecture but are still pretty early in the design process; the desire behind designing the user interface first is to figure out what features we want and how people want to interact with it and then build an architecture that is capable of that rather than creating a system first and then trying to figure out how to control it in an intuitive way.
As mentioned in another comment, the desire behind creating a new command (and I like `launchctrl` better than `launcher`) was to preserve compatibility with the existing `launch` command while also not breaking the current paradigm of commands that perform a single action not also having sub-commands. If you don't have any problem with adding sub-commands to `launch` or with tweaking `launch`'s arguments so that it is capable of other actions, I also think that's preferable.
Thanks for the explanation.
I think all of this is important to outline in the design doc as it provides a lot of necessary context and justification as to why one should go about implementing this new feature.
I agree that leveraging the decentralized paradigm of ROS2 to launch processes on various systems and monitor them from other systems is useful.
To move forward, I'd suggest having very specific, actionable items of what this proposition will and will not achieve, as well as justification as to why previous commands/features don't achieve what you're proposing (as you outlined to wjwwood).
Edit:
Just wanted to say this is hilarious and sorry it happened 😆 (I know the pain...)
> The first case can unexpectedly cause big problems, such as one time that we had a demo running on a vehicle and a bump in the road caused the laptop to close, go to sleep, and then the whole system died.
> as to why previous commands/features don't achieve what you're proposing

I would also like to see some arguments on why existing orchestration tools just sitting on top of ROS2 launch would not be as suitable, and to what extent we should implement distributed host process management into roslaunch itself. The unix side of me isn't sure if ROS itself would be the best place to add this kind of infrastructure management. There's a lot more to distributed process management for robots than roslaunch, such as rotating logs, key exchange/enrolment, port mapping, health monitoring, etc. For example, I don't even think using SSH is still appropriate for managing swarms of robots, given how easily ssh pipes break over spotty wifi connections; even something more robust like mosh isn't well purposed for programmatic process management.
However, it might be neat to use ROS as the transport for other orchestration tools, e.g. where kubernetes or nomad could send client/server exchanges over DDS; though I'm not sure if they'd operate under anything but a reliable transport.
mosh might be a better option because it was made for intermittent connectivity.
For the attach/detach feature, couldn't you just extend the `ros2 node ...` command to include an attach/detach action (i.e. `ros2 node attach name-of-node`)?
> $ ros2 launcher term 50bda6fb-d451-4d53-8a2b-e8fcdce8170b
> Terminating Launch System 50bda6fb-d451-4d53-8a2b-e8fcdce8170b.
> $
> ```
nit: newline
Note to resolve this comment: EOF newlines are present unless stated otherwise in github viewer
> $
> ```
>
> #### `term`
These features all seem like extensions that can extend the current `ros2run` and `ros2launch` CLI.
They are also something I would like to see in ROS2 as well :)
I agree it would possibly be a useful feature of `ros2 launch` (not sure about `ros2 run`, but maybe...), but again I don't see what it has to do with multi-machine launch. We could add `attach`/`detach`/`term` without adding multi-machine launching actions, and vice versa (unless I miss some core reason for it being included here).
Providing the ability to work on top of an existing orchestration tool could be very useful, but I'm hesitant about requiring it. Adding something like docker or kubernetes is a very significant dependency that I know some people won't want to be required as part of their ROS installation, and they may not be feasible for some low-resource platforms or available for some architectures. In ROS1 systems, I've done multi-machine launching in docker environments through abuse of the
Thanks for stating the process on this!
I have to say though I think we need a lot more details on what the system will be capable of, how it might be implemented (ssh vs. a daemon needs to be running on each machine ahead of time vs. something else), and how users will specify what is local and what is remote and any limitations on what can be remote.
Rather than, what I take to be, unrelated features surrounding detachable launch services.
Also, please update this section as part of this pull request, to at least point to this new article:
> Allow a system of ROS nodes to be launched on a hardware architecture that is
> spread across multiple networked computers and facilitate introspection and
> management of the system from a single machine.
Please use one sentence per line according to our MD style:
https://index.ros.org/doc/ros2/Contributing/Developer-Guide/#markdown-restructured-text-docblocks
> - Connecting to a remote host and running nodes on it
> - Pushing configuration parameters for nodes to remote hosts
> - Monitoring the status and tracking the lifecycle of nodes
> - Recovering from failures by optionally restarting nodes
These two (tracking lifecycle and recovering from failures) are not special to distributed launching, imo. They have to be done in a single launch file as well sometimes.
> - Load balancing nodes on distributed networks
> - Command line tools for managing and monitoring systems across machines
> - Mechanisms for locating files and executables across machines
> - Sharing and synchronizing files across machines
I'm curious about the motivation of these features here.
I think things should only be included in the design document in one of two cases:

- it's a feature that has a concrete/existing use case (e.g. a feature we had in ROS 1 and want to emulate)
- it's a feature we might want in the future, but needs to be considered now so that it's possible to add it later
  - i.e. trying to avoid designing ourselves into a box where adding the new feature would require major redesigning of the system

It's not clear to me that each of those meets that standard, but if they do then I think they need to be separately motivated.
I see what you mean. Some of these, such as "Command line tools for managing and monitoring systems across machines", are probably better suited to a community package than one provided by ros2.
Additionally, what is meant by "load balancing" in this case?
In my opinion,

- Mechanisms for locating files and executables across machines
- Sharing and synchronizing files across machines

these are totally off topic for the launch system. Why is this integrated into launcher? Could we think about these features more generically if necessary?
I don't think I would say they're "totally off topic"; synchronizing a built workspace between machines is something I've had to deal with constantly when launching on multi-machine systems, and I've spent a considerable amount of time writing scripts to handle efficiently deploying large workspaces (I've got one that's 5.3 GB right now) across multiple ROS hosts.
But it is true that the concept of synchronizing files isn't tightly coupled to launching; there are people who will be interested in multi-machine launching but don't need to sync anything, and there are probably others who will want to sync data but aren't launching anything, so it might make more sense to break that out into its own system.
Well, I think we need to consider what constitutes features for multi-machine launching vs. tools for multi-machine ros2 setups. Our best bet might be to consolidate the features that would help multi-machine launching in the `launch` command within this guide, and then think about maybe a suite of packages containing features helpful to multi-machine setups.
In ROS 1, files were looked up via something like "find file X in package Y", and in multi-machine launching it was required that packages and files existed on both machines (maybe not in the same place, but they had to be discoverable).
That seemed to work ok, though obviously there are situations, especially in development, where this isn't ideal. However, I do think we'd do well to keep the tools "small and sharp" and deal with this problem outside of launch. If there appears to be a really good default way to solve this problem, then we can consider adding tool support.
However, I'd encourage you guys to get it working with the assumption that all the packages exist on the local and remote machines (or in the "remote" containers), and then we can work out mechanisms for synchronization or container setup as needed.
> ## Proposed Multi-Machine Launch Command Line Interface
>
> The multi-machine launching interface is controlled through the `launcher`
For example, why do I need to attach/detach? Just because docker has it? Do we really need that feature?
> #### `launch`
>
> The `ros2 launcher launch` is equivalent to `ros2 launch`, which is preserved
> for backwards compatibility and ease of use. It is used to run a launch file.
It's been stated above, but just to drive it home here where it's clearer, why not just `ros2 launch`?
What is the technical justification for needing a new sub-verb inside a new verb if they behave the same?
> across a network.
>
> Additionally, it is possible to detach from a system and let it run in the
> background:
Why is this just a feature of the `ros2 launcher launch` version of this? Why is it needed at all, and why can it not be done for `ros2 launch` as well? Put another way, what is it about multi-machine launch that requires this ability?
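For reference, "detach" here essentially means daemonizing the launched processes so they outlive the foreground command. A ROS-free sketch of the core mechanism using only the Python standard library (the child command is just a stand-in):

```python
import subprocess
import sys

# Launch a child in its own session so it is not killed when the parent
# (the foreground "launch" command) exits or receives SIGINT from the
# controlling terminal. This is the core of a hypothetical --detach flag.
child = subprocess.Popen(
    [sys.executable, "-c", "import time; time.sleep(0.2)"],
    start_new_session=True,      # detach from our process group / terminal
    stdout=subprocess.DEVNULL,   # a real tool would redirect to a log file
)

print("detached child pid:", child.pid)
# The parent could now exit immediately; re-attaching would mean finding
# the pid again (e.g. from a registry file) and tailing its logs.
child.wait()  # kept here only so the example is self-contained
print("child exited with", child.returncode)
```

This also suggests the comment above is right that detach/attach is orthogonal to multi-machine launch: nothing in the mechanism depends on a remote host.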
> $
>
> #### `term`
I agree it would possibly be a useful feature of `ros2 launch` (not sure about `ros2 run`, but maybe...), but again I don't see what it has to do with multi-machine launch. We could add `attach`/`detach`/`term` without adding multi-machine launching actions, and vice versa (unless I miss some core reason for it being included here).
+1, I'm totally on board with this; I just wanted to bring up the fact that there may be similar implementations we can leverage so we don't reinvent the wheel. As before, I suggest writing this out in the design doc as background/justification, as it explains why such a tool is needed.
> processing hardware
> - A robot with a cluster of machines that do distributed processing of data
> - A network of multiple virtual hosts for testing purposes
> - A swarm of independent drones that can cooperate but do not require
Suggested change:

```diff
- A swarm of independent drones that can cooperate but do not require
+ A swarm of cooperative, independent agents that do not require intercommunication
```
Why limit to a swarm and not a team of more than 1 agent?
How is agent intercommunication relevant to multi-machine launching? And why limit to systems without it?
Why not just: Cooperative Multi-Agent Systems?
It's just a specific use case example. I don't think it was meant in an all-encompassing manner for multi-machine launching.
> - A robot with a cluster of machines that do distributed processing of data
> - A network of multiple virtual hosts for testing purposes
> - A swarm of independent drones that can cooperate but do not require
>   communication with each other
I suggest removing this line
> communication with each other
>
> - Connecting to a remote host and running nodes on it
> - Pushing configuration parameters for nodes to remote hosts
> - Monitoring the status and tracking the lifecycle of nodes
What exactly is the difference from the `lifecycle` command?
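For context: the `ros2 lifecycle` verb drives the managed-node state machine, while the quoted bullet is about the launch system observing those states. The primary states and transitions can be sketched as a small state machine; this is an illustration of the concept, not the actual rcl_lifecycle implementation:

```python
from enum import Enum

class State(Enum):
    UNCONFIGURED = "unconfigured"
    INACTIVE = "inactive"
    ACTIVE = "active"
    FINALIZED = "finalized"

# Primary-state transitions of a ROS 2 managed (lifecycle) node.
TRANSITIONS = {
    ("configure", State.UNCONFIGURED): State.INACTIVE,
    ("cleanup", State.INACTIVE): State.UNCONFIGURED,
    ("activate", State.INACTIVE): State.ACTIVE,
    ("deactivate", State.ACTIVE): State.INACTIVE,
    ("shutdown", State.UNCONFIGURED): State.FINALIZED,
    ("shutdown", State.INACTIVE): State.FINALIZED,
    ("shutdown", State.ACTIVE): State.FINALIZED,
}

def trigger(state, transition):
    # Return the new primary state, or raise if the transition is invalid.
    try:
        return TRANSITIONS[(transition, state)]
    except KeyError:
        raise ValueError(f"cannot {transition} from {state.value}")

s = State.UNCONFIGURED
for t in ("configure", "activate", "deactivate", "shutdown"):
    s = trigger(s, t)
print(s)  # State.FINALIZED
```

`ros2 lifecycle` would be the thing calling `trigger`; the monitoring described in the design doc would only read the current state.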
> -h, --help    Show this help message and exit.
> -d, --debug   Put the launch system in debug mode, provides more
>               verbose output.
> -D, --detach  Detach from the launch process after it has started.
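The quoted flags map directly onto a standard argument parser; a rough sketch of how they could be declared (hypothetical wiring, not the actual ros2cli extension code — in the real tool this would live in a verb extension rather than a standalone parser):

```python
import argparse

# Declare the flags shown in the quoted help text. argparse adds
# -h/--help automatically, so only the other two need declaring.
parser = argparse.ArgumentParser(prog="ros2 launcher launch")
parser.add_argument(
    "-d", "--debug", action="store_true",
    help="Put the launch system in debug mode; provides more verbose output.")
parser.add_argument(
    "-D", "--detach", action="store_true",
    help="Detach from the launch process after it has started.")

args = parser.parse_args(["-D"])
print(args.debug, args.detach)  # False True
```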
We would use this if the launch process kills itself after launching, leaving everything else running. On embedded platforms we might not need a launch process at all, but we would likely still use a launch description to init the system.
I tried pushing an update a couple days ago to my fork, but it doesn't seem to be coming through. Possibly because the roslaunch branch was merged into gh-pages and then removed a few days after I originally submitted this PR?
I'm not sure, but that's probably the case.
Signed-off-by: matthew.lanting <[email protected]>
Signed-off-by: matthew.lanting <[email protected]>
Signed-off-by: matthew.lanting <[email protected]>
- Added a 'Context' section to describe the multi-machine launch capabilities of ROS1 and point to the remote launch section of the main launch design document.
- Changed linebreaks to one sentence per line.
- Added a 'Proposed Approach' section where we can start to describe our design(s) from a technical perspective.
- Added a 'Goals' section. Not much useful content here yet though.

Signed-off-by: matthew.lanting <[email protected]>
- Also did a bit more reformatting/reworking of some of the main sections based on some of the feedback we've gotten so far Signed-off-by: matthew.lanting <[email protected]>
Distro A, OPSEC Signed-off-by: Jacob Hassold <[email protected]>
Distro A, OPSEC Signed-off-by: Jacob Hassold <[email protected]>
Distro A, OPSEC Signed-off-by: Jacob Hassold <[email protected]>
Distribution Statement A; OPSEC #2893 Signed-off-by: P. J. Reed <[email protected]>
Distribution Statement A; OPSEC #2893 Signed-off-by: matthew.lanting <[email protected]>
Distribution Statement A; OPSEC #2893 Signed-off-by: P. J. Reed <[email protected]>
Distribution Statement A; OPSEC #2893 Signed-off-by: matthew.lanting <[email protected]>
This pull request has been mentioned on ROS Discourse. There might be relevant details there: https://discourse.ros.org/t/ros-2-tsc-meeting-minutes-2020-07-16/15468/1
We've created a new version of the design document with some much more concrete ideas to discuss, but since the document has changed entirely I figured it'd be more appropriate to create a new PR: #297
This is an initial draft of a design document outlining our plans for adding multi-machine launching capability to ROS launch for ROS 2. We'd like to put it out early in the design process so we can take community feedback into account as we move forward.
Distribution Statement A; OPSEC #2893