
Allow task and fleet updates over ROS 2 #1003

Closed

Conversation

@aaronchongth (Member)

What's new

Sets up ROS 2 subscriptions for fleet- and task-related updates, which were previously delivered solely over the websockets _internal route.

The motivation is related to #899, where non-reproducible websocket issues occur unpredictably in production environments, causing various systems to fail.

This provides the option to mitigate issues with the upstream websocketpp library by relying on ROS 2 to publish updates instead.

Works with open-rmf/rmf_ros2#383
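
For context, the subscription pattern this PR adds looks roughly like the following (a minimal standalone sketch using rclpy; the node name and the logging hand-off are placeholders, not the repository's actual wiring):

```python
import json

import rclpy
from rclpy.node import Node
from std_msgs.msg import String as StringMsg


class UpdateBridge(Node):
    """Bridges JSON task/fleet updates from ROS 2 topics into the server."""

    def __init__(self):
        super().__init__("update_bridge")  # hypothetical node name
        # Fleet adapters publish the same JSON payload they would have sent
        # over the websockets _internal route, wrapped in a std_msgs/String.
        self.create_subscription(
            StringMsg, "task_state_update", self._on_task_state, 10
        )

    def _on_task_state(self, msg: StringMsg):
        payload = json.loads(msg.data)
        # In the real server this would feed the persistence/event pipeline;
        # logging stands in for that here.
        self.get_logger().info(f"task state update: {payload}")


def main():
    rclpy.init()
    rclpy.spin(UpdateBridge())
    rclpy.shutdown()


if __name__ == "__main__":
    main()
```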

Self-checks

  • I have prototyped this new feature (if necessary) on Figma
  • I'm familiar with and follow this TypeScript guideline
  • I added unit-tests for new components
  • I tried testing edge cases
  • I tested the behavior of the components that interact with the backend, with an e2e test

@koonpeng (Collaborator) left a comment

It's still not clear where the connection issues come from; there is a chance that upgrading dependencies would have fixed them. There is also a chance that this doesn't fix the problem, as we have seen ROS 2 messages being significantly delayed before as well.

That said, I'm not opposed to using ROS 2, but I think we should only receive updates from a single source; there should be an option to choose which source to use.
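
A hedged sketch of what such an option could look like (the `RMF_UPDATE_TRANSPORT` environment variable and `UpdateTransport` enum are hypothetical, not an existing setting in rmf-web):

```python
import os
from enum import Enum


class UpdateTransport(Enum):
    WEBSOCKET = "websocket"
    ROS2 = "ros2"


def selected_transport() -> UpdateTransport:
    """Pick exactly one update source, defaulting to the current behavior."""
    # Hypothetical setting: the API server would wire up only the chosen
    # source, so updates are never consumed from two transports at once.
    raw = os.environ.get("RMF_UPDATE_TRANSPORT", "websocket")
    return UpdateTransport(raw)  # raises ValueError on an unknown value
```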

@aaronchongth (Member, Author)

I fully agree that focusing on a single transport mode would be the cleanest way to handle it.

As for updating dependencies, we may be limited to only half of the equation: we can keep rmf-web updated as we go, but the C++ stack uses https://github.com/zaphoyd/websocketpp, which is generally stable but unlikely to receive updates any time soon.

> There is also a chance that this doesn't fix the problem, as we have seen ROS 2 messages being significantly delayed before as well.

It does not resolve the root cause of the issue, but for all we know the root cause could be infrastructure, in which case there is nothing we can do to properly diagnose it and provide a solution. However, allowing the option to use ROS 2 will at least circumvent the symptom of stalling every component that uses websockets (API server, fleet adapter, task dispatcher) and requiring a manual restart of these components in a production environment.

> That said, I'm not opposed to using ROS 2, but I think we should only receive updates from a single source; there should be an option to choose which source to use.

I agree with this too. However, we might have situations where some fleet adapters run on the same system as the API server (also encountering websocket connection issues) and could benefit from the rmw shared-memory transport by using ROS 2, while other fleet adapters running in the cloud would prefer to use websockets instead.

Each fleet adapter will only have one mode of transport to choose from, either websockets or ROS 2, but not both, and we can plaster as many warnings or deprecation notices as we want on the ROS 2 endpoints.

@koonpeng (Collaborator)

> Each fleet adapter will only have one mode of transport to choose from, either websockets or ROS 2, but not both, and we can plaster as many warnings or deprecation notices as we want on the ROS 2 endpoints.

I see multiple ways this can play out.

  1. If the fleet adapters offer both transports simultaneously, it will cause issues if we also listen on both transports; if we only listen on one transport, there is no issue.
  2. If each fleet adapter only offers one transport at a time, then listening on both ws and ROS 2 gives the advantage of supporting a mix of ws and ros2 adapters.

I think there is value in supporting both transports simultaneously, so I prefer going with option 1.
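
For what it's worth, listening on both transports at once would require suppressing duplicates from adapters that publish on both. A minimal sketch of one way that could work (illustrative only; no such filter exists in rmf-web):

```python
import hashlib
import json
from collections import OrderedDict


class DuplicateFilter:
    """Drops update payloads already seen, regardless of transport."""

    def __init__(self, max_entries: int = 10_000):
        self._seen: OrderedDict = OrderedDict()
        self._max_entries = max_entries

    def is_new(self, payload: dict) -> bool:
        # Hash the canonicalized JSON so identical updates arriving over
        # websockets and ROS 2 map to the same key.
        key = hashlib.sha256(
            json.dumps(payload, sort_keys=True).encode()
        ).hexdigest()
        if key in self._seen:
            return False
        self._seen[key] = None
        if len(self._seen) > self._max_entries:
            self._seen.popitem(last=False)  # evict the oldest entry
        return True
```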

Comment on lines +501 to +610

```python
async def save_alert(alert_request: AlertRequest):
    try:
        created_alert = await self._alert_repo.create_new_alert(
            alert_request
        )
    except AlreadyExistsError as e:
        self._logger.error(e)
        return
    self._alert_events.alert_requests.on_next(created_alert)

if task_state.status == TaskStatus.completed:
    alert_request = AlertRequest(
        id=str(uuid4()),
        unix_millis_alert_time=round(datetime.now().timestamp() * 1000),
        title="Task completed",
        subtitle=f"ID: {task_state.booking.id}",
        message="",
        display=True,
        tier=AlertRequest.Tier.Info,
        responses_available=["Acknowledge"],
        alert_parameters=[],
        task_id=task_state.booking.id,
    )
    self._loop.create_task(save_alert(alert_request))
elif task_state.status == TaskStatus.failed:
    errorMessage = ""
    if (
        task_state.dispatch is not None
        and task_state.dispatch.status == DispatchStatus.failed_to_assign
    ):
        errorMessage += "Failed to assign\n"
        if task_state.dispatch.errors is not None:
            for error in task_state.dispatch.errors:
                errorMessage += error.json() + "\n"

    alert_request = AlertRequest(
        id=str(uuid4()),
        unix_millis_alert_time=round(datetime.now().timestamp() * 1000),
        title="Task failed",
        subtitle=f"ID: {task_state.booking.id}",
        message=errorMessage,
        display=True,
        tier=AlertRequest.Tier.Error,
        responses_available=["Acknowledge"],
        alert_parameters=[],
        task_id=task_state.booking.id,
    )
    self._loop.create_task(save_alert(alert_request))

task_state_update_sub = self._ros_node.create_subscription(
    StringMsg, "task_state_update", handle_task_state_update, 10
)
self._subscriptions.append(task_state_update_sub)

def handle_task_log_update(msg):
    msg = cast(StringMsg, msg)
    json_msg = json.loads(msg.data)

    async def save(task_event_log: TaskEventLog):
        await self._task_repo.save_task_log(task_event_log)
        self._task_events.task_event_logs.on_next(task_event_log)

    self._loop.create_task(save(TaskEventLog.model_validate(json_msg["data"])))

task_log_update_sub = self._ros_node.create_subscription(
    StringMsg, "task_log_update", handle_task_log_update, 10
)
self._subscriptions.append(task_log_update_sub)

def handle_fleet_state_update(msg):
    msg = cast(StringMsg, msg)
    json_msg = json.loads(msg.data)

    async def save(fleet_state: FleetState):
        await self._fleet_repo.save_fleet_state(fleet_state)
        self._fleet_events.fleet_states.on_next(fleet_state)

    self._loop.create_task(save(FleetState.model_validate(json_msg["data"])))

fleet_state_update_sub = self._ros_node.create_subscription(
    StringMsg, "fleet_state_update", handle_fleet_state_update, 10
)
self._subscriptions.append(fleet_state_update_sub)

def handle_fleet_log_update(msg):
    msg = cast(StringMsg, msg)
    json_msg = json.loads(msg.data)

    async def save(fleet_log: FleetLog):
        await self._fleet_repo.save_fleet_log(fleet_log)
        self._fleet_events.fleet_logs.on_next(fleet_log)

    self._loop.create_task(save(FleetLog.model_validate(json_msg["data"])))

fleet_log_update_sub = self._ros_node.create_subscription(
    StringMsg, "fleet_log_update", handle_fleet_log_update, 10
)
self._subscriptions.append(fleet_log_update_sub)
```

@koonpeng (Collaborator) left a review comment

Please refactor so both the internal route and this share the same code.
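
One shape that refactor could take (a sketch only: `process_task_state` and `on_internal_message` are hypothetical names, and `save_task_state` is assumed to exist on the task repository):

```python
# Hypothetical shared entry point: one code path for task state updates,
# whatever the transport.
async def process_task_state(self, task_state: TaskState):
    await self._task_repo.save_task_state(task_state)  # assumed repo method
    self._task_events.task_states.on_next(task_state)

# The websockets _internal route can await it directly...
async def on_internal_message(self, raw: str):
    data = json.loads(raw)["data"]
    await self.process_task_state(TaskState.model_validate(data))

# ...while the synchronous ROS 2 callback schedules it onto the server's
# asyncio event loop, mirroring the pattern used elsewhere in this diff.
def handle_task_state_update(self, msg: StringMsg):
    data = json.loads(msg.data)["data"]
    self._loop.create_task(
        self.process_task_state(TaskState.model_validate(data))
    )
```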

@aaronchongth (Member, Author)

With the discussion moving towards publishing over ROS 2 regardless of the availability of websockets (open-rmf/rmf_ros2#383), we will inevitably run into the API server receiving duplicated information whenever a locally running fleet adapter is also given websockets access.

Apologies for the back-and-forth; we can retain the current behavior of using only websockets in rmf-web for now, until there is a more tangible requirement for supporting both websockets and ROS 2.

For now, with the fleet adapters and task dispatcher publishing over ROS 2 by default, and since the original motivation was to provide a workaround for an obscure issue, users and collaborators can reference this PR to implement the changes they require for their deployments.

Closing this PR.
