Improve diagnostics aggregation method #3829

isamu-takagi · 2023-05-26T01:54:25Z

Checklist

I've read the contribution guidelines.
I've searched other issues and no duplicate issues were found.
I've agreed with the maintainers that I can plan this task.

Description

MRM recently gained support for multiple behaviors, namely emergency stop and comfortable stop, and there are plans to support pull over in #3221. Therefore, the system must select an appropriate MRM based on the diagnostics.

The current diagnostics do not have a well-structured diagnostic tree, which makes it difficult to select an appropriate MRM. Autonomous driving systems are complex, so diagnostic dependencies form a directed acyclic graph (DAG) rather than a tree.

For users, it is difficult to understand which error prevents the switch to autonomous driving, and there are false positives. Also, I feel that terms such as SPF/LF/SF are used with meanings a little different from their original ones, so I would like to improve these as well.

Purpose

Select an appropriate MRM when errors are detected.
Filter errors appropriately according to vehicle status and user interests.
Support DAG diagnostic dependencies.
Support diagnostics for redundant systems.

Possible approaches

The diagram below shows a sample of an improved diagnostic structure. Each block in the diagram carries the same information as DiagnosticStatus.msg, with an extra "links" field for the DAG. The mechanism is almost the same as diagnostic_aggregator, so we need to either extend it to support DAGs or create a similar implementation.

Error classification

Currently, system_error_monitor classifies each status as one of SPF/LF/SF/NF. However, the severity of an error's impact depends on which functionality requires the failed component. For example, consider a situation where the system has LiDAR and radar, and only LiDAR is broken. Object detection can work if either sensor is fine, so for it this is not an error. But object prediction, which needs the exact position of an object, requires LiDAR, so for it this is an error.

So instead of categorizing each error, consider the impact on each functionality that the error affects. For this it is sufficient to create a DiagnosticStatus for each functionality and determine its level. In conclusion, remove the classification (SPF/LF/SF/NF) and use the error level (ERROR/WARN/OK) instead.
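The LiDAR/radar example can be sketched as follows. This is only an illustration of the idea, not the proposed implementation: the helper names `all_of` and `any_of` are hypothetical, and the rule that a redundant functionality takes the best level of its inputs is an assumption.

```python
from enum import IntEnum


class Level(IntEnum):
    # The three levels discussed in this issue.
    OK = 0
    WARN = 1
    ERROR = 2


def all_of(*levels):
    """Hypothetical helper: every input is required, so the worst level wins."""
    return max(levels)


def any_of(*levels):
    """Hypothetical helper: one working input is enough, so the best level wins."""
    return min(levels)


# Only LiDAR is broken.
lidar = Level.ERROR
radar = Level.OK

# Object detection works if either sensor is fine.
object_detection = any_of(lidar, radar)
# Object prediction requires LiDAR.
object_prediction = all_of(lidar)

print(object_detection.name, object_prediction.name)
```

The same sensor failure yields a different level per functionality, which is why a per-error category such as SPF/LF/SF is not needed here.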

Select MRM

Create a DiagnosticStatus for each MRM's functionality and set its diagnostic dependencies to whatever that MRM needs in order to work. Then the system can easily find the currently available MRMs by checking their error levels.

Also, create a status for normal functionalities such as autonomous driving and remote driving in the same way. Then simply prioritize these functionalities and select the available one with the highest priority.
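A minimal sketch of this selection, under stated assumptions: the mode names, their priority order, and the rule that a mode is available whenever its level is not ERROR are all hypothetical choices made for illustration.

```python
from enum import IntEnum
from typing import Dict, Optional


class Level(IntEnum):
    OK = 0
    WARN = 1
    ERROR = 2


# Hypothetical priority order, highest priority first.
PRIORITY = [
    "autonomous_driving",
    "remote_driving",
    "pull_over",
    "comfortable_stop",
    "emergency_stop",
]


def select_mode(levels: Dict[str, Level]) -> Optional[str]:
    """Return the highest-priority mode whose status is not ERROR."""
    for name in PRIORITY:
        # A mode without a reported status is treated as unavailable.
        if levels.get(name, Level.ERROR) != Level.ERROR:
            return name
    return None


levels = {
    "autonomous_driving": Level.ERROR,
    "remote_driving": Level.ERROR,
    "pull_over": Level.ERROR,
    "comfortable_stop": Level.WARN,
    "emergency_stop": Level.OK,
}
print(select_mode(levels))
```

With these inputs the comfortable stop is chosen, because every higher-priority functionality reports ERROR.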

Error report

Create a root DiagnosticStatus for error reporting (the structure is not a tree, so "root" here means a node from which all other nodes are reachable). It dynamically determines the error level according to the user's interests and vehicle conditions. For example:

Ignore most errors during initialization.
Ignore errors of autonomous driving while stopped.
Report autonomous driving errors as warnings while preparing for autonomous driving.
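The three rules above could be expressed roughly as below. The vehicle state names and the `concerns_autonomous` flag are hypothetical, and for simplicity this sketch ignores all errors during initialization rather than "most" of them.

```python
from enum import IntEnum


class Level(IntEnum):
    OK = 0
    WARN = 1
    ERROR = 2


def report_level(status: Level, vehicle_state: str, concerns_autonomous: bool) -> Level:
    """Map a status level to the level that is reported to the user."""
    if vehicle_state == "initializing":
        # Simplification: every error is ignored during initialization.
        return Level.OK
    if concerns_autonomous and vehicle_state == "stopped":
        # Autonomous driving errors are ignored while stopped.
        return Level.OK
    if concerns_autonomous and vehicle_state == "preparing_autonomous":
        # Autonomous driving errors are capped at WARN while preparing.
        return min(status, Level.WARN)
    return status


print(report_level(Level.ERROR, "preparing_autonomous", True).name)
```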

Message types

The current status of diagnostic units: /diagnostics_graph (DiagnosticArray.msg)
The static information of graph structure: /diagnostics_graph_struct (DiagnosticGraph.msg)

# DiagnosticGraph.msg
Time stamp
DiagnosticNode[] node

# DiagnosticNode.msg
uint32[] links
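To illustrate how the "links" field could describe a DAG, here is a small sketch. It assumes that each entry of `links` is an index into the node array of the same graph message and that a node's level is the worst level among itself and its descendants; neither assumption is confirmed by this proposal, and the `Node` class and `aggregate` function are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Node:
    name: str
    level: int = 0  # own level: 0 = OK, 1 = WARN, 2 = ERROR
    links: List[int] = field(default_factory=list)  # indices of child nodes


def aggregate(nodes: List[Node]) -> List[int]:
    """Return, for each node, the worst level among itself and its descendants."""
    done: Dict[int, int] = {}
    visiting: Set[int] = set()

    def visit(i: int) -> int:
        if i in done:
            return done[i]  # shared children are evaluated only once
        if i in visiting:
            raise ValueError("cycle detected: the graph is not a DAG")
        visiting.add(i)
        level = max([nodes[i].level] + [visit(j) for j in nodes[i].links])
        visiting.discard(i)
        done[i] = level
        return level

    return [visit(i) for i in range(len(nodes))]


# The "lidar" node is shared by two parents, which a tree cannot express.
nodes = [
    Node("lidar", level=2),
    Node("radar", level=0),
    Node("localization", links=[0]),
    Node("perception", links=[0, 1]),
    Node("root", links=[2, 3]),
]
print(aggregate(nodes))
```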

Backward compatibility

You can create messages compatible with /hazard_status by assigning error categories as follows:

	Report OK	Report WARN	Report ERROR
Status OK	NF	-	-
Status WARN	SF	LF	SPF
Status ERROR	SF	LF	SPF
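The table can be read as a lookup from the (status, report) pair to a category. A sketch of that lookup follows; the function name `hazard_category` is hypothetical, and returning `None` for the "-" cells is an assumption about how unused combinations would be handled.

```python
from typing import Optional


def hazard_category(status: str, report: str) -> Optional[str]:
    """Look up the /hazard_status category for a (status, report) level pair."""
    if status == "OK":
        # Only the (OK, OK) cell is defined in the first row.
        return "NF" if report == "OK" else None
    # The WARN and ERROR rows are identical: the report level decides.
    return {"OK": "SF", "WARN": "LF", "ERROR": "SPF"}[report]


print(hazard_category("WARN", "ERROR"))
```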

Definition of done

Create an aggregator for the DAG.
Create a new diagnostics config.


isamu-takagi · 2023-06-05T07:11:26Z

It was not possible to support graph structures without changing the plugin interface of diagnostic_aggregator. It seems that an AnalyzerGroup cannot share a child Analyzer with other groups.

stale · 2023-08-18T12:30:04Z

This pull request has been automatically marked as stale because it has not had recent activity.

isamu-takagi · 2023-09-01T08:45:16Z

I'll change the Autoware diagnostics as follows.

Replace diagnostic_aggregator and system_error_monitor with the system_diagnostic_graph node.
Replace diagnostics_agg and hazard_status with the diagnostics_graph and diagnostics_summary topics.
- diagnostics_summary is similar to emergency, emergency_holding fields in hazard_status
- diagnostics_graph is similar to nf, sf, lf, spf fields in hazard_status or diagnostics_agg
PR
- feat: system diagnostic monitor message tier4/tier4_autoware_msgs#96
- feat: system diagnostic monitor #4722

stale · 2023-10-31T22:26:53Z

This pull request has been automatically marked as stale because it has not had recent activity.

isamu-takagi · 2023-12-12T02:57:56Z

Add diagnostic graph aggregator

stale · 2024-02-10T06:07:03Z

This pull request has been automatically marked as stale because it has not had recent activity.

isamu-takagi · 2024-02-14T10:39:40Z

Package creation is complete. We will integrate with MRM in https://github.com/orgs/autowarefoundation/discussions/4176.

isamu-takagi self-assigned this May 26, 2023

isamu-takagi mentioned this issue Aug 16, 2023

feat: system diagnostic monitor message tier4/tier4_autoware_msgs#96

Merged

7 tasks

stale bot added the status:stale Inactive or outdated issues. (auto-assigned) label Aug 18, 2023

stale bot removed the status:stale Inactive or outdated issues. (auto-assigned) label Sep 1, 2023

asana17 mentioned this issue Sep 21, 2023

feat: system diagnostic monitor #4722

Merged

7 tasks

stale bot added the status:stale Inactive or outdated issues. (auto-assigned) label Oct 31, 2023

isamu-takagi mentioned this issue Dec 12, 2023

fix: add system msg autowarefoundation/autoware_msgs#79

Merged

4 tasks

stale bot removed the status:stale Inactive or outdated issues. (auto-assigned) label Dec 12, 2023

stale bot added the status:stale Inactive or outdated issues. (auto-assigned) label Feb 10, 2024

isamu-takagi closed this as completed Feb 14, 2024

isamu-takagi mentioned this issue Feb 20, 2024

feat(tier4_system_msgs): improve diagnostic graph efficiency tier4/tier4_autoware_msgs#113

Merged

7 tasks

isamu-takagi mentioned this issue Mar 30, 2024

feat: remake diagnostic graph packages #6715

Merged

7 tasks
