Error handling framework initial draft #391

Merged: 3 commits, Dec 17, 2019

sivamukka
Contributor

This document describes the high-level design of the error handling framework in SONiC.

Signed-off-by: Siva Mukka [email protected]

@msftclas

msftclas commented May 23, 2019

CLA assistant check
All CLA requirements met.

The requirements for error handling framework are:

1.1.1 Provide registration/de-registration mechanism for applications to enable/disable error notifications on a specific table. More than one application can register for notifications on a given table.

Contributor

Can the framework support applications registering for notifications at the attribute level?

Notifications can only be enabled per table, while the failure status code is reported per object. For example, the port table carries multiple attributes such as MTU and admin state. Notifications can be enabled at the port table level, but not for MTU failures specifically.
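
To illustrate the per-table granularity described here, the following is a minimal in-memory sketch, not the framework's actual API. The `ErrorRegistry` class, its method names, and the callback signature are all hypothetical; only the behavior (several applications may register on one table, and failures are reported per object key) comes from the discussion.

```python
from collections import defaultdict


class ErrorRegistry:
    """Hypothetical per-table registry; names are illustrative only."""

    def __init__(self):
        # table name -> list of application callbacks
        self._listeners = defaultdict(list)

    def register(self, table, callback):
        # More than one application may register on the same table.
        self._listeners[table].append(callback)

    def deregister(self, table, callback):
        if callback in self._listeners[table]:
            self._listeners[table].remove(callback)

    def notify(self, table, key, status):
        # The failure is reported per object (key); delivery is per table.
        for callback in list(self._listeners.get(table, [])):
            callback(table, key, status)


received = []


def on_port_error(table, key, status):
    received.append((table, key, status))


registry = ErrorRegistry()
registry.register("PORT_TABLE", on_port_error)
registry.notify("PORT_TABLE", "Ethernet0", "SAI_STATUS_FAILURE")
# No application registered on ROUTE_TABLE, so nothing is delivered.
registry.notify("ROUTE_TABLE", "10.0.0.0/24", "SAI_STATUS_FAILURE")
```

In this sketch an application that only cares about MTU failures would still receive every port table failure and would have to inspect the notification itself.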

- Extensible to all types of errors in the system, not restricted to APP_DB definitions.
- Efficient, as notifications are limited to failures in the DB.
- Notification for delete failures can be supported even when corresponding objects are deleted from APP_DB.

Contributor

I understand the new DB approach, but here is a thought: why can't we have a single DB (APP_DB) separated by namespaces? For example, configured vs. applied/error entries could live in the same table, which would make it easier to maintain one table.

Can you please provide more details here? Are you suggesting that ERROR tables can be stored in APP_DB, and there is no need to create ERROR_DB? We want to avoid modifying the existing ROUTE_TABLE schema in APP_DB - error handling can optionally be disabled and retain the current behavior.

- Translates it from SAI data types to ERROR_DB data types
- Adds an entry to the error database. If the entry already exists, the corresponding failure code is updated.
- Publishes the notifications to respective error listeners.
3. Error listener waits for the incoming notifications, filters them and invokes the application callback.
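
The steps quoted above can be sketched as a small in-memory model. This is an assumption-laden illustration, not the design's implementation: `ErrorHandler`, the `SAI_TO_ERROR_CODE` mapping, the error-side code names, and the predicate-based filter are all hypothetical. In particular, the filter criteria are not specified in the document at this point, so a caller-supplied predicate stands in for them.

```python
# Hypothetical mapping from SAI status names to ERROR_DB codes.
SAI_TO_ERROR_CODE = {
    "SAI_STATUS_TABLE_FULL": "TABLE_FULL",
    "SAI_STATUS_ITEM_NOT_FOUND": "NOT_FOUND",
}


class ErrorHandler:
    """Illustrative model of translate -> store -> publish -> filter."""

    def __init__(self):
        self.error_db = {}   # (table, key) -> failure code
        self.listeners = []  # (predicate, callback) pairs

    def add_listener(self, predicate, callback):
        self.listeners.append((predicate, callback))

    def on_failure(self, table, key, sai_status):
        # Translate from the SAI data type to the ERROR_DB data type.
        code = SAI_TO_ERROR_CODE.get(sai_status, "UNKNOWN")
        # Add the entry, or update the failure code if it already exists.
        self.error_db[(table, key)] = code
        # Publish; each listener filters before invoking its callback.
        for predicate, callback in self.listeners:
            if predicate(table, key, code):
                callback(table, key, code)


seen = []
handler = ErrorHandler()
handler.add_listener(
    lambda table, key, code: table == "ROUTE_TABLE",  # assumed filter
    lambda table, key, code: seen.append((key, code)),
)
handler.on_failure("ROUTE_TABLE", "10.1.1.0/24", "SAI_STATUS_TABLE_FULL")
handler.on_failure("ROUTE_TABLE", "10.1.1.0/24", "SAI_STATUS_ITEM_NOT_FOUND")
handler.on_failure("PORT_TABLE", "Ethernet0", "SAI_STATUS_TABLE_FULL")
```
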
Contributor

Can you describe how the error listener filters notifications? What criteria are supported? Please add a use case.

Sure, we will add more details on this.

| Create failure | Delete failure | Remove the entry from database and notify the registered applications |
| Create failure | Update success | Remove the entry from the database and notify the registered applications |
| Create success | Delete failure | Add the entry to the database and notify the registered applications |
| Delete failure | Create success | Remove the entry from the database and notify the registered applications |
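
One way to read the rows quoted above is as a lookup from a pair of outcomes to a database action. The sketch below assumes the first column is the previously reported outcome and the second is the latest one; the column headers are not visible in this excerpt, so that reading, along with the names `ACTIONS` and `apply_outcome`, is an assumption. Every listed row also notifies the registered applications.

```python
# (previous outcome, latest outcome) -> action on the error database.
# Only the four rows visible in the excerpt are encoded.
ACTIONS = {
    ("create_failure", "delete_failure"): "remove",
    ("create_failure", "update_success"): "remove",
    ("create_success", "delete_failure"): "add",
    ("delete_failure", "create_success"): "remove",
}


def apply_outcome(error_db, key, previous, latest):
    """Apply the action for one row; returns the action, or None if unlisted."""
    action = ACTIONS.get((previous, latest))
    if action == "add":
        error_db[key] = latest
    elif action == "remove":
        error_db.pop(key, None)
    return action


error_db = {"10.1.1.0/24": "create_failure"}
first = apply_outcome(error_db, "10.1.1.0/24", "create_failure", "update_success")
second = apply_outcome(error_db, "10.2.2.0/24", "create_success", "delete_failure")
```
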
Contributor

Can applications receive out-of-order notifications from the feedback loop, and if so, how should they handle them? For example, if a user does create/delete/create, do you expect the error feedback to arrive in order?

The order of notifications will be preserved, because changes to APP_DB and ASIC_DB maintain the sequence. If the same object fails multiple times, we need a unique transaction ID to associate each operation with its failure. To address this, we are looking at adding a unique ID to each APP_DB operation and reporting that ID back as part of the failure notification.
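
As a rough sketch of the transaction ID idea, assuming a simple monotonically increasing counter: each operation gets an ID when it is submitted, and a failure notification that echoes the ID can be matched to exactly one operation, even when the same object is created, deleted, and created again. `OperationTracker` and its methods are hypothetical names, not part of the proposal.

```python
import itertools


class OperationTracker:
    """Hypothetical correlation of operations and failure notifications."""

    def __init__(self):
        self._next_id = itertools.count(1)
        self._pending = {}  # transaction ID -> (table, key, operation)

    def submit(self, table, key, operation):
        # Tag each operation with a unique ID before it is written.
        txn_id = next(self._next_id)
        self._pending[txn_id] = (table, key, operation)
        return txn_id

    def on_failure(self, txn_id, status):
        # The notification carries the ID back, identifying the operation.
        return self._pending.pop(txn_id, None), status


tracker = OperationTracker()
first_create = tracker.submit("ROUTE_TABLE", "10.1.1.0/24", "create")
delete = tracker.submit("ROUTE_TABLE", "10.1.1.0/24", "delete")
second_create = tracker.submit("ROUTE_TABLE", "10.1.1.0/24", "create")

# A failure for the second create is unambiguous despite the shared key.
failed_op, status = tracker.on_failure(second_create, "SAI_STATUS_FAILURE")
```
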

5 participants