Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Port Profile Init HLD #1084

Merged
merged 7 commits into from
Dec 13, 2022

Conversation

noaOrMlnx
Copy link
Contributor

@noaOrMlnx noaOrMlnx commented Sep 19, 2022

Add Port Profile Init feature HLD

The purpose of this HLD is to use BULK approach for port creation and reduce uptime after reboot.

PR title state context
[ppi]: Implement port bulk comparison logic GitHub issue/pull request detail GitHub pull request check context
[ppi]: Enable bulk API GitHub issue/pull request detail GitHub pull request check context


2 checks will be performed in initialization flow.
1. Is bulk option supported? It will be checked using the following flow:

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have the explicit switch level capability check for port bulk support instead implicitly assuming with create_ports API? May be we can refer SAI_SWITCH_ATTR_SUPPORTED_FAILOVER_MODE

https://github.com/opencomputeproject/SAI/blob/master/inc/saiswitch.h#L2557

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check the design after changes.
We allow vendor to implement create_ports() API while not implementing other APIs like remove_ports and set_ports_attribute.
for this purpose, we will have to check for each API if it is implemented.

Furthermore, if we will use the suggested approach, it means we will need to wait for PortConfigDone before configuring any port, even if bulk is not supported. and if it is not supported, syncd will translate the bulk command to the regular one. this can delay the process we have today and will not gain the purpose we wanted.

```
sai_bulk_object_create_fn create_ports;
sai_bulk_object_remove_fn remove_ports;
sai_bulk_object_set_attribute_fn set_ports_attribute;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see if we can use bulk set_ports_attribute instead of individual set attribute API.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


* SONiC will support both bulk approach and legacy approach.
* Before creating the ports, SONiC will check if bulk operation is supported by SAI. If not, it will fallback to legacy approach.
* Ports creation bulk approach in SONiC expectes SAI to not create the port in create_switch() phase.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we share the performance data for ports create as part of create_switch (SAI profile file) vs bulk ports create from portsorch?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will do once we will have it.
But please note that performance is depend on vendor's implementation so can change between vendors.



<p align=left>
<img src="img/new_flow.svg" alt="Figure 2. New Flow">
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the new flow, instead of keeping the port information in the queue until we receive PORT_CONFIG_DONE, can we keep the port information in the local cache and process them upon getting PORT_CONFIG_DONE in portsorch? this helps to avoid processing the pending messages every second from queue.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


* SONiC will support both bulk approach and legacy approach.
* Before creating the ports, SONiC will check if bulk operation is supported by SAI. If not, it will fallback to legacy approach.
* Ports creation bulk approach in SONiC expectes SAI to not create the port in create_switch() phase.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This requirement must be removed from the HLD and we should support both legacy static mode (via sai.profile)and bulk api.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

* Ports creation bulk approach in SONiC expectes SAI to not create the port in create_switch() phase.
* Before creating the ports, even if bulk is suported, SONiC will check if SAI created ports as before. If so, it will fallback to legacy approach.
* SONiC will send the bulk request only when all ports are configured in APP DB, PORT table. Meaning, only when "PortConfigDone" flag is raised.
* In case create_ports() API is called with an empty list of ports, SONiC expects SAI to return a status different from SAI_STATUS_NOT_IMPLEMENTED or SAI_STATUS_NOT_SUPPORTED. This call will be made in order to check if bulk approach is supported and implemented in vendor's SAI.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does the bulk API handles or propagate errors back to caller/ application in case of errors while do bulk ports?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as before. added to HLD


A new test will be added to DVS tests.

The test will push ports configuration to Config DB and expect ASIC_STATE:SAI_OBJECT_TYPE_PORT* keys count to be as the number of the ports in Config DB.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add error cases for bulk api create or delete failures?


Notes:

* Multi-Asic is covered for these changes. In Multi-Asic environment, there are multiple instances of swss and Redis DB and each DB has it's own PORT table. Hence, port configuration is per Asic.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you list what are the use cases need bulk API support? what is wrong with existing SAI create ports? Is that DPB needs bulk API support ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bulk API is needed in order to save time calling the regular APIs one by one.
It doesn't mean there is something wrong with the current approach, just that there is a faster way which will help the initialization to be faster and the plan is to save time for fast-boot so the uptime will not cross the 30 seconds limitation.

@noaOrMlnx
Copy link
Contributor Author

@venkatmahalingam
@madhupalu
all comments were addressed.
can you please review again?

@liat-grozovik
Copy link
Collaborator

@venkatmahalingam and @madhupalu kindly reminder. if you have no additional feedback i will merge this HLD by November 12. The code PRs should be available in the next 2 weeks

@zhangyanzhao
Copy link
Collaborator

@noaOrMlnx can you please add the code PRs into this HLD by referring to #806

@liat-grozovik
Copy link
Collaborator

@noaOrMlnx can you please add the code PRs into this HLD by referring to #806

The Code PRs will be shared by @nazariig in the next 1-2 weeks after tests are fully done.

@liat-grozovik
Copy link
Collaborator

@zhangyanzhao the doc is updated based on comments for more than a month with no feedback.
I suggest to merge it.

liat-grozovik pushed a commit to sonic-net/sonic-swss that referenced this pull request Aug 21, 2023
HLD: sonic-net/SONiC#1084

- What I did
Relaxed port attributes validation: decreased message severity

- Why I did it
To resolve LogAnalyzer failures during DPB test

- How I verified it
Run DPB test

Signed-off-by: Nazarii Hnydyn <[email protected]>
StormLiangMS pushed a commit to sonic-net/sonic-buildimage that referenced this pull request Aug 30, 2023
DEPENDS:

[202211][ppi]: Implement port bulk comparison logic (#2564)  sonic-swss#2821
HLD: sonic-net/SONiC#1084

Why I did it
Enabled port late create on SN5600 switch boots up with no ports
Work item tracking
N/A
How I did it
Updated SAI xml config file
How to verify it
Run sonic-mgmt tests fastboot
StormLiangMS pushed a commit to sonic-net/sonic-swss that referenced this pull request Sep 25, 2023
DEPENDS:

[202211][ppi]: Implement port bulk comparison logic (#2564)  #2821
HLD: sonic-net/SONiC#1084

What I did

Removed unused code
Why I did it

To complete code cleanup
How I verified it

UT tests
VS tests
Details if related

Backport from master: [ppi]: General code cleanup: remove unused methods #2867
lguohan pushed a commit to sonic-net/sonic-buildimage that referenced this pull request Nov 2, 2023
HLD: sonic-net/SONiC#1084

To improve FAST reboot dataplane downtime

Signed-off-by: Nazarii Hnydyn <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants