Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support of mdio IPC server class using sai switch api and unix so… #1080

Merged
merged 12 commits into from
Jul 28, 2022

Conversation

jiahua-wang
Copy link
Contributor

@jiahua-wang jiahua-wang commented Jul 16, 2022

…cket

Signed-off-by: Jiahua Wang [email protected]

Why I did it
When MDIO devices (external PHYs) are connected on MDIO bus from NPU, the MDIO access is through SAI switch mdio read/write APIs. The syncd calling the SAI APIs needs to act as an IPC server so that the gbsyncd programming the MDIO devices can use the APIs by the IPC mechanism.

How I did it
MdioIpcServer class is added to start a new thread, to create an unix socket, to listen on the socket, to accept connection and to read/reply IPC messages. The corresponding functions for MDIO clause 45 and clause 22 access are also added to VendorSai class.

How to verify it
We can use socat to simulate the IPC client, e.g.
docker exec -it syncd socat - UNIX-CONNECT:/var/run/sswsyncd/mdio-ipc.srv
to read MDIO clause 45 register at an address and an offset
mdio <address> <reg offset>
to write MDIO clause 45 register at an address and an offset with a value
mdio <address> <reg offset> <value>
to read MDIO clause 22 register at an address and an offset
mdio-cl22 <address> <reg offset>
to write MDIO clause 22 register at an address and an offset with a value
mdio-cl22 <address> <reg offset> <value>

@lguohan
Copy link
Contributor

lguohan commented Jul 18, 2022

@jiahua-wang , it is better to use the old pr for iteration.

Signed-off-by: Jiahua Wang <[email protected]>
Signed-off-by: Jiahua Wang <[email protected]>
Signed-off-by: Jiahua Wang <[email protected]>
@jiahua-wang
Copy link
Contributor Author

There are 5 words failing the spelling check. If ignore upper/lower case, there is actually 4 words: mdio, phy, gbsyncd and tokenize. How do we add the 4 words to the dictionary?

Copy link
Collaborator

@kcudnik kcudnik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

address comments

syncd/MdioIpcServer.h Outdated Show resolved Hide resolved
syncd/MdioIpcServer.h Outdated Show resolved Hide resolved

static sai_switch_api_t *switch_mdio_api;

static void *syncd_ipc_task_enter(void*);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could this be std::function? if not please add tydef for function pointer and use that typedef in declarations

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

syncd_ipc_task_enter is the function used by pthread_create(). It happens to be this prototype. It is declared as a function with this prototype.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please make typdef for this

Comment on lines 46 to 49
#ifdef MDIO_ACCESS_USE_NPU
MdioIpcServer::setSwitchId(m_switch_rid);
#endif

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you made this a single static class, what if there would be multiple switches? is this code able to handle that ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For multiple switches in gbsyncd, the code is fine as it will return without doing anything. For syncd, we don't have MDIO buses mapped to multiple NPU. If we are going to support MDIO buses mapped to more than single NPU, the entire MDIO IPC server code has to be changed. We are not addressing the issue this time.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i would suggest to move this out of the Switch.cpp and uset this in Syncd.cpp, since eveery new switch this setSwitchId will be set and you dont want that since you only want one switch, also you still call static api, and i propopsed to make this proper c++ object

syncd/VendorSai.cpp Outdated Show resolved Hide resolved
syncd/VendorSai.cpp Outdated Show resolved Hide resolved
syncd/VendorSai.cpp Outdated Show resolved Hide resolved
meta/SaiInterface.cpp Outdated Show resolved Hide resolved
meta/SaiInterface.h Outdated Show resolved Hide resolved
syncd/MdioIpcServer.h Outdated Show resolved Hide resolved

static sai_switch_api_t *switch_mdio_api;

static void *syncd_ipc_task_enter(void*);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please make typdef for this

syncd/MdioIpcServer.h Outdated Show resolved Hide resolved
syncd/MdioIpcServer.h Outdated Show resolved Hide resolved
syncd/MdioIpcServer.h Outdated Show resolved Hide resolved
Comment on lines 46 to 49
#ifdef MDIO_ACCESS_USE_NPU
MdioIpcServer::setSwitchId(m_switch_rid);
#endif

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i would suggest to move this out of the Switch.cpp and uset this in Syncd.cpp, since eveery new switch this setSwitchId will be set and you dont want that since you only want one switch, also you still call static api, and i propopsed to make this proper c++ object

syncd/Syncd.cpp Outdated Show resolved Hide resolved
syncd/MdioIpcServer.cpp Outdated Show resolved Hide resolved
kcudnik
kcudnik previously approved these changes Jul 23, 2022
@kcudnik
Copy link
Collaborator

kcudnik commented Jul 23, 2022

please resolve build failures

@jiahua-wang
Copy link
Contributor Author

jiahua-wang commented Jul 24, 2022

please resolve build failures

The build failure is due to spelling check failure. How do we add new words to the dictionary?

kcudnik
kcudnik previously approved these changes Jul 24, 2022
@kcudnik
Copy link
Collaborator

kcudnik commented Jul 24, 2022

please address build issues

Signed-off-by: Jiahua Wang <[email protected]>
@lguohan lguohan merged commit 854d54e into sonic-net:master Jul 28, 2022
dprital added a commit to dprital/sonic-buildimage that referenced this pull request Aug 9, 2022
Update sonic-sairedis submodule pointer to include the following:
* [sonic-sairedis] Support bulk counter ([sonic-net#1094](sonic-net/sonic-sairedis#1094))
* Transfer organization from Azure to sonic-net ([sonic-net#1095](sonic-net/sonic-sairedis#1095))
* [BFN] Provide unified approach to select P4 profile based on chip family ([sonic-net#1089](sonic-net/sonic-sairedis#1089))
* Add support of mdio IPC server class using sai switch api and unix socket ([sonic-net#1080](sonic-net/sonic-sairedis#1080))
* [FlexCounter] Refactor FlexCounter class  ([sonic-net#1073](sonic-net/sonic-sairedis#1073))

Signed-off-by: dprital <[email protected]>
yxieca pushed a commit that referenced this pull request Aug 26, 2022
…cket (#1080)

Why I did it
When MDIO devices (external PHYs) are connected on MDIO bus from NPU, the MDIO access is through SAI switch mdio read/write APIs. The syncd calling the SAI APIs needs to act as an IPC server so that the gbsyncd programming the MDIO devices can use the APIs by the IPC mechanism.

How I did it
MdioIpcServer class is added to start a new thread, to create an unix socket, to listen on the socket, to accept connection and to read/reply IPC messages. The corresponding functions for MDIO clause 45 and clause 22 access are also added to VendorSai class.

How to verify it
We can use socat to simulate the IPC client, e.g.
docker exec -it syncd socat - UNIX-CONNECT:/var/run/sswsyncd/mdio-ipc.srv
to read MDIO clause 45 register at an address and an offset
mdio <address> <reg offset>
to write MDIO clause 45 register at an address and an offset with a value
mdio <address> <reg offset> <value>
to read MDIO clause 22 register at an address and an offset
mdio-cl22 <address> <reg offset>
to write MDIO clause 22 register at an address and an offset with a value
mdio-cl22 <address> <reg offset> <value>

Signed-off-by: Jiahua Wang <[email protected]>
jimmyzhai added a commit to sonic-net/sonic-buildimage that referenced this pull request Aug 28, 2022
2022-07-28 854d54e: Add support of mdio IPC server class using sai switch api and unix socket (sonic-net/sonic-sairedis#1080) (Jiahua Wang)
2022-07-27 513cb2a: [FlexCounter] Refactor FlexCounter class (sonic-net/sonic-sairedis#1073) (Junchao-Mellanox)
zbud-msft pushed a commit to zbud-msft/sonic-buildimage that referenced this pull request Aug 31, 2022
2022-07-28 854d54e: Add support of mdio IPC server class using sai switch api and unix socket (sonic-net/sonic-sairedis#1080) (Jiahua Wang)
2022-07-27 513cb2a: [FlexCounter] Refactor FlexCounter class (sonic-net/sonic-sairedis#1073) (Junchao-Mellanox)
zbud-msft added a commit to renukamanavalan/sonic-buildimage that referenced this pull request Sep 12, 2022
* Add YANG models for structured events

* [Arista] Update platform submodule (sonic-net#11853)

* Advance submodule sonic-sairedis (sonic-net#11704)

2022-07-28 854d54e: Add support of mdio IPC server class using sai switch api and unix socket (sonic-net/sonic-sairedis#1080) (Jiahua Wang)
2022-07-27 513cb2a: [FlexCounter] Refactor FlexCounter class (sonic-net/sonic-sairedis#1073) (Junchao-Mellanox)

* Update swss common submodule for events api (sonic-net#11858)

#### Why I did it

Structured events code like eventd, rsyslogplugin, requires changes made in swss-common

Submodule adds these newest commits:

56b0f18 (HEAD, origin/master, origin/HEAD, master) Events: APIs to set/get global options (sonic-net#672)
5467c89 Add changes to yml file to improve pytest (sonic-net#674)

#### How I did it

Updated git submodule

#### How to verify it

Check new commit pointer

* [Arista] Fix content of platform.json for DCS-720DT-48S (sonic-net#11855)

Why I did it
Content of platform.json was outdated and some platform_tests/api of sonic-mgmt were failing.

How I did it
Added the necessary values to platform.json

How to verify it
Running platform_tests/api of sonic-mgmt should yield 100% passrate.

* [actions] Update github actions label and automerge. (sonic-net#11736)

1. Add auto approve step when adding label to version upgrading PR.
2. Use mssonicbld TOKEN to merge version upgrading PR instead of 'github actions'

* [ci] Update reproducible build related pipeline. (sonic-net#11810)

* Address Review Comment to define SONIC_GLOBAL_DB_CLI in gbsyncd.sh (sonic-net#11857)

As part of PR sonic-net#11754

    Change was added to use variable SONIC_DB_NS_CLI for
    namespace but that will not work since ./files/scripts/syncd_common.sh
    uses SONIC_DB_CLI. So revert back to use SONIC_DB_CLI and define new
    variable for SONIC_GLOBAL_DB_CLI for global/host db cli access

   Also fixed DB_CLI not working for namespace.

* [Build] Increase the size of the installer image (sonic-net#11869)

#### Why I did it
Fix the build failure caused by the installer image size too small. The installer image is only used during the build, not impact the final images.
See https://dev.azure.com/mssonic/build/_build/results?buildId=139926&view=logs&j=cef3d8a9-152e-5193-620b-567dc18af272&t=359769c4-8b5e-5976-a793-85da132e0a6f

```
+ fallocate -l 2048M ./sonic-installer.img
+ mkfs.vfat ./sonic-installer.img
mkfs.fat 4.2 (2021-01-31)
++ mktemp -d
+ tmpdir=/tmp/tmp.TqdDSc00Cn
+ mount -o loop ./sonic-installer.img /tmp/tmp.TqdDSc00Cn
+ cp target/sonic-vs.bin /tmp/tmp.TqdDSc00Cn/onie-installer.bin
cp: error writing '/tmp/tmp.TqdDSc00Cn/onie-installer.bin': No space left on device
[  FAIL LOG END  ] [ target/sonic-vs.img.gz ]
```

#### How I did it
Increase the size from 2048M to 4096M.
Why not increase to 16G like qcow2 image?
The qcow2 supports the sparse disk, although a big disk size allocated, but it will not consume the real disk size. The falocate does not support the sparse disk. We do not want to allocate a very big disk, but no use at all. It will require more space to build.

* Update sensor names for msn4600c for the 5.10 kernel (sonic-net#11491)

* Update sensor names for msn4600c for the 5.10 kernel

Looks like a sensor was removed in the 5.10 kernel for the tps53679
sensor, so the names/indexing has changed.

Related to sonic-net/sonic-mgmt#4513.

Signed-off-by: Saikrishna Arcot <[email protected]>

* Update sensors file

Signed-off-by: Saikrishna Arcot <[email protected]>

Signed-off-by: Saikrishna Arcot <[email protected]>

* Fix error handling when failing to install a deb package (sonic-net#11846)

The current error handling code for when a deb package fails to be
installed currently has a chain of commands linked together by && and
ends with `exit 1`. The assumption is that the commands would succeed,
and the last `exit 1` would end it with a non-zero return code, thus
fully failing the target and causing the build to stop because of bash's
-e flag.

However, if one of the commands prior to `exit 1` returns a non-zero
return code, then bash won't actually treat it as a terminating error.
From bash's man page:

```
-e      Exit immediately if a pipeline (which may consist of a single simple
	command), a list, or a compound command (see SHELL GRAMMAR above),
        exits with a non-zero status.  The shell does not exit if the
        command that fails is part of the  command  list  immediately
        following a while or until keyword, part of the test following the
        if or elif reserved words, part of any command executed in a && or
        || list except the command following the final && or ||, any
        command in a pipeline but the last, or if the command's return
        value is being inverted with !.  If a compound command other than a
        subshell returns a non-zero status because a command failed while
        -e was being ignored, the shell does not exit.
```

The part `part of any command executed in a && or || list except the
command following the final && or ||` says that if the failing command
is not the `exit 1` that we have at the end, then bash doesn't treat it
as an error and exit immediately. Additionally, since this is a compound
command, but isn't in a subshell (subshell are marked by `(` and `)`,
whereas `{` and `}` just tells bash to run the commands in the current
environment), bash doesn't exist. The result of this is that in the
deb-install target, if a package installation fails, it may be
infinitely stuck in that while-loop.

There are two fixes for this: change to using a subshell, or use `;`
instead of `&&`. Using a subshell would, I think, require exporting any
shell variables used in the subshell, so I chose to change the `&&` to
`;`. In addition, at the start of the subshell, `set +e` is added in,
which removes the exit-on-error handling of bash. This makes sure that
all commands are run (the output of which may help for debugging) and
that it still exits with 1, which will then fully fail the target.

Signed-off-by: Saikrishna Arcot <[email protected]>

Signed-off-by: Saikrishna Arcot <[email protected]>

* Fix vs check install login timeout issue (sonic-net#11727)

Why I did it
Fix a build not stable issue: sonic-net#11620
The vs vm has started successfully, but failed to wait for the message "sonic login:".

There were 55 builds failed caused by the issue in the last 30 days.

AzurePipelineBuildLogs
| where startTime > ago(30d)
| where type =~ "task"
| where result =~ "failed"
| where name =~ "Build sonic image"
| where content contains "Timeout exceeded"
| where content contains "re.compile('sonic login:')"
| project-away content
| extend branchName=case(reason=~"pullRequest", tostring(todynamic(parameters)['system.pullRequest.targetBranch']),
              replace("refs/heads/", "", sourceBranch))
| summarize FailedCount=dcount(buildId) by branchName

branchName	FailedCount
master	37
202012	9
202106	4
202111	2
202205	1
201911	1
It is caused by the login message mixed with the output message of the /etc/rc.local, one of the examples as below: (see the message rc.local[307]: sonic+ onie_disco_subnet=255.255.255.0 login: )
The check_install.py was waiting for the message "sonic login:", and Linux console was waiting for the username input (the login message has already printed in the console).
https://dev.azure.com/mssonic/build/_build/results?buildId=123294&view=logs&j=cef3d8a9-152e-5193-620b-567dc18af272&t=359769c4-8b5e-5976-a793-85da132e0a6f

2022-07-17T15:00:58.9198877Z [   25.493855] rc.local[307]: + onie_disco_opt53=05
2022-07-17T15:00:58.9199330Z [   25.595054] rc.local[307]: + onie_disco_router=10.0.2.2
2022-07-17T15:00:58.9199781Z [   25.699409] rc.local[307]: + onie_disco_serverid=10.0.2.2
2022-07-17T15:00:58.9200252Z [   25.789891] rc.local[307]: + onie_disco_siaddr=10.0.2.2
2022-07-17T15:00:58.9200622Z [   25.880920] 
2022-07-17T15:00:58.9200745Z 
2022-07-17T15:00:58.9201019Z Debian GNU/Linux 10 sonic ttyS0
2022-07-17T15:00:58.9201201Z 
2022-07-17T15:00:58.9201542Z rc.local[307]: sonic+ onie_disco_subnet=255.255.255.0 login: 
2022-07-17T15:00:58.9202309Z [   26.079767] rc.local[307]: + onie_exec_url=file://dev/vdb/onie-installer.bin

How I did it
Input a newline when finished to run the script /etc/rc.local.
If entering a newline, the message "sonic login:" will prompt again.

* [ci] Fix bug involved by PR 11810 which affect official build pipeline (sonic-net#11891)

Why I did it
Fix the official build not triggered correctly issue, caused by the azp template path not existing.

How I did it
Change the azp template path.

* DellEMC: Z9332f - Graceful platform reboot (sonic-net#10240)

Why I did it
To gracefully unmount filesystems and stop containers while performing a cold reboot.
Unmount ONIE-BOOT if mounted during fast/soft/warm reboot
How I did it
Override systemd-reboot service to perform a cold reboot.
Unmount ONIE-BOOT if mounted using fast/soft/warm-reboot plugins.
How to verify it
On reboot, verify that the container stop and filesystem unmount services have completed execution before the platform reboot.

* [Nokia][Nokia-IXR7250E-36x100G & Nokia-IXR7250E-36x400G] Update BCM (sonic-net#11577)

config to support ERSPAN egress mirror and also set flag to preserve ECN

* Align API get_device_runtime_metadata() for python version < 3.9 (sonic-net#11900)

Why I did it:
API get_device_runtime_metadata() added by sonic-net#11795 uses merge operator for dict but that is supported only for python version >=3.9. This API will be be used by scrips eg:hostcfgd which is still build for buster which does not have python 3.9 support.

* [Arista7050cx3] TD3 SKU changes for pg headroom value after interop testing with cisco 8102 (sonic-net#11901)

Why I did it
After PFC interop testing between 8102 and 7050cx3, data packet losses were observed on the Rx ports of the 7050cx3 (inflow from 8102) during testing. This was primarily due to the slower response times to react to PFC pause packets for the 8102, when receiving such frames from neighboring devices. To solve for the packet drops, the 7050cx3 pg headroom size has to be increased to 160kB.

How I did it
Modified the xoff threshold value to 160kB in the pg_profile file to allow for the buffer manager to read that value when building the image, and configuring the device

How to verify it
run "mmuconfig -l" once image is built


Signed-off-by: dojha <[email protected]>

* Add peer review comments on bgp

* Add peer review changes + spacing

* Add changes to events-swss

* Add peer review changes in pmon swss

* Add review changes dhcp-relay

* Add peer review changes to host

* Add changes to severity, leafref

* Remove unused grouping

* Remove redis generic

Signed-off-by: Saikrishna Arcot <[email protected]>
Signed-off-by: dojha <[email protected]>
Co-authored-by: Samuel Angebault <[email protected]>
Co-authored-by: Junhua Zhai <[email protected]>
Co-authored-by: Liu Shilong <[email protected]>
Co-authored-by: abdosi <[email protected]>
Co-authored-by: xumia <[email protected]>
Co-authored-by: Saikrishna Arcot <[email protected]>
Co-authored-by: Arun Saravanan Balachandran <[email protected]>
Co-authored-by: saksarav-nokia <[email protected]>
Co-authored-by: Dev Ojha <[email protected]>
pettershao-ragilenetworks pushed a commit to pettershao-ragilenetworks/sonic-sairedis that referenced this pull request Nov 18, 2022
…cket (sonic-net#1080)

Why I did it
When MDIO devices (external PHYs) are connected on MDIO bus from NPU, the MDIO access is through SAI switch mdio read/write APIs. The syncd calling the SAI APIs needs to act as an IPC server so that the gbsyncd programming the MDIO devices can use the APIs by the IPC mechanism.

How I did it
MdioIpcServer class is added to start a new thread, to create an unix socket, to listen on the socket, to accept connection and to read/reply IPC messages. The corresponding functions for MDIO clause 45 and clause 22 access are also added to VendorSai class.

How to verify it
We can use socat to simulate the IPC client, e.g.
docker exec -it syncd socat - UNIX-CONNECT:/var/run/sswsyncd/mdio-ipc.srv
to read MDIO clause 45 register at an address and an offset
mdio <address> <reg offset>
to write MDIO clause 45 register at an address and an offset with a value
mdio <address> <reg offset> <value>
to read MDIO clause 22 register at an address and an offset
mdio-cl22 <address> <reg offset>
to write MDIO clause 22 register at an address and an offset with a value
mdio-cl22 <address> <reg offset> <value>

Signed-off-by: Jiahua Wang <[email protected]>
skbarista pushed a commit to skbarista/sonic-sairedis that referenced this pull request Dec 2, 2022
…cket (sonic-net#1080)

Why I did it
When MDIO devices (external PHYs) are connected on MDIO bus from NPU, the MDIO access is through SAI switch mdio read/write APIs. The syncd calling the SAI APIs needs to act as an IPC server so that the gbsyncd programming the MDIO devices can use the APIs by the IPC mechanism.

How I did it
MdioIpcServer class is added to start a new thread, to create an unix socket, to listen on the socket, to accept connection and to read/reply IPC messages. The corresponding functions for MDIO clause 45 and clause 22 access are also added to VendorSai class.

How to verify it
We can use socat to simulate the IPC client, e.g.
docker exec -it syncd socat - UNIX-CONNECT:/var/run/sswsyncd/mdio-ipc.srv
to read MDIO clause 45 register at an address and an offset
mdio <address> <reg offset>
to write MDIO clause 45 register at an address and an offset with a value
mdio <address> <reg offset> <value>
to read MDIO clause 22 register at an address and an offset
mdio-cl22 <address> <reg offset>
to write MDIO clause 22 register at an address and an offset with a value
mdio-cl22 <address> <reg offset> <value>

Signed-off-by: Jiahua Wang <[email protected]>
@jiahua-wang jiahua-wang deleted the mdio-class branch May 12, 2023 22:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants