baby hero test scale #586

mgheorghe · 2024-06-04T23:45:42Z

Test SAI Baby Hero.
Baby Hero test scale is 1% of the Hero test scale.
This is achieved by having only one prefix per ACL instead of 100 prefixes per ACL.

I had to increase the BMv2 table sizes to 8M and 4M (from the default 1K) for the mapping and routing tables.

scale:

SAI_OBJECT_TYPE_VIP_ENTRY = 1
SAI_OBJECT_TYPE_DIRECTION_LOOKUP_ENTRY = 32
SAI_OBJECT_TYPE_VNET = 32
SAI_OBJECT_TYPE_DASH_ACL_GROUP = 320
SAI_OBJECT_TYPE_ENI = 32
SAI_OBJECT_TYPE_ENI_ETHER_ADDRESS_MAP_ENTRY = 32
SAI_OBJECT_TYPE_OUTBOUND_CA_TO_PA_ENTRY = 80,000 + 320
SAI_OBJECT_TYPE_OUTBOUND_ROUTING_ENTRY = 80,000 (SAI_OUTBOUND_ROUTING_ENTRY_ACTION_ROUTE_VNET) + 80,000 (SAI_OUTBOUND_ROUTING_ENTRY_ACTION_ROUTE_VNET_DIRECT)
SAI_OBJECT_TYPE_PA_VALIDATION_ENTRY = 32
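
As a quick sanity check, the counts above can be tallied in a few lines of Python. This is only a sketch: the dictionary is transcribed by hand from the list rather than read from the test code, it assumes the large mapping and routing counts are eighty thousand each, and `total_objects` is a hypothetical helper name.

```python
# Object counts transcribed from the scale list above.
# Assumption: the large counts are 80,000 (eighty thousand) each.
BABY_HERO_SCALE = {
    "SAI_OBJECT_TYPE_VIP_ENTRY": 1,
    "SAI_OBJECT_TYPE_DIRECTION_LOOKUP_ENTRY": 32,
    "SAI_OBJECT_TYPE_VNET": 32,
    "SAI_OBJECT_TYPE_DASH_ACL_GROUP": 320,
    "SAI_OBJECT_TYPE_ENI": 32,
    "SAI_OBJECT_TYPE_ENI_ETHER_ADDRESS_MAP_ENTRY": 32,
    "SAI_OBJECT_TYPE_OUTBOUND_CA_TO_PA_ENTRY": 80_000 + 320,
    "SAI_OBJECT_TYPE_OUTBOUND_ROUTING_ENTRY": 80_000 + 80_000,
    "SAI_OBJECT_TYPE_PA_VALIDATION_ENTRY": 32,
}


def total_objects(scale):
    """Return the total number of objects across all object types."""
    return sum(scale.values())


if __name__ == "__main__":
    for name, count in BABY_HERO_SCALE.items():
        print(f"{name:50s} {count:>8,d}")
    print(f"{'total':50s} {total_objects(BABY_HERO_SCALE):>8,d}")
```

Under these assumptions the total is a little over 240,000 objects, almost all of them mapping and routing entries.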

run time:

On my setup:
- configure BMv2: 15 min
- run traffic: 3 min
- clear config: 6 min

GitHub Actions runtime: 10 min for the whole test.

TO DO:

- add back ACLs once the BMv2 issue is solved; see "Change Optional to Ternary in dash_acl.p4" #575 and "Support for list-match type (P4) for ACLs, pending BMV2 fork" #91
- add traffic for SAI_OUTBOUND_ROUTING_ENTRY_ACTION_ROUTE_VNET_DIRECT IPs
- add traffic for all SAI_OBJECT_TYPE_DASH_ACL_GROUP entries (currently only the first one for each ENI is exercised); this needs the pps limited to match BMv2 capability while keeping the test duration reasonable
- use dpugen and remove the duplicate code; see "failed to build Dockerfile.saichallenger-client-bldr" #584
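
The trade-off in the ACL-group item (packet rate versus test duration) is simple arithmetic. The sketch below is illustrative only: `traffic_duration_seconds` is a hypothetical helper, one packet per group is an assumed traffic pattern, and the rates in the usage lines are example values, not measured BMv2 limits.

```python
def traffic_duration_seconds(flows, packets_per_flow, pps):
    """Time needed to send packets_per_flow packets on each flow at a total rate of pps."""
    if pps <= 0:
        raise ValueError("pps must be positive")
    return flows * packets_per_flow / pps


if __name__ == "__main__":
    # Example: one packet for each of the 320 ACL groups.
    for rate in (1, 10, 100):
        seconds = traffic_duration_seconds(320, 1, rate)
        print(f"{rate:>4d} pps -> {seconds:.1f} s")
```

At a low rate the traffic phase grows linearly with the number of groups exercised, which is why the rate and the number of flows have to be chosen together.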

* path fix we need to go in dash folder first
* baby hero
* Update test_sai_baby_hero_traffic.py
* Update test_sai_baby_hero.py
* dpugen update to 2.2.0
* removed dev test file
* revert file so container update does not trigger
* remove some logging

chrispsommers · 2024-06-04T23:59:55Z

Is this supposed to run in CI? I see the runtime in CI is just a few minutes.
Is there any penalty to scaling up bmv2 tables (ignoring a new test)?

mgheorghe · 2024-06-05T03:21:16Z

> Is this supposed to run in CI? I see the runtime in CI is just a few minutes.
> Is there any penalty to scaling up bmv2 tables (ignoring a new test)?

Here I see that no test was run: the "run sai-challenger test" step took 0 sec.

Right before submitting, in my branch (https://github.com/mgheorghe/DASH/pull/6/checks), it took 10 min for all the tests.

r12f · 2024-06-05T03:52:24Z

hi Mircea, these tests are huge, so are they generated? I am very worried about how to maintain them when things are changed.

r12f · 2024-06-05T14:44:12Z

Hi Mircea, I really appreciate the effort of adding the tests. However, I think a more appropriate place for this test is the sonic-mgmt repo.

There are a few reasons:

- CI should focus more on correctness, not on speed or performance.
- It is hard to assert that a regression happened with a perf test in CI. These tests usually take a long time and are very sensitive to many factors, e.g., whether the build machine is busy doing something else. The build VMs that Azure DevOps or GitHub Actions use run on shared machines, so the data will not be stable.
- Moving forward, we are building a DPU image and SmartSwitch virtual DUTs. These images and topologies will be used for running the sonic-mgmt tests. These tests can be scheduled as nightly runs, and we have dedicated resources for running them. It will be a much better place than the CI build machine.
- Another advantage of adding this to the sonic-mgmt repo is that these tests will naturally be usable on real devices, not just on BMv2 or any other virtual/software implementation.

To summarize, I would recommend:

- Moving this test to the sonic-mgmt repo, where we already have a list of DASH tests: https://github.com/sonic-net/sonic-mgmt/tree/master/tests/dash. It will help us keep engineering efficiency, produce better-quality data, naturally support any implementation including real hardware, and align with the direction we are heading in.
- If you don't mind the effort, I would even recommend adding the HERO tests there as well! We have discussed this internally, and we all think it would be an awesome thing to have!

r12f

Requesting changes. Please see my comment above.

mgheorghe · 2024-06-05T19:05:43Z

> CI should focus more on correctness, not on speed or performance.
>
> It is hard to assert that a regression happened with a perf test in CI.

The test is focused on scale, not performance. I have it set to 1 pps; the goal was to exercise 32 ENIs, each with 10 NSGs, each with 1000 ACLs. Scale should be just a function of the memory available.
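
A back-of-the-envelope sketch of that ACL arithmetic, using only the figures quoted in this thread (32 ENIs, 10 NSGs per ENI, 1000 ACLs per NSG, and 1 versus 100 prefixes per ACL). The constant and function names are hypothetical, and the calculation ignores any other factor that may multiply the counts in the real test:

```python
ENIS = 32
NSGS_PER_ENI = 10
ACLS_PER_NSG = 1000


def acl_prefix_count(prefixes_per_acl):
    """Total ACL prefixes across all ENIs for a given number of prefixes per ACL."""
    return ENIS * NSGS_PER_ENI * ACLS_PER_NSG * prefixes_per_acl


if __name__ == "__main__":
    baby = acl_prefix_count(1)    # one prefix per ACL
    hero = acl_prefix_count(100)  # 100 prefixes per ACL
    print(f"ACL groups: {ENIS * NSGS_PER_ENI}")
    print(f"baby hero prefixes: {baby:,d}")
    print(f"hero prefixes:      {hero:,d}")
    print(f"ratio: {baby / hero:.0%}")
```

The 320 ACL groups and the 1% ratio both fall out of these numbers directly.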

> Moving this test to sonic-mgmt repo, where we already have a list of DASH tests here: https://github.com/sonic-net/sonic-mgmt/tree/master/tests/dash

I'll contribute them there too, but the sonic-mgmt repo will get the DASH config version, while BMv2 gets the SAI version of the config.

- The objective of the DASH repo test is to establish a behavior baseline I can point each vendor to.
- The sonic-mgmt version will be focused on showing proper SONiC functionality.
- I see some value in also covering SAI, since the DASH config does not cover 100% of what SAI can do; see "direction decision using eni mac/ip not the vni alone" #351 for example.
- Is there any option to have sai-challenger support in sonic-mgmt?
- Is the SmartSwitch virtual DUT available? I ask because I see https://sonic-build.azurewebsites.net/api/sonic/artifacts?branchName=master&platform=vs&target=target/sonic-vs.img.gz but also the statement "The DPU SONiC KVM image with dataplane will be released at the next stage". What is the difference between the two?

I see the image is also based on BMv2, so all the issues I encountered with BMv2 will be there too, plus some more from all the SONiC layers: https://github.com/sonic-net/SONiC/blob/master/images/dash/bmv2-virtual-sonic.svg

My conclusion is that I would like to add tests in both repos: I'll keep this one here and add a DASH one in sonic-mgmt (it will require more work, since I would like to make a proposal for an OTG virtual testbed for sonic-mgmt, maybe even adding TCP traffic).

mgheorghe · 2024-06-05T19:07:02Z

> hi Mircea, these tests are huge, so are they generated? I am very worried about how to maintain them when things are changed.

They are generated. That 100M config file was supposed to be generated on the fly, in memory, at runtime, but I ran into an issue with #584, so I had to dump the config here until that issue is resolved.
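
To illustrate what "generated on the fly in memory" means in contrast to a checked-in config dump, here is a minimal generator sketch. It is not the dpugen API and not the code in this PR: `generate_vnet_records`, the record layout, the naming scheme, and the VNI values are all hypothetical and chosen only for illustration.

```python
from typing import Dict, Iterator


def generate_vnet_records(count: int) -> Iterator[Dict]:
    """Yield one illustrative config record at a time instead of storing a large file."""
    for i in range(1, count + 1):
        yield {
            "name": f"vnet_{i}",  # hypothetical naming scheme
            "op": "create",
            "type": "SAI_OBJECT_TYPE_VNET",
            # Hypothetical attribute layout and VNI numbering.
            "attributes": ["SAI_VNET_ATTR_VNI", str(1000 + i)],
        }


if __name__ == "__main__":
    # Records are produced lazily, so memory use does not grow with count.
    first = next(generate_vnet_records(32))
    print(first)
    print(sum(1 for _ in generate_vnet_records(32)), "records generated")
```

Because the records are yielded one by one, nothing the size of the full config has to live in the repository or be held in memory at once.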

KrisNey-MSFT · 2024-07-10T16:09:23Z

An issue is open in SAI Challenger: the version of Debian is too old, the container fails to build, etc. But we need a fix upstream.

baby hero test scale (#6)

d80ee61


r12f self-requested a review June 5, 2024 14:44

r12f requested changes Jun 5, 2024


Merge branch 'sonic-net:main' into pr-baby-hero

a9f3c4a

mgheorghe marked this pull request as draft June 12, 2024 16:14

mgheorghe added 8 commits August 6, 2024 10:31

make same with main

4630263

Merge branch 'sonic-net:main' into pr-baby-hero

a8f7ca0

update dpugen version

bfaae39

update saic to latest

7159c86

generate file on the fly

91e3fd8

enable back sai cases

843764b

import fix

03469cd

Update SAI

4d6aa5d
