Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Orchagent] Add optional create_switch timeout parameter #3258

Merged
merged 4 commits into from
Sep 25, 2024
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
20 changes: 15 additions & 5 deletions orchagent/main.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -69,10 +69,11 @@ uint32_t gCfgSystemPorts = 0;
string gMyHostName = "";
string gMyAsicName = "";
bool gTraditionalFlexCounter = false;
uint32_t create_switch_timeout = 0;

void usage()
{
cout << "usage: orchagent [-h] [-r record_type] [-d record_location] [-f swss_rec_filename] [-j sairedis_rec_filename] [-b batch_size] [-m MAC] [-i INST_ID] [-s] [-z mode] [-k bulk_size] [-q zmq_server_address] [-c mode]" << endl;
cout << "usage: orchagent [-h] [-r record_type] [-d record_location] [-f swss_rec_filename] [-j sairedis_rec_filename] [-b batch_size] [-m MAC] [-i INST_ID] [-s] [-z mode] [-k bulk_size] [-q zmq_server_address] [-c mode] [-t create_switch_timeout]" << endl;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we planning to change the existing timeout? How will this be controlled from user perspective?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This option provides the flexibility to override the existing timeout, it is optional because we most of platforms' performance are good enough to create switch within default timeout.

The timeout setting can be controlled by platform designer, e.g. https://github.com/sonic-net/sonic-buildimage/pull/19928/files

cout << " -h: display this message" << endl;
cout << " -r record_type: record orchagent logs with type (default 3)" << endl;
cout << " Bit 0: sairedis.rec, Bit 1: swss.rec, Bit 2: responsepublisher.rec. For example:" << endl;
Expand All @@ -92,6 +93,7 @@ void usage()
cout << " -k max bulk size in bulk mode (default 1000)" << endl;
cout << " -q zmq_server_address: ZMQ server address (default disable ZMQ)" << endl;
cout << " -c counter mode (traditional|asic_db), default: asic_db" << endl;
cout << " -t Override create switch timeout, in sec" << endl;
}

void sighup_handler(int signo)
Expand Down Expand Up @@ -346,7 +348,7 @@ int main(int argc, char **argv)
string responsepublisher_rec_filename = Recorder::RESPPUB_FNAME;
int record_type = 3; // Only swss and sairedis recordings enabled by default.

while ((opt = getopt(argc, argv, "b:m:r:f:j:d:i:hsz:k:q:c:")) != -1)
while ((opt = getopt(argc, argv, "b:m:r:f:j:d:i:hsz:k:q:c:t:")) != -1)
{
switch (opt)
{
Expand Down Expand Up @@ -437,6 +439,9 @@ int main(int argc, char **argv)
enable_zmq = true;
}
break;
case 't':
create_switch_timeout = atoi(optarg);
break;
default: /* '?' */
exit(EXIT_FAILURE);
}
Expand Down Expand Up @@ -629,15 +634,20 @@ int main(int argc, char **argv)
delay_factor = 2;
}

if (gMySwitchType == "voq" || gMySwitchType == "fabric" || gMySwitchType == "chassis-packet" || gMySwitchType == "dpu" || asan_enabled)
if (gMySwitchType == "voq" || gMySwitchType == "fabric" || gMySwitchType == "chassis-packet" || gMySwitchType == "dpu" || asan_enabled || create_switch_timeout)
{
/* We set this long timeout in order for orchagent to wait enough time for
* response from syncd. It is needed since switch create takes more time
* than default time to create switch if there are lots of front panel ports
* and systems ports to initialize
*/

if (gMySwitchType == "voq" || gMySwitchType == "chassis-packet" || gMySwitchType == "dpu")
if (create_switch_timeout)
{
/* Convert timeout to milliseconds from seconds */
attr.value.u64 = (create_switch_timeout * 1000);
}
else if (gMySwitchType == "voq" || gMySwitchType == "chassis-packet" || gMySwitchType == "dpu")
{
attr.value.u64 = (5 * SAI_REDIS_DEFAULT_SYNC_OPERATION_RESPONSE_TIMEOUT);
}
Expand Down Expand Up @@ -672,7 +682,7 @@ int main(int argc, char **argv)
}
SWSS_LOG_NOTICE("Create a switch, id:%" PRIu64, gSwitchId);

if (gMySwitchType == "voq" || gMySwitchType == "fabric" || gMySwitchType == "chassis-packet" || gMySwitchType == "dpu")
if (gMySwitchType == "voq" || gMySwitchType == "fabric" || gMySwitchType == "chassis-packet" || gMySwitchType == "dpu" || create_switch_timeout)
{
/* Set syncd response timeout back to the default value */
attr.id = SAI_REDIS_SWITCH_ATTR_SYNC_OPERATION_RESPONSE_TIMEOUT;
Expand Down
3 changes: 2 additions & 1 deletion tests/test_zmq.py
Original file line number Diff line number Diff line change
Expand Up @@ -57,8 +57,9 @@ class TestZmqDash(object):
@pytest.fixture(scope="class")
def enable_orchagent_zmq(self, dvs):
# change orchagent to use ZMQ
# change orchagent to use custom create_switch_timeout
dvs.runcmd("cp /usr/bin/orchagent.sh /usr/bin/orchagent.sh_zmq_ut_backup")
dvs.runcmd("sed -i.bak 's/\/usr\/bin\/orchagent /\/usr\/bin\/orchagent -q tcp:\/\/127.0.0.1:8100 /g' /usr/bin/orchagent.sh")
dvs.runcmd("sed -i.bak 's/\/usr\/bin\/orchagent /\/usr\/bin\/orchagent -q tcp:\/\/127.0.0.1:8100 -t 60 /g' /usr/bin/orchagent.sh")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why this specific test is changed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As the change is in main() passing a new argument to orchagent, found we that we are doing a similar argument change in this test case. This change does not affect the behavior of the test as timeout value we are sending is still same as default(60 sec). But helps to cover the UT for new code change.

dvs.stop_swss()
dvs.start_swss()

Expand Down
Loading