Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add CLI configuration options for teamd retry count feature #2642

Merged
merged 28 commits into from
Jun 2, 2023
Merged
Show file tree
Hide file tree
Changes from 19 commits
Commits
Show all changes
28 commits
Select commit Hold shift + click to select a range
42b20c1
Add CLI configuration options for teamd retry count feature
saiarcot895 Feb 1, 2023
b42b635
Add test for error case from teamd when it's not running
saiarcot895 Feb 1, 2023
e9e9af0
Fix up test cases
saiarcot895 Feb 1, 2023
f834b8a
Add some error handling if teamdctl doesn't exist
saiarcot895 Feb 7, 2023
bd40c1b
Merge commit 'd433b2f954e446db7a655e882a7274cd5bce3a50' into teamd-re…
saiarcot895 Apr 20, 2023
7fc5ebd
Add probe functionality and sending current LACPDU packet functionality
saiarcot895 Apr 26, 2023
b5b372b
Check to see if the retry count feature is enabled before doing a get…
saiarcot895 May 4, 2023
c3c6b2e
Add option to only send probe packets or only change retry count
saiarcot895 May 4, 2023
ad54c4c
Call the teamd retry count script if doing a warm-reboot
saiarcot895 May 4, 2023
fc9195f
Fix pycheck errors, and disable scapy's IPv6 and verbose mode
saiarcot895 May 15, 2023
44a6712
Make teamd retry count support optional
saiarcot895 May 15, 2023
5aa89b5
Address review comments, and restructure code to increase code coverage
saiarcot895 May 17, 2023
1a4e17f
Address some review comments
saiarcot895 May 17, 2023
bbad1e3
Replace tabs with spaces
saiarcot895 May 17, 2023
5acd304
Verify that expected keys are present in the data returned from teamdctl
saiarcot895 May 17, 2023
0f4f822
Merge commit '7d2ca0b' into teamd-retry-count-cli
saiarcot895 May 18, 2023
e6acfe0
Fix TimeoutExpired undefined error
saiarcot895 May 18, 2023
9e2d7a3
Add ability to mock subprocess calls (at a limited level)
saiarcot895 May 19, 2023
65c1bdb
Return an actual subprocess object, and add a test for checking timeout
saiarcot895 May 19, 2023
edea4ce
Change variable syntax
saiarcot895 May 22, 2023
f76e1e9
Fix set being accessed with an index
saiarcot895 May 22, 2023
e547f50
Add option to warm-reboot script to control if teamd retry count is r…
saiarcot895 May 23, 2023
a5966f2
Move the teamd retry count check to before orchagent
saiarcot895 May 23, 2023
1670cb7
Move retry count script start to be prior to point-of-no-return
saiarcot895 May 23, 2023
ee72908
Set executable bit
saiarcot895 May 23, 2023
c719f32
Address PR comments
saiarcot895 May 30, 2023
01852c6
Change to case-insensitive string contains check
saiarcot895 May 30, 2023
18fba7c
Make sure the global abort variable is used
saiarcot895 Jun 2, 2023
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
87 changes: 87 additions & 0 deletions config/main.py
Original file line number Diff line number Diff line change
Expand Up @@ -2199,6 +2199,93 @@ def del_portchannel_member(ctx, portchannel_name, port_name):
except JsonPatchConflict:
ctx.fail("Invalid or nonexistent portchannel or interface. Please ensure existence of portchannel member.")

@portchannel.group(cls=clicommon.AbbreviationGroup, name='retry-count')
@click.pass_context
def portchannel_retry_count(ctx):
pass

def check_if_retry_count_is_enabled(ctx, portchannel_name):
try:
proc = subprocess.Popen(["teamdctl", portchannel_name, "state", "item", "get", "runner.enable_retry_count_feature"], text=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
output, err = proc.communicate(timeout=10)
if proc.returncode != 0:
ctx.fail("Unable to determine if the retry count feature is enabled or not: {}".format(err.strip()))
return output.strip() == "true"
except subprocess.TimeoutExpired as e:
proc.kill()
proc.communicate()
ctx.fail("Unable to determine if the retry count feature is enabled or not: {}".format(e))

@portchannel_retry_count.command('get')
@click.argument('portchannel_name', metavar='<portchannel_name>', required=True)
@click.pass_context
def get_portchannel_retry_count(ctx, portchannel_name):
"""Get the retry count for a port channel"""
db = ValidatedConfigDBConnector(ctx.obj['db'])

# Don't proceed if the port channel name is not valid
if is_portchannel_name_valid(portchannel_name) is False:
ctx.fail("{} is invalid!, name should have prefix '{}' and suffix '{}'"
.format(portchannel_name, CFG_PORTCHANNEL_PREFIX, CFG_PORTCHANNEL_NO))

# Don't proceed if the port channel does not exist
if is_portchannel_present_in_db(db, portchannel_name) is False:
ctx.fail("{} is not present.".format(portchannel_name))

try:
is_retry_count_enabled = check_if_retry_count_is_enabled(ctx, portchannel_name)
if not is_retry_count_enabled:
ctx.fail("Retry count feature is not enabled!")

proc = subprocess.Popen(["teamdctl", portchannel_name, "state", "item", "get", "runner.retry_count"], text=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
output, err = proc.communicate(timeout=10)
if proc.returncode != 0:
ctx.fail("Unable to get the retry count: {}".format(err.strip()))
click.echo(output.strip())
except FileNotFoundError:
ctx.fail("Unable to get the retry count: teamdctl could not be run")
except subprocess.TimeoutExpired as e:
proc.kill()
proc.communicate()
ctx.fail("Unable to get the retry count: {}".format(e))
except Exception as e:
ctx.fail("Unable to get the retry count: {}".format(e))

@portchannel_retry_count.command('set')
@click.argument('portchannel_name', metavar='<portchannel_name>', required=True)
@click.argument('retry_count', metavar='<retry_count>', required=True, type=click.IntRange(3,10))
@click.pass_context
def set_portchannel_retry_count(ctx, portchannel_name, retry_count):
"""Set the retry count for a port channel"""
db = ValidatedConfigDBConnector(ctx.obj['db'])

# Don't proceed if the port channel name is not valid
if is_portchannel_name_valid(portchannel_name) is False:
ctx.fail("{} is invalid!, name should have prefix '{}' and suffix '{}'"
.format(portchannel_name, CFG_PORTCHANNEL_PREFIX, CFG_PORTCHANNEL_NO))

# Don't proceed if the port channel does not exist
if is_portchannel_present_in_db(db, portchannel_name) is False:
ctx.fail("{} is not present.".format(portchannel_name))

try:
is_retry_count_enabled = check_if_retry_count_is_enabled(ctx, portchannel_name)
if not is_retry_count_enabled:
ctx.fail("Retry count feature is not enabled!")

proc = subprocess.Popen(["teamdctl", portchannel_name, "state", "item", "set", "runner.retry_count", str(retry_count)], text=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
output, err = proc.communicate(timeout=10)
if proc.returncode != 0:
ctx.fail("Unable to set the retry count: {}".format(err.strip()))
except FileNotFoundError:
ctx.fail("Unable to set the retry count: teamdctl could not be run")
except subprocess.TimeoutExpired as e:
proc.kill()
proc.communicate()
ctx.fail("Unable to set the retry count: {}".format(e))
except Exception as e:
ctx.fail("Unable to set the retry count: {}".format(e))


#
# 'mirror_session' group ('config mirror_session ...')
Expand Down
15 changes: 15 additions & 0 deletions scripts/fast-reboot
Original file line number Diff line number Diff line change
Expand Up @@ -663,6 +663,17 @@ if [[ "$REBOOT_TYPE" = "fast-reboot" ]]; then
fi
fi

TEAMD_INCREASE_RETRY_COUNT=0
if [[ "${REBOOT_TYPE}" = "warm-reboot" || "${REBOOT_TYPE}" = "fastfast-reboot" ]]; then
/usr/local/bin/teamd_increase_retry_count.py --probe-only || TEAMD_RETRY_COUNT_PROBE_RC=$?
if [[ TEAMD_RETRY_COUNT_PROBE_RC -ne 0 ]]; then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you missing a $ in variable usage? I think it should be if [[ $TEAMD_RETRY_COUNT_PROBE_RC -ne 0 ]]; then

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was my initial thinking as well, but then I saw the use of this syntax in other parts of this file (above and below this section), and it worked, so I followed that syntax to match that style. I don't entirely know why it works (is there some feature in bash that checks to see that the string is a variable if it's not a number?), but I think it is better to have it with the $.

I'll change it after verifying it doesn't break anything.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you also need to set it to 0 before line 668? I agree it is better with $.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed.

I think you also need to set it to 0 before line 668?

Strictly speaking, it didn't appear to be required (it worked correctly even though $TEAMD_RETRY_COUNT_PROBE_RC wasn't set), but I set it to 0 anyways.

debug "Warning: Retry count feature support unknown for one or more neighbor devices; assuming that it's not available"
TEAMD_INCREASE_RETRY_COUNT=0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we supposed to bail out from here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the images we deploy to our internal environment, this will be changed to bail out and fail the warm-reboot if SONiC neighbors are not seen. For the public version, since we can't require using SONiC-only neighbors for warm reboot, this will only be a warning.

else
TEAMD_INCREASE_RETRY_COUNT=1
fi
fi

# We are fully committed to reboot from this point on because critical
# service will go down and we cannot recover from it.
set +e
Expand All @@ -687,6 +698,10 @@ if [ -x ${LOG_SSD_HEALTH} ]; then
${LOG_SSD_HEALTH}
fi

if [[ ( "${REBOOT_TYPE}" = "warm-reboot" || "${REBOOT_TYPE}" = "fastfast-reboot" ) && "${TEAMD_INCREASE_RETRY_COUNT}" -eq 1 ]]; then
/usr/local/bin/teamd_increase_retry_count.py
fi

# Stop any timers to prevent any containers starting in the middle of the process.
TIMERS=$(systemctl list-dependencies --plain sonic-delayed.target | sed 1d)
for timer in ${TIMERS}; do
Expand Down
Loading