Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[reboot] User-friendly reboot cause message for kernel panic #1486

Merged
merged 10 commits into from
Mar 28, 2021

Conversation

yozhao101
Copy link
Contributor

@yozhao101 yozhao101 commented Mar 10, 2021

Signed-off-by: Yong Zhao [email protected]

What I did

If the rebooting of SONiC device was caused by kernel panic, then the CLI command show reboot-cause should show Kernel Panic.

How I did it

Currently if kernel was panicked, then the device would be rebooted. The reboot script wrote a message into reboot-cause.txt. I just updated the content of this message.

How to verify it

I verified this change on the virtual switch in the following steps:

  1. Trigger kernel panic: echo c > /proc/sysrq-trigger
  2. After device was rebooted, run the CLI show reboot-cause:
    admin@vlab-01:~$ show reboot-cause
    Kernel Panic [Time: Tue 09 Mar 2021 03:03:56 AM UTC]

Previous command output (if the output of a command-line utility has changed)

admin@vlab-01:~$ show reboot-cause
User issued 'kdump' command [User: kdump, Time: Mon 08 Mar 2021 01:47:43 AM UTC]

New command output (if the output of a command-line utility has changed)

admin@vlab-01:~$ show reboot-cause
Kernel Panic [Time: Tue 09 Mar 2021 03:03:56 AM UTC]

@jleveque
Copy link
Contributor

@sujinmkang: We want to rephrase this message to be more user-friendly. However, we may need to either adjust the syntax here or in the process-reboot-cause parsing logic to parse the fields properly. Please provide your feedback.

@jleveque jleveque changed the title [reboot] Get reboot reason if kernel was panicked [reboot] User-friendly reboot cause message for kernel panic Mar 12, 2021
@sujinmkang
Copy link
Collaborator

@jleveque does this PR also wants to add the detail kernel panic information or just inform the kernel crash as last reboot cause?

@jleveque
Copy link
Contributor

@jleveque does this PR also wants to add the detail kernel panic information or just inform the kernel crash as last reboot cause?

Just inform that kernel panic was the last reboot cause.

@sujinmkang
Copy link
Collaborator

@jleveque If the kernel panic happens after passing this line, I mean, if it happens during reboot, then there is still having a chance to miss the kernel panic? I think it will be good to have the core file directory or file name which is related to the kernel panic. We can add it from process-reboot-cause.

@jleveque
Copy link
Contributor

@jleveque If the kernel panic happens after passing this line, I mean, if it happens during reboot, then there is still having a chance to miss the kernel panic? I think it will be good to have the core file directory or file name which is related to the kernel panic. We can add it from process-reboot-cause.

I see your point, but if a kernel panic occurs during reboot, it will be difficult (possibly impossible, e.g., if the filesystem is read-only) to leave a breadcrumb that we can use after booting back up to determine that a kernel panic occurred.

Also, technically speaking, if a user issues one of the reboot commands and a kernel panic happens during the reboot process, what was the cause of the reboot? Technically the reboot was initiated by the user.

@sujinmkang
Copy link
Collaborator

https://github.com/Azure/sonic-buildimage/blob/28cb43cb42e3223cade2efa9a5f60542d97a89e7/src/sonic-host-services/scripts/determine-reboot-cause#L129
@yozhao101 please add the kernel panic case if you want to change the reboot-cause here.

@sujinmkang
Copy link
Collaborator

show/reboot_cause.py Outdated Show resolved Hide resolved
show/reboot_cause.py Outdated Show resolved Hide resolved
show/reboot_cause.py Outdated Show resolved Hide resolved
show/reboot_cause.py Outdated Show resolved Hide resolved
Comment on lines 39 to 45
reboot_cause_str = "Cause: {}".format(reboot_cause["cause"])

if "user" in reboot_cause.keys() and reboot_cause["user"] != "N/A":
reboot_cause_str += ", User: {}".format(reboot_cause["user"])

if "time" in reboot_cause.keys() and reboot_cause["time"] != "N/A":
reboot_cause_str += ", Time: {}".format(reboot_cause["time"])
Copy link
Contributor

@jleveque jleveque Mar 26, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for "Cause:` prefix. Also, we lose the "User issued '' command" syntax here. Instead we just see the command.

Suggest keeping format the same as before.

Something like the following:

reboot_cause_dict = read_reboot_cause_file()

reboot_cause = reboot_cause_dict .get("cause", "Unknown")
reboot_user = reboot_cause_dict .get("user", "N/A")
reboot_time = reboot_cause_dict .get("time", "N/A")

if reboot_user != "N/A":
    reboot_cause_str = "User issued '<command>' command".format(reboot_cause)
else:
    reboot_cause_str = reboot_cause

if reboot_user != "N/A" or reboot_time != "N/A":
    reboot_cause_str += " ["

    if reboot_user != "N/A":
        reboot_cause_str += "User: {}".format(reboot_user )
        if reboot_time != "N/A":
            reboot_cause_str += ", "

    if reboot_time != "N/A":
        reboot_cause_str += "Time: {}".format(reboot_time)

    reboot_cause_str += "]"

scripts/reboot Outdated Show resolved Hide resolved
@lgtm-com

This comment has been minimized.

jleveque
jleveque previously approved these changes Mar 26, 2021
show/reboot_cause.py Outdated Show resolved Hide resolved
when previous reboot file can not be read successfully.

Signed-off-by: Yong Zhao <[email protected]>
jleveque
jleveque previously approved these changes Mar 26, 2021
@yozhao101
Copy link
Contributor Author

Retest this please.

@yozhao101 yozhao101 merged commit 4d89510 into sonic-net:master Mar 28, 2021
@yozhao101 yozhao101 deleted the get_reboot_reason branch March 28, 2021 17:14
yozhao101 added a commit to sonic-net/sonic-buildimage that referenced this pull request Mar 31, 2021
Signed-off-by: Yong Zhao [email protected]

Why I did it
If device reboot was caused by kernel panic, then we need retrieve and store the key information into the symbol file previous-reboot-cause.json. The CLI show reboot-cause will read this file to get the reason of previous reboot.

This PR is related to PR in sonic-utilities repo: sonic-net/sonic-utilities#1486

How I did it
The string variable previous_reboot_cause will be parsed to check whether it contains the keyword Kernel Panic. If it did, then store the keyword and time information into a dictionary.

How to verify it
I verified this change on a virtual testbed.

admin@vlab-01:/host/reboot-cause$ more previous-reboot-cause.json
{"gen_time": "2021_03_24_23_22_35", "cause": "Kernel Panic", "user": "N/A", "time": "Wed 24 Mar 2021 11:22:03 PM UTC", "comment": "N/A"}

admin@vlab-01:/host/reboot-cause$ show reboot-cause
Kernel Panic [Time: Wed 24 Mar 2021 11:22:03 PM UTC]
daall pushed a commit to sonic-net/sonic-buildimage that referenced this pull request Mar 31, 2021
Signed-off-by: Yong Zhao [email protected]

Why I did it
If device reboot was caused by kernel panic, then we need retrieve and store the key information into the symbol file previous-reboot-cause.json. The CLI show reboot-cause will read this file to get the reason of previous reboot.

This PR is related to PR in sonic-utilities repo: sonic-net/sonic-utilities#1486

How I did it
The string variable previous_reboot_cause will be parsed to check whether it contains the keyword Kernel Panic. If it did, then store the keyword and time information into a dictionary.

How to verify it
I verified this change on a virtual testbed.

admin@vlab-01:/host/reboot-cause$ more previous-reboot-cause.json
{"gen_time": "2021_03_24_23_22_35", "cause": "Kernel Panic", "user": "N/A", "time": "Wed 24 Mar 2021 11:22:03 PM UTC", "comment": "N/A"}

admin@vlab-01:/host/reboot-cause$ show reboot-cause
Kernel Panic [Time: Wed 24 Mar 2021 11:22:03 PM UTC]
daall pushed a commit that referenced this pull request Apr 1, 2021
Signed-off-by: Yong Zhao [email protected]

What I did
If the rebooting of SONiC device was caused by kernel panic, then the CLI command show reboot-cause should show Kernel Panic.

How I did it
Currently if kernel was panicked, then the device would be rebooted. The reboot script wrote a message into reboot-cause.txt. I just updated the content of this message.

How to verify it
I verified this change on the virtual switch in the following steps:

Trigger kernel panic: echo c > /proc/sysrq-trigger
After device was rebooted, run the CLI show reboot-cause:
admin@vlab-01:~$ show reboot-cause
Kernel Panic [Time: Tue 09 Mar 2021 03:03:56 AM UTC]
Previous command output (if the output of a command-line utility has changed)
admin@vlab-01:~$ show reboot-cause
User issued 'kdump' command [User: kdump, Time: Mon 08 Mar 2021 01:47:43 AM UTC]

New command output (if the output of a command-line utility has changed)
admin@vlab-01:~$ show reboot-cause
Kernel Panic [Time: Tue 09 Mar 2021 03:03:56 AM UTC]
raphaelt-nvidia pushed a commit to raphaelt-nvidia/sonic-buildimage that referenced this pull request May 23, 2021
…et#7153)

Signed-off-by: Yong Zhao [email protected]

Why I did it
If device reboot was caused by kernel panic, then we need retrieve and store the key information into the symbol file previous-reboot-cause.json. The CLI show reboot-cause will read this file to get the reason of previous reboot.

This PR is related to PR in sonic-utilities repo: sonic-net/sonic-utilities#1486

How I did it
The string variable previous_reboot_cause will be parsed to check whether it contains the keyword Kernel Panic. If it did, then store the keyword and time information into a dictionary.

How to verify it
I verified this change on a virtual testbed.

admin@vlab-01:/host/reboot-cause$ more previous-reboot-cause.json
{"gen_time": "2021_03_24_23_22_35", "cause": "Kernel Panic", "user": "N/A", "time": "Wed 24 Mar 2021 11:22:03 PM UTC", "comment": "N/A"}

admin@vlab-01:/host/reboot-cause$ show reboot-cause
Kernel Panic [Time: Wed 24 Mar 2021 11:22:03 PM UTC]
carl-nokia pushed a commit to carl-nokia/sonic-buildimage that referenced this pull request Aug 7, 2021
…et#7153)

Signed-off-by: Yong Zhao [email protected]

Why I did it
If device reboot was caused by kernel panic, then we need retrieve and store the key information into the symbol file previous-reboot-cause.json. The CLI show reboot-cause will read this file to get the reason of previous reboot.

This PR is related to PR in sonic-utilities repo: sonic-net/sonic-utilities#1486

How I did it
The string variable previous_reboot_cause will be parsed to check whether it contains the keyword Kernel Panic. If it did, then store the keyword and time information into a dictionary.

How to verify it
I verified this change on a virtual testbed.

admin@vlab-01:/host/reboot-cause$ more previous-reboot-cause.json
{"gen_time": "2021_03_24_23_22_35", "cause": "Kernel Panic", "user": "N/A", "time": "Wed 24 Mar 2021 11:22:03 PM UTC", "comment": "N/A"}

admin@vlab-01:/host/reboot-cause$ show reboot-cause
Kernel Panic [Time: Wed 24 Mar 2021 11:22:03 PM UTC]
ganglyu pushed a commit to sonic-net/sonic-host-services that referenced this pull request Jul 12, 2022
Signed-off-by: Yong Zhao [email protected]

Why I did it
If device reboot was caused by kernel panic, then we need retrieve and store the key information into the symbol file previous-reboot-cause.json. The CLI show reboot-cause will read this file to get the reason of previous reboot.

This PR is related to PR in sonic-utilities repo: sonic-net/sonic-utilities#1486

How I did it
The string variable previous_reboot_cause will be parsed to check whether it contains the keyword Kernel Panic. If it did, then store the keyword and time information into a dictionary.

How to verify it
I verified this change on a virtual testbed.

admin@vlab-01:/host/reboot-cause$ more previous-reboot-cause.json
{"gen_time": "2021_03_24_23_22_35", "cause": "Kernel Panic", "user": "N/A", "time": "Wed 24 Mar 2021 11:22:03 PM UTC", "comment": "N/A"}

admin@vlab-01:/host/reboot-cause$ show reboot-cause
Kernel Panic [Time: Wed 24 Mar 2021 11:22:03 PM UTC]
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants