-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[determine-reboot-cause] Ignore non-hardware reboot cause #6349
Conversation
if proc_cmdline_reboot_cause is not None: | ||
previous_reboot_cause = software_reboot_cause | ||
elif hardware_reboot_cause is not REBOOT_CAUSE_NON_HARDWARE: | ||
elif REBOOT_CAUSE_NON_HARDWARE not in hardware_reboot_cause: | ||
previous_reboot_cause = hardware_reboot_cause | ||
else: | ||
previous_reboot_cause = software_reboot_cause |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer using .startswith(REBOOT_CAUSE_NON_HARDWARE) rather than
REBOOT_CAUSE_NON_HARDWARE not in hardware_reboot_cause` to be more precise.
Maybe we can simplify this to:
if proc_cmdline_reboot_cause is not None or hardware_reboot_cause.startswith(REBOOT_CAUSE_NON_HARDWARE):
previous_reboot_cause = software_reboot_cause
else:
previous_reboot_cause = hardware_reboot_cause
We might also want to check if hardware_reboot_cause
is None:
if (proc_cmdline_reboot_cause is not None or
hardware_reboot_cause is None or
hardware_reboot_cause.startswith(REBOOT_CAUSE_NON_HARDWARE):
previous_reboot_cause = software_reboot_cause
else:
previous_reboot_cause = hardware_reboot_cause
@sujinmkang: Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. I also would like if we can simplify the handling further to just:
- hardware_reboot_cause = None
# This variable is kept for future-use purpose. When proc_cmd_line/vendor/software provides
# any additional_reboot_info it will be stored as a "comment" in REBOOT_CAUSE_HISTORY_FILE
additional_reboot_info = "N/A"
- # 1. Check if the previous reboot was warm/fast reboot by testing whether there is "fast|fastfast|warm" in /proc/cmdline
- proc_cmdline_reboot_cause = find_proc_cmdline_reboot_cause()
-
- # 2. Check if the previous reboot was caused by hardware
- # If yes, the hardware reboot cause will be treated as the reboot cause
- hardware_reboot_cause = find_hardware_reboot_cause()
-
- # 3. If there is a REBOOT_CAUSE_FILE, it will contain any software-related
- # reboot info. We will use it as the previous cause.
- software_reboot_cause = find_software_reboot_cause()
-
- # The main decision logic of the reboot cause:
- # If there is a reboot cause indicated by /proc/cmdline, it should be warmreboot/fastreboot
- # the software_reboot_cause which is the content of /hosts/reboot-cause/reboot-cause.txt
- # will be treated as the reboot cause
- # Elif there is a reboot cause indicated by platform API,
- # the hardware_reboot_cause will be treated as the reboot cause
- # Else the software_reboot_cause will be treated as the reboot cause
- if proc_cmdline_reboot_cause is not None:
- previous_reboot_cause = software_reboot_cause
- elif REBOOT_CAUSE_NON_HARDWARE not in hardware_reboot_cause:
- previous_reboot_cause = hardware_reboot_cause
+ # Check if the previous reboot was warm/fast reboot by testing whether there is "fast|fastfast|warm" in /proc/cmdline
+ previous_reboot_cause = find_proc_cmdline_reboot_cause()
+
+ # If /proc/cmdline does not indicate reboot cause, check if the previous reboot was caused by hardware
+ if previous_reboot_cause is None:
+ previous_reboot_cause = find_hardware_reboot_cause()
+ if previous_reboot_cause.startswith(REBOOT_CAUSE_NON_HARDWARE):
+ # If the reboot cause if non HW, get the reboot cause from REBOOT_CAUSE_FILE
+ previous_reboot_cause = find_software_reboot_cause()
else:
- previous_reboot_cause = software_reboot_cause
+ # Get the reboot cause from REBOOT_CAUSE_FILE
+ previous_reboot_cause = find_software_reboot_cause()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The condensed logic will then look like:
# Remove stale PREVIOUS_REBOOT_CAUSE_FILE if it exists
if os.path.exists(PREVIOUS_REBOOT_CAUSE_FILE):
os.remove(PREVIOUS_REBOOT_CAUSE_FILE)
# This variable is kept for future-use purpose. When proc_cmd_line/vendor/software provides
# any additional_reboot_info it will be stored as a "comment" in REBOOT_CAUSE_HISTORY_FILE
additional_reboot_info = "N/A"
# Check if the previous reboot was warm/fast reboot by testing whether there is "fast|fastfast|warm" in /proc/cmdline
previous_reboot_cause = find_proc_cmdline_reboot_cause()
# If /proc/cmdline does not indicate reboot cause, check if the previous reboot was caused by hardware
if previous_reboot_cause is None:
previous_reboot_cause = find_hardware_reboot_cause()
if previous_reboot_cause.startswith(REBOOT_CAUSE_NON_HARDWARE):
# If the reboot cause if non HW, get the reboot cause from REBOOT_CAUSE_FILE
previous_reboot_cause = find_software_reboot_cause()
else:
# Get the reboot cause from REBOOT_CAUSE_FILE
previous_reboot_cause = find_software_reboot_cause()
# Current time
reboot_cause_gen_time = str(datetime.datetime.now().strftime('%Y_%m_%d_%H_%M_%S'))
# Save the previous cause info into its history file as json format
reboot_cause_dict = get_reboot_cause_dict(previous_reboot_cause, additional_reboot_info, reboot_cause_gen_time)
# Create reboot-cause-#time#.json under history directory
REBOOT_CAUSE_HISTORY_FILE = os.path.join(REBOOT_CAUSE_HISTORY_DIR, "reboot-cause-{}.json".format(reboot_cause_gen_time))
Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this. I also suggest using a different variable to store the /proc/cmdline cause (I also reworded a comment):
# Remove stale PREVIOUS_REBOOT_CAUSE_FILE if it exists
if os.path.exists(PREVIOUS_REBOOT_CAUSE_FILE):
os.remove(PREVIOUS_REBOOT_CAUSE_FILE)
# This variable is kept for future-use purpose. When proc_cmd_line/vendor/software provides
# any additional_reboot_info it will be stored as a "comment" in REBOOT_CAUSE_HISTORY_FILE
additional_reboot_info = "N/A"
# Check if the previous reboot was warm/fast reboot by testing whether there is "fast|fastfast|warm" in /proc/cmdline
proc_cmdline_reboot_cause = find_proc_cmdline_reboot_cause()
# If /proc/cmdline does not indicate reboot cause, check if the previous reboot was caused by hardware
if proc_cmdline_reboot_cause is None:
previous_reboot_cause = find_hardware_reboot_cause()
if previous_reboot_cause.startswith(REBOOT_CAUSE_NON_HARDWARE):
# If the reboot cause is non-hardware, get the reboot cause from REBOOT_CAUSE_FILE
previous_reboot_cause = find_software_reboot_cause()
else:
# Get the reboot cause from REBOOT_CAUSE_FILE
previous_reboot_cause = find_software_reboot_cause()
# Current time
reboot_cause_gen_time = str(datetime.datetime.now().strftime('%Y_%m_%d_%H_%M_%S'))
# Save the previous cause info into its history file as json format
reboot_cause_dict = get_reboot_cause_dict(previous_reboot_cause, additional_reboot_info, reboot_cause_gen_time)
# Create reboot-cause-#time#.json under history directory
REBOOT_CAUSE_HISTORY_FILE = os.path.join(REBOOT_CAUSE_HISTORY_DIR, "reboot-cause-{}.json".format(reboot_cause_gen_time))
We could also simplify further, but the logic might not be as clear:
# Remove stale PREVIOUS_REBOOT_CAUSE_FILE if it exists
if os.path.exists(PREVIOUS_REBOOT_CAUSE_FILE):
os.remove(PREVIOUS_REBOOT_CAUSE_FILE)
# This variable is kept for future-use purpose. When proc_cmd_line/vendor/software provides
# any additional_reboot_info it will be stored as a "comment" in REBOOT_CAUSE_HISTORY_FILE
additional_reboot_info = "N/A"
# Default the reboot cause to the cause found in REBOOT_CAUSE_FILE
previous_reboot_cause = find_software_reboot_cause()
# Check if the previous reboot was warm/fast reboot by testing whether there is "fast|fastfast|warm" in /proc/cmdline
proc_cmdline_reboot_cause = find_proc_cmdline_reboot_cause()
# If /proc/cmdline does not indicate reboot cause, check if the previous reboot was caused by hardware
if proc_cmdline_reboot_cause is None:
hardware_reboot_cause = find_hardware_reboot_cause()
if not hardware_reboot_cause.startswith(REBOOT_CAUSE_NON_HARDWARE):
previous_reboot_cause = hardware_reboot_cause
# Current time
reboot_cause_gen_time = str(datetime.datetime.now().strftime('%Y_%m_%d_%H_%M_%S'))
# Save the previous cause info into its history file as json format
reboot_cause_dict = get_reboot_cause_dict(previous_reboot_cause, additional_reboot_info, reboot_cause_gen_time)
# Create reboot-cause-#time#.json under history directory
REBOOT_CAUSE_HISTORY_FILE = os.path.join(REBOOT_CAUSE_HISTORY_DIR, "reboot-cause-{}.json".format(reboot_cause_gen_time))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Update the logic now. I kept it at the first proposal.
@sujinmkang , can you sign-off? |
@vaibhavhd It looks good to me but I was also thinking about logic like below.
While checking the findhardware_reboot_cause(), if there is no minor cause available, dummy "()" is following the major hardware_reboot_cause all the time. Not all hardware reboot cause has hardware_reboot_cause_minor.
|
Thanks Sujin, I agree with the suggestion of not including minor dummy cause. As discussed, I will create a new PR for this enhancement along with some other minor refactoring that I have in mind. For now, let's go ahead with the bug fix for the issue. |
- Why I did it - Reboot cause prints "Non-Hardware (N/A)" instead of showing the software reboot cause. The issue is mishandling of hardware reboot cause in determine-reboot-cause script. - How I did it Fixed the handling for Non Hardware reboot cause. Ignore if Non-Hardware is present in the hardware_reboot_cause output. Added some code refactoring for simplicity. - How to verify it - With fix, the hardware reboot cause is ignored (if it is non hw):
- Why I did it
Reboot cause prints "Non-Hardware" instead of showing software reboot cause.
- How I did it
Fixed the handling for Non Hardware reboot cause. Ignore if
Non-Hardware
is present in the hardware_reboot_cause output.This will make software reboot cause to be taken as the effective reboot-cause.
- How to verify it
With fix, the hardware reboot cause is ignored (if it is non hw):
- Which release branch to backport (provide reason below if selected)
- Description for the changelog
- A picture of a cute animal (not mandatory but encouraged)