System time corrupted across reboots (IDFGH-7930) #9448
Comments
I have seen this as well. |
I tried both suggestions from the forum thread mentioned above:
Neither worked. |
Could this be some kind of recalibration that, depending on its result, also rescales the RTC ticks already counted (by a correction factor, I guess)? That would explain why the phenomenon gets worse the longer the device runs. In esp_clk.h it says about esp_clk_rtc_time():
Maybe this calibration shouldn't happen on every boot, but only after an ESP_RST_POWERON (i.e. only when the RTC clock is 0)? |
Setting the config option CONFIG_ESP32_RTC_CLK_CAL_CYCLES to 0 to disable calibration causes a big time jump on the next restart, but after that the problem is gone! This confirms that the way calibration is handled is the root cause. |
Hi @kriegste! |
@KonstantinKondrashov Any updates here? |
Can you fix it? The problem persists in version 4.4.2. |
...or life sign... |
Hi! Sorry for the delay. It might be linked to issue #9455 as well. |
I ran this example without Wi-Fi and SNTP. It ran for an hour, but the issue did not show up.
It seems that when the Wi-Fi app is running, the chip gets warmer and the frequency of the RC source drifts. |
Leave it running for a few hours at least. Maybe heating/freezing the chip just before the restart can also provoke the effect. It has something to do with the calibration value esp_clk_slowclk_cal_get() changing and then "correcting" the RTC clock (the ticks which already passed since hardware reset) and this in turn changes the system time. This also explains why disabling the calibration altogether (CONFIG_ESP32_RTC_CLK_CAL_CYCLES = 0) is a workaround. |
Finally, I got the log that shows this issue. |
Hi @kriegste!
diff --git a/components/esp_hw_support/esp_clk.c b/components/esp_hw_support/esp_clk.c
index 752a3e987ef..c1dff5f8764 100644
--- a/components/esp_hw_support/esp_clk.c
+++ b/components/esp_hw_support/esp_clk.c
@@ -53,7 +53,7 @@ extern uint32_t g_ticks_per_us_app;
static portMUX_TYPE s_esp_rtc_time_lock = portMUX_INITIALIZER_UNLOCKED;
// TODO: IDF-4239
-static RTC_DATA_ATTR uint64_t s_esp_rtc_time_us = 0, s_rtc_last_ticks = 0;
+static RTC_NOINIT_ATTR uint64_t s_esp_rtc_time_us, s_rtc_last_ticks;
inline static int IRAM_ATTR s_get_cpu_freq_mhz(void)
{
@@ -100,6 +100,10 @@ uint64_t esp_rtc_get_time_us(void)
#endif
portENTER_CRITICAL_SAFE(&s_esp_rtc_time_lock);
const uint32_t cal = esp_clk_slowclk_cal_get();
+ if (cal == 0) {
+ s_esp_rtc_time_us = 0;
+ s_rtc_last_ticks = 0;
+ }
const uint64_t rtc_this_ticks = rtc_time_get();
const uint64_t ticks = rtc_this_ticks - s_rtc_last_ticks;
/* RTC counter result is up to 2^48, calibration factor is up to 2^24, |
Thanks. I am already testing on two devices. Tomorrow I'll post results. |
@KonstantinKondrashov |
Looks very good. No problems so far. |
The backports are ready and will be merged soon. |
@KonstantinKondrashov - release/v5.0 too? |
Hi @someburner! Sure. The v5.0 backport is ready and will be merged soon. Thanks. |
Sorry, our internal release/v5.0 branch is still locked. This fix will be merged when possible. |
Hello @kriegste and @KonstantinKondrashov! Thanks for your work on this. I hope it's OK to resurrect this in the original issue; I wanted to alert you to what I believe may be an issue with this change in the context of OTA firmware updates. The issue we're seeing is that changing s_esp_rtc_time_us and s_rtc_last_ticks to RTC_NOINIT_ATTR and initializing them in esp_rtc_get_time_us() only works if the slow clock calibration register is zero. That is fine on power-up, but I believe that after an OTA update that register will still be set from before the update. Consequently, s_esp_rtc_time_us and s_rtc_last_ticks will not be initialized, and if variables in RTC RAM have been moved around by the linker, they will be filled with garbage data. Any reset other than OTA would work fine because the variables would still be initialized from the initial bootup. I think we saw this on our firmware, and it resulted in incorrect values being reported by esp_rtc_get_time_us(). I'd be curious whether you agree with this observation or whether I'm off base somehow. Ideally our system clock would stay valid during an OTA, but at the very least I'd like to avoid the undefined behavior we're currently experiencing. |
We see exactly the same problem. @kriegste and @KonstantinKondrashov, from our point of view the change fixes the issue discussed in this thread. Unfortunately, it introduced a new issue after a firmware update, when the addresses of the variables change. |
@kriegste @boribosnjak, |
For what it's worth, I was able to add a shutdown handler with esp_register_shutdown_handler(). All this handler does is zero out the clock calibration: That, by itself, seems to be enough to preserve the clock and avoid this new bug on our configuration (ESP32-C3 with 32k XTAL). I'm not sure whether it would be effective in all cases. There would be a loss of precision during a reset, but for our application we don't really care, and we only need to do this if it is an OTA update. |
@natsco-sbottoms Thank you for the hint. Sounds like a reasonable solution to me. |
Hi! |
Thanks for the update @KonstantinKondrashov |
@KonstantinKondrashov |
@KonstantinKondrashov |
@igrr @Alvin1Zhang |
Hi @AxelLin! Sorry for the delay. It was in review for a long time; now it is merged and will soon appear here (backports as well). |
The fix is not available in any release branch so far. |
…last_ticks were moved around
The commit fixes the following case: if variables in RTC RAM have been moved around by the linker, they will be filled with garbage data. Any reset other than OTA would work fine because the variables would still be initialized from the initial bootup. So now the system time will be valid even after OTA. Closes #9448
@KonstantinKondrashov |
Hi! v5.0 is not merged yet, but it is ready to go. For the remaining branches (4.4, 4.3), no backport will be provided, per the backport policy.
Thank you for the update. That is not good news. I was hoping to see the fix in the upcoming 4.4.6. In that case, we will prepare an update to v5.1. |
This is a BUGFIX. I have no idea why your policy does not allow bug fixes during the maintenance period. BTW, the regression was reported on Feb 28 (#9448 (comment)), and I have been waiting for the fix for v4.3 for 3 months (#9448 (comment)). |
Please add this to the known issues in the upcoming v4.3.x and v4.4.x release notes. |
Environment
Problem Description
After the device has been running for a while (1-3 days), soft-resetting it with esp_restart() leaves the system time wrong: sometimes by only seconds, sometimes by hours, in both directions. SNTP corrects this later; however, during the initial phase the system time cannot be relied on.
The phenomenon gets worse the longer the device is on (= the more time has passed since the last hard-reset).
Interestingly, esp_rtc_get_time_us() which indicates the time since the last hard-reset also jumps the same amount.
There is a four-year-old forum thread describing the same problem:
https://www.esp32.com/viewtopic.php?t=7544
Expected Behavior
System time running continuously across reboots.
Code to reproduce this issue
Make sure to switch on system time stamps for logging!
Leave the example running for 1 day or more and from time to time look at the output. I used PuTTY since the IDF monitor causes a hard-reset on reconnect. An automated reboot will happen once per hour. Compare the time stamps right before and after the restarts.
Debug Logs
Restarting takes about a second, but as you can see, in this case the time jumped backwards by more than half a minute, which should be impossible.