System time corrupted across reboots (IDFGH-7930) #9448

kriegste · 2022-07-27T19:40:43Z

Environment

Module or chip used: ESP32 on TTGO, but happens on other devices as well
IDF version: 4.4.1 release
Build System: Standard Windows VSCode installation via the tutorial
Power Supply: USB on the TTGO, but other devices with their own supply behave the same

Problem Description

After leaving it running for a while (1-3 days) if the device is soft-reset using esp_restart() the system time is wrong. Sometimes only by seconds, sometimes by hours, in both directions. SNTP will correct this later, however during the initial phase system time cannot be relied on.
The phenomenon gets worse the longer the device is on (= the more time has passed since the last hard-reset).

Interestingly, esp_rtc_get_time_us() which indicates the time since the last hard-reset also jumps the same amount.

There is a four year old forum thread indicating the same problem:
https://www.esp32.com/viewtopic.php?t=7544

Expected Behavior

System time running continuously across reboots.

Code to reproduce this issue

Make sure to switch on system time stamps for logging!

Leave the example running for 1 day or more and from time to time look at the output. I used PuTTY since the IDF monitor causes a hard-reset on reconnect. An automated reboot will happen once per hour. Compare the time stamps right before and after the restarts.

#include "esp_log.h"
#include "esp_sntp.h"
#include "esp_wifi.h"
#include "nvs_flash.h"
#include "esp32/rtc.h"
 
static const char *TAG = "main";
 
void time_sync_notification_cb(struct timeval *tv) {
    ESP_LOGI(TAG, "time_sync_notification_cb");
}
 
void log_rtc(void) {
    int u = esp_rtc_get_time_us() / 1000000;
    ESP_LOGI(TAG, "rtc=%d seconds (%d days %02d:%02d:%02d)", u, u / (24*3600), (u % (24*3600)) / 3600, (u % 3600) / 60, u % 60);
}
 
void app_main(void) {
    log_rtc();
    ESP_LOGI(TAG, "start");
 
    ESP_ERROR_CHECK(nvs_flash_init());
    ESP_ERROR_CHECK(esp_netif_init());
    ESP_ERROR_CHECK(esp_event_loop_create_default());
    esp_netif_create_default_wifi_sta();
    wifi_init_config_t wifi_init_config = WIFI_INIT_CONFIG_DEFAULT();
    ESP_ERROR_CHECK(esp_wifi_init(&wifi_init_config));
 
    wifi_config_t conf_sta = {
        .sta.ssid = "xxxxx",
    };
    esp_wifi_set_config(WIFI_IF_STA, &conf_sta);
 
    ESP_ERROR_CHECK(esp_wifi_start());
    esp_wifi_connect();
 
    sntp_setoperatingmode(SNTP_OPMODE_POLL);
    sntp_setservername(0, "pool.ntp.org");
    sntp_set_time_sync_notification_cb(time_sync_notification_cb);
    sntp_init();
 
    // Wait an hour
    for (int i = 0; i < 60; i++) {
        ESP_LOGI(TAG, "%d", i);
        vTaskDelay(60000 / portTICK_PERIOD_MS);
    }
 
    ESP_LOGI(TAG, "end");
    log_rtc();
 
    esp_restart();
//  esp_abort(); // TODO: Try the other "restart methods" and see if there is a difference
//  esp_deep_sleep(1000000);
}

Debug Logs

I (08:07:52.302) main: 58
I (08:08:52.302) main: 59
I (08:09:52.302) main: end
I (08:09:52.303) main: rtc=32399 seconds (0 days 08:59:59)
[...]
ets Jun 8 2016 00:22:57
rst:0xc (SW_CPU_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
[...]
I (08:09:15.296) main: rtc=32362 seconds (0 days 08:59:22)
I (08:09:15.301) main: start

Restarting takes maybe one second or so, but as you can see in this case time jumped backwards more than half a minute. Which should be impossible.

The text was updated successfully, but these errors were encountered:

someburner · 2022-07-28T08:31:25Z

I have seen this as well.

kriegste · 2022-07-29T11:16:19Z

I tried both suggestions from the forum thread mentioned above:

calling esp_sync_counters_rtc_and_frc() before restart
the menuconfig option to just use RTC for gettimeofday

Neither worked.

kriegste · 2022-07-29T23:00:14Z

Could this be some kind of re-calibration attempt which then, based on the result, also changes the already counted RTC ticks (by a correction factor I guess). This would explain why the phenomenon gets worse the longer the device runs.

In esp_clk.h it says about esp_clk_rtc_time():

@attention The value returned by this function may change abruptly when

calibration value of RTC counter is updated via esp_clk_slowclk_cal_set

function. This should not happen unless application calls esp_clk_slowclk_cal_set.

In ESP-IDF, esp_clk_slowclk_cal_set is only called in startup code.

Maybe this calibration shouldn't happen on every boot, but only after a ESP_RST_POWERON (only when the RTC clock is 0)?

kriegste · 2022-07-31T21:10:47Z

Setting the config option CONFIG_ESP32_RTC_CLK_CAL_CYCLES to 0 to disable calibration causes a big time jump in the next restart, but after that, the problem is gone! So this confirms that how the calibration is done is the root cause.

KonstantinKondrashov · 2022-08-03T07:59:29Z

Hi @kriegste!
Thank you for the test example. I will check it and fix this issue.

someburner · 2022-08-19T01:51:44Z

@KonstantinKondrashov Any updates here?

kriegste · 2022-09-18T11:34:38Z

Can you fix it? The problem persists in version 4.4.2.

kriegste · 2022-10-22T07:59:36Z

...or life sign...

KonstantinKondrashov · 2022-10-23T12:16:23Z

Hi! Sorry for the delay, It might be linked to this issue #9455 as well.
I will revisit the whole timekeeping functionality, as I remember, I saw something weird. I will provide a patch here before merging it.

KonstantinKondrashov · 2022-10-26T18:13:09Z

I run this example just without wifi and sntp. It worked for an hour, but the issue is not shown.

I (3420336) main: 57
I (3480336) main: 58
I (3540336) main: 59
I (3600336) main: end
I (3600336) main: rtc=3638 seconds (0 days 01:00:38)

ets Jul 29 2019 12:21:46
rst:0xc (SW_CPU_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)

I (0) cpu_start: Starting scheduler on APP CPU.
I (326) main: rtc=3639 seconds (0 days 01:00:39)
I (336) main: start
I (336) main: 0
I (60336) main: 1

Seems like when wifi app works, we get that chip becomes warmer and the freq of the RC source goes.
I will try with wifi enabled.

kriegste · 2022-10-26T22:04:30Z

Leave it running for a few hours at least. Maybe heating/freezing the chip just before the restart can also provoke the effect.

It has something to do with the calibration value esp_clk_slowclk_cal_get() changing and then "correcting" the RTC clock (the ticks which already passed since hardware reset) and this in turn changes the system time.

This also explains why disabling the calibration altogether (CONFIG_ESP32_RTC_CLK_CAL_CYCLES = 0) is a workaround.

KonstantinKondrashov · 2022-10-27T08:10:42Z

Finally, I got the log that shows this issue.

1) boot
I (3540336) main: 59
I (3600336) main: end
I (3600336) main: rtc=3638 seconds (0 days 01:00:38)
ets Jul 29 2019 12:21:46
rst:0xc (SW_CPU_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
I (326) main: rtc=3639 seconds (0 days 01:00:39) ------------------------------------   = +1sec
I (336) main: start
I (336) main: 0

2) boot
I (3540336) main: 59
I (3600336) main: end
I (3600336) main: rtc=7240 seconds (0 days 02:00:40)
ets Jul 29 2019 12:21:46
rst:0xc (SW_CPU_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
I (326) main: rtc=7199 seconds (0 days 01:59:59) ------------------------------------   = -41sec
I (336) main: start
I (336) main: 0

3) boot
I (3540336) main: 59
I (3600336) main: end
I (3600336) main: rtc=10802 seconds (0 days 03:00:02)
ets Jul 29 2019 12:21:46
rst:0xc (SW_CPU_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
I (326) main: rtc=10804 seconds (0 days 03:00:04) ------------------------------------   = +2sec
I (336) main: start
I (336) main: 0

4) boot
I (3540336) main: 59
I (3600336) main: end
I (3600336) main: rtc=14405 seconds (0 days 04:00:05)
ets Jul 29 2019 12:21:46
rst:0xc (SW_CPU_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
I (326) main: rtc=14395 seconds (0 days 03:59:55) ------------------------------------   = -10sec
I (336) main: start
I (336) main: 0

KonstantinKondrashov · 2022-10-27T14:08:44Z

Hi @kriegste!
There is the patch that resolve this issue. The first 4 hours look good. I did not run all related tests yet, only this example, but I think it is the root case of this issue.
Could you try this fix?
Thanks.

diff --git a/components/esp_hw_support/esp_clk.c b/components/esp_hw_support/esp_clk.c
index 752a3e987ef..c1dff5f8764 100644
--- a/components/esp_hw_support/esp_clk.c
+++ b/components/esp_hw_support/esp_clk.c
@@ -53,7 +53,7 @@ extern uint32_t g_ticks_per_us_app;
 static portMUX_TYPE s_esp_rtc_time_lock = portMUX_INITIALIZER_UNLOCKED;
 
 // TODO: IDF-4239
-static RTC_DATA_ATTR uint64_t s_esp_rtc_time_us = 0, s_rtc_last_ticks = 0;
+static RTC_NOINIT_ATTR uint64_t s_esp_rtc_time_us, s_rtc_last_ticks;
 
 inline static int IRAM_ATTR s_get_cpu_freq_mhz(void)
 {
@@ -100,6 +100,10 @@ uint64_t esp_rtc_get_time_us(void)
 #endif
     portENTER_CRITICAL_SAFE(&s_esp_rtc_time_lock);
     const uint32_t cal = esp_clk_slowclk_cal_get();
+    if (cal == 0) {
+        s_esp_rtc_time_us = 0;
+        s_rtc_last_ticks = 0;
+    }
     const uint64_t rtc_this_ticks = rtc_time_get();
     const uint64_t ticks = rtc_this_ticks - s_rtc_last_ticks;
     /* RTC counter result is up to 2^48, calibration factor is up to 2^24,

kriegste · 2022-10-27T18:50:33Z

Thanks. I am already testing on two devices. Tomorrow I'll post results.

AxelLin · 2022-10-28T00:29:44Z

@KonstantinKondrashov
Can you also take a look at this issue? #6687

kriegste · 2022-10-28T11:25:22Z

Looks very good. No problems so far.

Closes espressif#9448

KonstantinKondrashov · 2022-11-07T14:15:21Z

The backports are ready and will be merged soon.

Closes #9448

someburner · 2022-11-13T18:58:44Z

@KonstantinKondrashov - release/v5.0 too?

KonstantinKondrashov · 2022-11-14T15:28:37Z

Hi @someburner! Sure. The v5.0 backport is ready and will be merged soon. Thanks.

KonstantinKondrashov · 2022-12-02T10:11:46Z

Sorry, our internal release/v5.0 branch is still locked. This fix will be merged when possible.

Closes #9448

natsco-sbottoms · 2023-02-28T14:14:51Z

Hello @kriegste and @KonstantinKondrashov ! Thanks for your work on this. I hope it's ok to resurrect this in the initial issue -- I wanted to alert you to what I believe may be an issue regarding this change in the context of OTA firmware updates.

The issue we're seeing is that changing s_esp_rtc_time_us and s_rtc_last_ticks to be RTC_NOINIT_ATTR and initializing them in esp_rtc_get_time_us() only works if the slow clock calibration register is zero. That is fine on power-up, but I believe that after an OTA update that register will still be set from before the update. Consequently, s_esp_rtc_time_us and s_rtc_last_ticks will not be initialized, and if variables in RTC RAM have been moved around by the linker, they will be filled with garbage data. Any reset other than OTA would work fine because the variables would still be initialized from the initial bootup.

I think we saw this on our firmware, and it resulted in incorrect values being reported by esp_rtc_get_time_us(). I'd be curious if you agree with this observation or if I'm off-base somehow.

Ideally during an OTA our system clock could stay valid, but at the least I'd like to avoid the undefined behavior we're currently experiencing.

boribosnjak · 2023-03-10T08:43:26Z

Hello @kriegste and @KonstantinKondrashov ! Thanks for your work on this. I hope it's ok to resurrect this in the initial issue -- I wanted to alert you to what I believe may be an issue regarding this change in the context of OTA firmware updates.

The issue we're seeing is that changing s_esp_rtc_time_us and s_rtc_last_ticks to be RTC_NOINIT_ATTR and initializing them in esp_rtc_get_time_us() only works if the slow clock calibration register is zero. That is fine on power-up, but I believe that after an OTA update that register will still be set from before the update. Consequently, s_esp_rtc_time_us and s_rtc_last_ticks will not be initialized, and if variables in RTC RAM have been moved around by the linker, they will be filled with garbage data. Any reset other than OTA would work fine because the variables would still be initialized from the initial bootup.

I think we saw this on our firmware, and it resulted in incorrect values being reported by esp_rtc_get_time_us(). I'd be curious if you agree with this observation or if I'm off-base somehow.

Ideally during an OTA our system clock could stay valid, but at the least I'd like to avoid the undefined behavior we're currently experiencing.

We see exactly the same problem. @kriegste and @KonstantinKondrashov from our point of view the change fixes the issue discussed in this thread. Unfortunately it created an issue after a firmware update, when addresses of the variables have changed.

KonstantinKondrashov · 2023-03-10T09:25:01Z

@kriegste @boribosnjak,
Sorry for the flaw, yes, it is possible. I will try to find a way to fix it.
I reopen this issue.

natsco-sbottoms · 2023-03-10T12:30:10Z

For what it's worth, I was able to add a shutdown handler with esp_register_shutdown_handler()

All this handler does is zero out the clock calibration:
esp_clk_slowclk_cal_set(0);

That, by itself, seems to be enough to preserve the clock and avoid this new bug on our configuration (ESP32-C3 with 32k XTAL). I'm not sure if it would be effective in all cases. There would be a loss of precision during a reset, but for our application we don't really care, and we only need to do this if it is a OTA update.

boribosnjak · 2023-03-10T12:56:14Z

@natsco-sbottoms Thank you for the hint. Sounds like a reasonable solution to me.
At the moment I patched the esp-idf v4.4.4 changing the variables back to RTC_DATA_ATTR and initialising it to 0.
I will consider adding your solution into our reboot actions after OTA. Thanks!

KonstantinKondrashov · 2023-03-31T17:59:23Z

Hi!
The fix is on review now - fix.txt. Soon will be merged.
It places the s_esp_rtc_time_us and s_rtc_last_ticks vars at a certain place in the .rtc_noinit memory that will not be used by any other things.

AxelLin · 2023-04-10T04:42:19Z

Hi! The fix is on review now - fix.txt. Soon will be merged.

Thanks for the update @KonstantinKondrashov
Hope to get the fix for all impacted branches (v4.3+).

AxelLin · 2023-04-15T01:40:38Z

Hi! The fix is on review now - fix.txt. Soon will be merged.

@KonstantinKondrashov
Any chance to raise the priority for applying this fix?
It has impacted on multiple stable branches but I still cannot find the fix for this regression in any branches.

AxelLin · 2023-05-18T02:32:21Z

Hi! The fix is on review now - fix.txt. Soon will be merged.

@KonstantinKondrashov
I'm not sure what is the reasonable time for reviewing a fix in espressif.
But the review time is long enough that people cannot differentiate if it is lost or something else.
I feel there must be something worong here.
In the meantime, all stable branches have this issue!

AxelLin · 2023-06-09T10:56:24Z

Hi! The fix is on review now - fix.txt. Soon will be merged.

@KonstantinKondrashov I'm not sure what is the reasonable time for reviewing a fix in espressif. But the review time is long enough that people cannot differentiate if it is lost or something else. I feel there must be something worong here. In the meantime, all stable branches have this issue!

@igrr @Alvin1Zhang
Can someone help to take a look at the fix #9448 (comment) ?
It was mentiond as in review for more than 2 months and then no response.

KonstantinKondrashov · 2023-06-09T11:18:09Z

Hi @AxelLin! Sorry for the delay. It was on review for a long time, now it is merged and soon will appear here (backports as well).

AxelLin · 2023-07-04T06:53:59Z

Hi @AxelLin! Sorry for the delay. It was on review for a long time, now it is merged and soon will appear here (backports as well).

The fix is not available in any release branch so far.

…last_ticks were moved around The commit fixes the case: If variables in RTC RAM have been moved around by the linker, they will be filled with garbage data. Any reset other than OTA would work fine because the variables would still be initialized from the initial bootup. So now system time will be valid even after OTA. Closes #9448

AxelLin · 2023-07-21T10:02:56Z

@KonstantinKondrashov
I still cannot find the fix in 5.0, 4.4, 4.3 branches. No idea why it takes such long time to get fix.

KonstantinKondrashov · 2023-07-21T10:21:33Z

Hi! v5.0 is not merged yet but it is ready to go. For the rest (4.4, 4.3) branches, the backport will not be provided according to backport policy.

boribosnjak · 2023-07-21T10:28:22Z

thank you for the update. that is no good news. I was hoping to see the fix in upcoming 4.4.6. In that case we will prepare an update to v5.1

AxelLin · 2023-07-21T11:05:17Z

For the rest (4.4, 4.3) branches, the backport will not be provided according to backport policy.

This is a BUGFIX. no idea why your policy does not support bugfix in maintenance period.
I use esp-idf for several years, my experience is the releases in service period are not really stable.
Once a release reach more stable state it's almost in maintenance period, however it's difficult to get bug fix for releases in maintenance period.

BTW, the regression was reported on Feb 28 (#9448 (comment)) , (I have been waiting the fix for v4.3 for 3 months #9448 (comment))

…last_ticks were moved around The commit fixes the case: If variables in RTC RAM have been moved around by the linker, they will be filled with garbage data. Any reset other than OTA would work fine because the variables would still be initialized from the initial bootup. So now system time will be valid even after OTA. Closes #9448

AxelLin · 2023-09-04T02:35:29Z

For the rest (4.4, 4.3) branches, the backport will not be provided according to backport policy.

Please add this to known issue of upcoming v4.3.x and v4.4.x release notes.

espressif-bot added the Status: Opened Issue is new label Jul 27, 2022

github-actions bot changed the title ~~System time corrupted across reboots~~ System time corrupted across reboots (IDFGH-7930) Jul 27, 2022

espressif-bot assigned KonstantinKondrashov Jul 28, 2022

espressif-bot added Status: In Progress Work is in progress and removed Status: Opened Issue is new labels Aug 3, 2022

espressif-bot added Resolution: NA Issue resolution is unavailable Status: Done Issue is done internally Resolution: Done Issue is done internally and removed Status: In Progress Work is in progress Resolution: NA Issue resolution is unavailable labels Nov 2, 2022

espressif-bot closed this as completed in ff8b5bc Nov 2, 2022

chipweinberger pushed a commit to chipweinberger/esp-idf that referenced this issue Nov 2, 2022

esp_hw_support: Fix time jump after reboot

485fafe

Closes espressif#9448

espressif-bot pushed a commit that referenced this issue Nov 8, 2022

esp_hw_support: Fix time jump after reboot

2c3f303

Closes #9448

espressif-bot pushed a commit that referenced this issue Dec 4, 2022

esp_hw_support: Fix time jump after reboot

61d05c0

Closes #9448

sanderkob mentioned this issue Feb 28, 2023

FIx local time in logs arjenhiemstra/ithowifi#160

Closed

KonstantinKondrashov reopened this Mar 10, 2023

espressif-bot added Status: Reviewing Issue is being reviewed and removed Status: Done Issue is done internally labels Apr 24, 2023

espressif-bot closed this as completed in cb6e213 Jun 9, 2023

espressif-bot added Status: Done Issue is done internally and removed Status: Reviewing Issue is being reviewed labels Jul 4, 2023

System time corrupted across reboots (IDFGH-7930) #9448

System time corrupted across reboots (IDFGH-7930) #9448

Comments

kriegste commented Jul 27, 2022

Environment

Problem Description

Expected Behavior

Code to reproduce this issue

Debug Logs

someburner commented Jul 28, 2022

kriegste commented Jul 29, 2022

kriegste commented Jul 29, 2022

kriegste commented Jul 31, 2022

KonstantinKondrashov commented Aug 3, 2022

someburner commented Aug 19, 2022

kriegste commented Sep 18, 2022

kriegste commented Oct 22, 2022

KonstantinKondrashov commented Oct 23, 2022

KonstantinKondrashov commented Oct 26, 2022

kriegste commented Oct 26, 2022

KonstantinKondrashov commented Oct 27, 2022

KonstantinKondrashov commented Oct 27, 2022

kriegste commented Oct 27, 2022

AxelLin commented Oct 28, 2022

kriegste commented Oct 28, 2022

KonstantinKondrashov commented Nov 7, 2022

someburner commented Nov 13, 2022

KonstantinKondrashov commented Nov 14, 2022

KonstantinKondrashov commented Dec 2, 2022

natsco-sbottoms commented Feb 28, 2023 • edited Loading

boribosnjak commented Mar 10, 2023

KonstantinKondrashov commented Mar 10, 2023

natsco-sbottoms commented Mar 10, 2023

boribosnjak commented Mar 10, 2023

KonstantinKondrashov commented Mar 31, 2023

AxelLin commented Apr 10, 2023

AxelLin commented Apr 15, 2023

AxelLin commented May 18, 2023

AxelLin commented Jun 9, 2023

KonstantinKondrashov commented Jun 9, 2023

AxelLin commented Jul 4, 2023

AxelLin commented Jul 21, 2023

KonstantinKondrashov commented Jul 21, 2023

boribosnjak commented Jul 21, 2023

AxelLin commented Jul 21, 2023 • edited Loading

AxelLin commented Sep 4, 2023

natsco-sbottoms commented Feb 28, 2023 •

edited

Loading

AxelLin commented Jul 21, 2023 •

edited

Loading