Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Buffer for DynamicJasonDocument livedata/status and prometheus/metrics too big? #739

Closed
helgeerbe opened this issue Mar 24, 2023 · 8 comments
Labels
bug Something isn't working

Comments

@helgeerbe
Copy link
Contributor

What happened?

I use liveview and prometheus very frequently and sometimes liveview can't update and the page goes blank due to bad_alloc error. If things goes worse, the system never recovers and I have to reboot to see livedata again.
I run my own OpenDTU with some extensions, so this might be an edge case? Don't know.

But I observe, that for both api calls from my point of view a far too big buffer is requested.

DynamicJsonDocument root(40960);

I don't know the max possible size for a response, but for my setup 4096 bytes for the livedata will be enough. So OpenDTU requested a 10 times bigger buffer.

I reduced the buffer to 20480, and since then I can't observer any bad_alloc error anymore.

To Reproduce Bug

Run liveview and prometheus in parallel. Wait on console for the out of resources output.

Expected Behavior

No bad_alloc errors due to memory fragmentation.

Install Method

Self-Compiled

What git-hash/version of OpenDTU?

3f8226c

Relevant log/trace output

No response

Anything else?

No response

@helgeerbe helgeerbe added the bug Something isn't working label Mar 24, 2023
@tbnobody
Copy link
Owner

have you tested it with 10 inverters and all descriptions filled to it's max size?

@helgeerbe
Copy link
Contributor Author

have you tested it with 10 inverters and all descriptions filled to it's max size?

No I don't. Do you have an example json, which uses the maximum size in settings? Or can you tell me, what the maximum size for liveview and prometheus output is? So I could copy it into one test OpenDTU

@tbnobody
Copy link
Owner

tbnobody commented Mar 24, 2023

You can copy the output of one inverter into the assistant and look at the memory consumption: https://arduinojson.org/v6/assistant/#/step1

Every description field can be 31 characters long. that means 10 inverters * 31 chars + 4 channels / inverter * 31 characters = is already 1550 b only for texts... then you have the values itself and the overhead for the json output itself.

@helgeerbe
Copy link
Contributor Author

I see. I tested it on your latest originalOpenDTU release from master. With 10 HM-600, so only 2 channels. Calculator says 24576bytes.
But with these settings, the openDTU becomes unusable.

Starts up fine

12:27:53.785 > Starting OpenDTU
12:27:53.785 > Initialize FS... done
12:27:53.785 > Reading configuration... done
12:27:53.911 > Reading PinMapping... [   159][E][vfs_api.cpp:105] open(): /littlefs/pin_mapping.json does not exist, no permits for creation
12:27:53.920 > using default config done
12:27:53.920 > Initialize Network... done
12:27:54.054 > Setting Hostname... Configuring WiFi STA using new credentials... done
12:27:54.064 > Initialize NTP... done
12:27:54.064 > Initialize SunPosition... done
12:27:54.064 > Initialize MqTT... done
12:27:54.068 > Initialize WebApi... done
12:27:55.056 > Initialize Display... done
12:27:55.056 > Check for default DTU serial... done
12:27:55.059 > Initialize Hoymiles interface... Connection error!!
12:27:55.072 >   Setting radio PA level... 
12:27:55.072 >   Setting DTU serial... 
12:27:55.078 >   Setting poll interval... 
12:27:55.078 >   Adding inverter: 114181019620 - asdfasdfasdfasdfasdfasdfasdfadf done
12:27:55.082 >   Adding inverter: 112182857016 - Lunaadfasdfasdfasdfasdfadsfasdf done
12:27:55.088 >   Adding inverter: 114181019609 - asdfasdfasdfasdfasdfasdfasdfasd done
12:27:55.099 >   Adding inverter: 114181019610 - asdfasdfgdsfbnsdfghrtezn etrzne done
12:27:55.104 >   Adding inverter: 114181019611 - adfasdfasdfasdfgasdfasdfasdfasd done
12:27:55.110 >   Adding inverter: 114181019612 - adfasdfasdfasdfasdfasdfasdfasdf done
12:27:55.115 >   Adding inverter: 114181019613 - qerqawetqrzhretujeztirzutiztruo done
12:27:55.121 >   Adding inverter: 114181019614 - adfasdfasdfadsfasdfasdfasdfasdf done
12:27:55.126 >   Adding inverter: 114181019615 - adfadsfasdfasdfasdfasdfasdfasdf done
12:27:55.132 >   Adding inverter: 114181019616 - adfasdfasdfasdfasdfasdfasdfasdf done
12:27:55.137 > done

Without using prometheus I see in live view:

12:30:43.224 > Call to /api/livedata/status temporarely out of resources. Reason: "std::bad_alloc".

Sometimes I see a crash in the network loop:

2:32:21.033 > abort() was called at PC 0x40192083 on core 1
12:32:21.033 > 
12:32:21.033 > 
12:32:21.033 > Backtrace: 0x40083919:0x3ffb25f0 0x4008e0cd:0x3ffb2610 0x400939cd:0x3ffb2630 0x40192083:0x3ffb26b0 0x401920ca:0x3ffb26d0 0x40191dab:0x3ffb26f0 0x40191ed2:0x3ffb2710 0x40191789:0x3ffb2730 0x40196b41:0x3ffb2750 0x40106405:0x3ffb27a0 0x400dbf87:0x3ffb27e0 0x400f1cea:0x3ffb2800 0x4010a85d:0x3ffb2820
12:32:21.393 > 
12:32:21.393 >   #0  0x40083919:0x3ffb25f0 in panic_abort at /Users/ficeto/Desktop/ESP32/ESP32S2/esp-idf-public/components/esp_system/panic.c:402
12:32:21.393 >   #1  0x4008e0cd:0x3ffb2610 in esp_system_abort at /Users/ficeto/Desktop/ESP32/ESP32S2/esp-idf-public/components/esp_system/esp_system.c:128
12:32:21.393 >   #2  0x400939cd:0x3ffb2630 in abort at /Users/ficeto/Desktop/ESP32/ESP32S2/esp-idf-public/components/newlib/abort.c:46
12:32:21.393 >   #3  0x40192083:0x3ffb26b0 in __cxxabiv1::__terminate(void (*)()) at /Users/brnomac003/.gitlab-runner/builds/qR2TxTby/0/idf/crosstool-NG/.build/xtensa-esp32-elf/src/gcc/libstdc++-v3/libsupc++/eh_terminate.cc:47
12:32:21.393 >   #4  0x401920ca:0x3ffb26d0 in std::terminate() at /Users/brnomac003/.gitlab-runner/builds/qR2TxTby/0/idf/crosstool-NG/.build/xtensa-esp32-elf/src/gcc/libstdc++-v3/libsupc++/eh_terminate.cc:57
12:32:21.393 >   #5  0x40191dab:0x3ffb26f0 in __cxa_throw at /Users/brnomac003/.gitlab-runner/builds/qR2TxTby/0/idf/crosstool-NG/.build/xtensa-esp32-elf/src/gcc/libstdc++-v3/libsupc++/eh_throw.cc:95
12:32:21.393 >   #6  0x40191ed2:0x3ffb2710 in operator new(unsigned int) at /Users/brnomac003/.gitlab-runner/builds/qR2TxTby/0/idf/crosstool-NG/.build/xtensa-esp32-elf/src/gcc/libstdc++-v3/libsupc++/new_op.cc:54
12:32:21.393 >   #7  0x40191789:0x3ffb2730 in operator new[](unsigned int) at /Users/brnomac003/.gitlab-runner/builds/qR2TxTby/0/idf/crosstool-NG/.build/xtensa-esp32-elf/src/gcc/libstdc++-v3/libsupc++/new_opv.cc:32
12:32:21.393 >   #8  0x40196b41:0x3ffb2750 in WiFiUDP::parsePacket() at /Users/herbe/.platformio/packages/framework-arduinoespressif32/libraries/WiFi/src/WiFiUdp.cpp:210
12:32:21.393 >   #9  0x40106405:0x3ffb27a0 in DNSServer::processNextRequest() at /Users/herbe/.platformio/packages/framework-arduinoespressif32/libraries/DNSServer/src/DNSServer.cpp:78
12:32:21.393 >   #10 0x400dbf87:0x3ffb27e0 in NetworkSettingsClass::loop() at src/NetworkSettings.cpp:213
12:32:21.393 >   #11 0x400f1cea:0x3ffb2800 in loop() at src/main.cpp:135
12:32:21.393 >   #12 0x4010a85d:0x3ffb2820 in loopTask(void*) at /Users/herbe/.platformio/packages/framework-arduinoespressif32/cores/esp32/main.cpp:50
12:32:21.393 > 
12:32:21.393 > 
12:32:21.393 > 
12:32:21.393 > 
12:32:21.393 > ELF file SHA256: 08d2e8b9f14164cc
12:32:21.393 > 
12:32:21.393 > E (15444) esp_core_dump_flash: Core dump flash config is corrupted! CRC=0x7bd5c66f instead of 0x0
12:32:21.393 > Rebooting...
12:32:21.393 > ets Jun  8 2016 00:22:57
12:32:21.393 > 
12:32:21.393 > rst:0xc (SW_CPU_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
12:32:21.393 > configsip: 0, SPIWP:0xee
12:32:21.393 > clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
12:32:21.393 > mode:DIO, clock div:2
12:32:21.393 > load:0x3fff0030,len:1184
12:32:21.393 > load:0x40078000,len:13104
12:32:21.393 > load:0x40080400,len:3036
12:32:21.393 > entry 0x400805e4
12:32:21.683 > E (605) esp_core_dump_flash: No core dump�
12:32:21.683 > �ѥѥ���found!
12:32:21.683 > E (605) esp_core_dump_flash: No core dump partition found!

I added:

MessageOutput.printf("MaxAllocHeap: \"%d\".\r\n", ESP.getMaxAllocHeap());
 DynamicJsonDocument root(40960);

Documentation says that you can use ESP.getMaxAllocHeap() and got:

12:49:42.726 > MaxAllocHeap: "49140".
12:49:42.974 > Call to /api/livedata/status temporarely out of resources. Reason: "std::bad_alloc".

I checked the code you alloc theDynamicJsonDocument, serialize Jason, and do all the network things. It might be better to free the DynomicJasonDocument directly after serialization.

try {
            String buffer;

            {
                MessageOutput.printf("MaxAllocHeap: \"%d\".\r\n", ESP.getMaxAllocHeap());
                DynamicJsonDocument root(40960);
                JsonVariant var = root;
                generateJsonResponse(var);
                serializeJson(root, buffer);
            }
            MessageOutput.printf("MaxAllocHeap after freeing JsonDocument: \"%d\".\r\n", ESP.getMaxAllocHeap());
           
            if (buffer) {
                

                if (Configuration.get().Security_AllowReadonly) {
                    _ws.setAuthentication("", "");
                } else {
                    _ws.setAuthentication(AUTH_USERNAME, Configuration.get().Security_Password);
                }

                _ws.textAll(buffer);
            }

        } catch (std::bad_alloc& bad_alloc) {
            MessageOutput.printf("Call to /api/livedata/status temporarely out of resources. Reason: \"%s\".\r\n", bad_alloc.what());
        }

        _lastWsPublish = millis();
    }
}
13:51:36.192 > MaxAllocHeap: "41972".
13:51:36.299 > MaxAllocHeap after freeing JsonDocument: "41972".

And I don't see any bad_alloc notification in the last couple of minutes.

But I still received the above mentioned crash a couple of times.

14:35:18.151 > 
14:35:18.151 > Backtrace: 0x40083919:0x3ffb25f0 0x4008e0cd:0x3ffb2610 0x400939cd:0x3ffb2630 0x401920d7:0x3ffb26b0 0x4019211e:0x3ffb26d0 0x40191dff:0x3ffb26f0 0x40191f26:0x3ffb2710 0x401917dd:0x3ffb2730 0x40196b95:0x3ffb2750 0x40106459:0x3ffb27a0 0x400dbf93:0x3ffb27e0 0x400f1d3a:0x3ffb2800 0x4010a8b1:0x3ffb2820
14:35:18.542 > 
14:35:18.542 >   #0  0x40083919:0x3ffb25f0 in panic_abort at /Users/ficeto/Desktop/ESP32/ESP32S2/esp-idf-public/components/esp_system/panic.c:402
14:35:18.542 >   #1  0x4008e0cd:0x3ffb2610 in esp_system_abort at /Users/ficeto/Desktop/ESP32/ESP32S2/esp-idf-public/components/esp_system/esp_system.c:128
14:35:18.542 >   #2  0x400939cd:0x3ffb2630 in abort at /Users/ficeto/Desktop/ESP32/ESP32S2/esp-idf-public/components/newlib/abort.c:46
14:35:18.542 >   #3  0x401920d7:0x3ffb26b0 in __cxxabiv1::__terminate(void (*)()) at /Users/brnomac003/.gitlab-runner/builds/qR2TxTby/0/idf/crosstool-NG/.build/xtensa-esp32-elf/src/gcc/libstdc++-v3/libsupc++/eh_terminate.cc:47
14:35:18.542 >   #4  0x4019211e:0x3ffb26d0 in std::terminate() at /Users/brnomac003/.gitlab-runner/builds/qR2TxTby/0/idf/crosstool-NG/.build/xtensa-esp32-elf/src/gcc/libstdc++-v3/libsupc++/eh_terminate.cc:57
14:35:18.542 >   #5  0x40191dff:0x3ffb26f0 in __cxa_throw at /Users/brnomac003/.gitlab-runner/builds/qR2TxTby/0/idf/crosstool-NG/.build/xtensa-esp32-elf/src/gcc/libstdc++-v3/libsupc++/eh_throw.cc:95
14:35:18.542 >   #6  0x40191f26:0x3ffb2710 in operator new(unsigned int) at /Users/brnomac003/.gitlab-runner/builds/qR2TxTby/0/idf/crosstool-NG/.build/xtensa-esp32-elf/src/gcc/libstdc++-v3/libsupc++/new_op.cc:54
14:35:18.542 >   #7  0x401917dd:0x3ffb2730 in operator new[](unsigned int) at /Users/brnomac003/.gitlab-runner/builds/qR2TxTby/0/idf/crosstool-NG/.build/xtensa-esp32-elf/src/gcc/libstdc++-v3/libsupc++/new_opv.cc:32
14:35:18.542 >   #8  0x40196b95:0x3ffb2750 in WiFiUDP::parsePacket() at /Users/herbe/.platformio/packages/framework-arduinoespressif32/libraries/WiFi/src/WiFiUdp.cpp:210
14:35:18.542 >   #9  0x40106459:0x3ffb27a0 in DNSServer::processNextRequest() at /Users/herbe/.platformio/packages/framework-arduinoespressif32/libraries/DNSServer/src/DNSServer.cpp:78
14:35:18.542 >   #10 0x400dbf93:0x3ffb27e0 in NetworkSettingsClass::loop() at src/NetworkSettings.cpp:213
14:35:18.542 >   #11 0x400f1d3a:0x3ffb2800 in loop() at src/main.cpp:135
14:35:18.542 >   #12 0x4010a8b1:0x3ffb2820 in loopTask(void*) at /Users/herbe/.platformio/packages/framework-arduinoespressif32/cores/esp32/main.cpp:50

Tracking down to this code. Exception for new is not handled correctly.

int WiFiUDP::parsePacket(){
  if(rx_buffer)
    return 0;
  struct sockaddr_in si_other;
  int slen = sizeof(si_other) , len;
  char * buf = new char[1460];
  if(!buf){
    return 0;
  }

For testing I changed it to:

nt WiFiUDP::parsePacket(){
  if(rx_buffer)
    return 0;
  struct sockaddr_in si_other;
  int slen = sizeof(si_other) , len;
  char * buf = 0;
  
  try{
    buf = new char[1460];
  } catch (std::bad_alloc& bad_alloc) {
    return 0;
  }
  if(!buf){
    return 0;
  }

Doing this hack, I don't see any memory issues any more.
Long post, but 10 inverters is near the edge what could be handled.

@tbnobody
Copy link
Owner

I am currently doing some tests here as well.... I also added 10 inverters. I am calling the prometheus endpoint every 5 seconds. The LiveView is running in two browser windows.

The Memory Info shows the following info:
image

With only 2 inverters the memory usage is at about 125kb. How long are you running this setup until you receive such errors?

@helgeerbe
Copy link
Contributor Author

With the latest merge of 23.5.22 into OpenDTU-onBattery no liveview was possible anymore due to memory fragmentation. As a workaround I set #define INV_MAX_COUNT 5 and calculate the JsonDocument as DynamicJsonDocument root(4096 * INV_MAX_COUNT); // TODO helge: check if this calculation is correct

@tbnobody
Copy link
Owner

Would close this issue as it is related to very old code now. the structure of the json output was also changed

@tbnobody tbnobody closed this as not planned Won't fix, can't repro, duplicate, stale Mar 21, 2024
Copy link

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new discussion or issue for related concerns.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Apr 21, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants