Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Can't take semaphore on ESP32-S2 #97

Open
whc2001 opened this issue Aug 4, 2024 · 10 comments
Open

Can't take semaphore on ESP32-S2 #97

whc2001 opened this issue Aug 4, 2024 · 10 comments

Comments

@whc2001
Copy link

whc2001 commented Aug 4, 2024

As mentioned in my last issue, I decided to go back to WiFi connection. I found that the Wemos ESP32-S2 mini really suits my need, with a small footprint, type-C port and two screw holes. I paired the board with a MAX98357A to test it out.

Firstly I have to specify the chip using idf.py set-target esp32s2. After that I configured everything including PSRAM (the S2 chip contains 4MB of flash and 2MB of PSRAM). Here is the full configuration:

sdkconfig.txt

When compiling, it compains in boards_pins_config.c that I2S_NUM_1 cannot be found. I have to make the following change for it to work:

image

After compiling and flashing, everything seems fine. The connection is successful and the audio is playing through the client. However, after a random amount of time (no longer than 10 minutes), the client would crash and the current audio frame will be played indefinitely. If I check the serial log output, I can see the following being printed out constantly:

�[0;32mI (334122) dspProc: Set volume to 0.030000�[0m
�[0;33mW (366932) PLAYER: RESYNCING HARD 2: age -99us, latency 947715411563us, free 2106647, largest block 2031616, 0, rssi: -50�[0m
�[0;32mI (367092) PLAYER: initial sync age: 12us, chunk duration: 20000us�[0m
�[0;33mW (531542) PLAYER: RESYNCING HARD 2: age -109us, latency 947715405177us, free 2106627, largest block 2031616, 0, rssi: -52�[0m
�[0;32mI (531702) PLAYER: initial sync age: 10us, chunk duration: 20000us�[0m
�[0;33mW (604722) PLAYER: latency_buffer_full: can't take semaphore�[0m
�[0;33mW (604722) PLAYER: latency_buffer_full: can't take semaphore�[0m
�[0;33mW (604732) PLAYER: latency_buffer_full: can't take semaphore�[0m
�[0;33mW (604732) PLAYER: latency_buffer_full: can't take semaphore�[0m
�[0;33mW (604742) PLAYER: latency_buffer_full: can't take semaphore�[0m
�[0;33mW (604752) PLAYER: latency_buffer_full: can't take semaphore�[0m
�[0;33mW (604762) PLAYER: latency_buffer_full: can't take semaphore�[0m
�[0;33mW (604762) PLAYER: latency_buffer_full: can't take semaphore�[0m
�[0;33mW (604772) PLAYER: latency_buffer_full: can't take semaphore�[0m
�[0;33mW (604782) PLAYER: latency_buffer_full: can't take semaphore�[0m
�[0;33mW (604782) PLAYER: latency_buffer_full: can't take semaphore�[0m
�[0;33mW (604792) PLAYER: latency_buffer_full: can't take semaphore�[0m
�[0;33mW (604802) PLAYER: latency_buffer_full: can't take semaphore�[0m
�[0;33mW (604812) PLAYER: latency_buffer_full: can't take semaphore�[0m
�[0;33mW (604812) PLAYER: latency_buffer_full: can't take semaphore�[0m
�[0;33mW (604822) PLAYER: latency_buffer_full: can't take semaphore�[0m
�[0;33mW (604832) PLAYER: latency_buffer_full: can't take semaphore�[0m
�[0;33mW (604842) PLAYER: latency_buffer_full: can't take semaphore�[0m
�[0;33mW (604842) PLAYER: latency_buffer_full: can't take semaphore�[0m
�[0;33mW (604852) PLAYER: latency_buffer_full: can't take semaphore�[0m
�[0;33mW (604862) PLAYER: latency_buffer_full: can't take semaphore�[0m

I have ruled out the power supply issue of the board (applied 3.3V directly through the pin, bypassing the onboard LDO). I have tried to increase and decrease chunk_ms and buffer in server configuration, but seems like it does not have a significant influence on the issue. I have also fiddled around the settings about PSRAM, also no avail (however if I turn off PSRAM, audio board would fail to initialize due to out of memory, so I assume it's working). I also added log to the return value of xSemaphoreGive(latencyBufSemaphoreHandle) to see if anything does not release the semaphore, but they returns true every time. At this point I am really not sure what is causing the semaphore to stuck.

@CarlosDerSeher
Copy link
Owner

latency_buffer_full() causing an infinite loop for some reason? Check all places where it is called maybe?

@CarlosDerSeher
Copy link
Owner

In my new issue I tested different server configuration for many times, after systemctl restart snapserver the client would stop for several seconds and start playing again (or die due to the semaphore issue).

Does this error only happen after server restart?

@whc2001
Copy link
Author

whc2001 commented Aug 4, 2024

In my new issue I tested different server configuration for many times, after systemctl restart snapserver the client would stop for several seconds and start playing again (or die due to the semaphore issue).

Does this error only happen after server restart?

Nope it's not decisive. It happens randomly and I can't find a pattern so far.

@whc2001
Copy link
Author

whc2001 commented Aug 4, 2024

latency_buffer_full() causing an infinite loop for some reason? Check all places where it is called maybe?

The only place this function is called is in insert_pcm_chunk and the main loop player_task. I'll try to print some log to find where the problematic call is originated. The only thing it does is to check a global variable in a thread safe manner. I am really not sure why the lock cannot be obtained.

@whc2001
Copy link
Author

whc2001 commented Aug 4, 2024

I have added some trace to the function, here is what it looks like:

int32_t latency_buffer_full(bool *is_full, TickType_t wait, const char *entrance) {
  if (!is_full) {
    return -3;
  }

  if (latencyBufSemaphoreHandle == NULL) {
    ESP_LOGE(TAG, "%s -> latency_buffer_full: latencyBufSemaphoreHandle == NULL", entrance);

    return -2;
  }

  if (xSemaphoreTake(latencyBufSemaphoreHandle, wait) == pdFALSE) {
    ESP_LOGW(TAG, "%s -> latency_buffer_full: can't take semaphore in %lu ticks", entrance, wait);

    return -1;
  }

  *is_full = latencyBuffFull;

  if(xSemaphoreGive(latencyBufSemaphoreHandle) == pdFALSE) {
    ESP_LOGE(TAG, "%s -> latency_buffer_full: semaphore release FAILED", entrance);

    return -4;
  }

  return 0;
}

And here is the crash:

�[0;33mW (200628) PLAYER: RESYNCING HARD 2: age -100385us, latency 125817242173us, free 1947607, largest block 1900544, 41, rssi: -34�[0m
�[0;32mI (200788) PLAYER: initial sync age: 22us, chunk duration: 20000us�[0m
�[0;33mW (365228) PLAYER: RESYNCING HARD 2: age -100618us, latency 125817241083us, free 1947623, largest block 1900544, 41, rssi: -34�[0m
�[0;32mI (365388) PLAYER: initial sync age: 22us, chunk duration: 20000us�[0m
�[0;33mW (426568) PLAYER: player_task -> latency_buffer_full: can't take semaphore in 0 ticks�[0m
�[0;33mW (426568) PLAYER: player_task -> latency_buffer_full: can't take semaphore in 0 ticks�[0m
�[0;33mW (426578) PLAYER: player_task -> latency_buffer_full: can't take semaphore in 0 ticks�[0m
�[0;33mW (426578) PLAYER: player_task -> latency_buffer_full: can't take semaphore in 0 ticks�[0m
�[0;33mW (426588) PLAYER: player_task -> latency_buffer_full: can't take semaphore in 0 ticks�[0m
�[0;33mW (426598) PLAYER: player_task -> latency_buffer_full: can't take semaphore in 0 ticks�[0m
�[0;33mW (426608) PLAYER: player_task -> latency_buffer_full: can't take semaphore in 0 ticks�[0m
�[0;33mW (426618) PLAYER: player_task -> latency_buffer_full: can't take semaphore in 0 ticks�[0m
�[0;33mW (426628) PLAYER: player_task -> latency_buffer_full: can't take semaphore in 0 ticks�[0m
�[0;33mW (426638) PLAYER: player_task -> latency_buffer_full: can't take semaphore in 0 ticks�[0m
�[0;33mW (426638) PLAYER: player_task -> latency_buffer_full: can't take semaphore in 0 ticks�[0m
�[0;33mW (426648) PLAYER: player_task -> latency_buffer_full: can't take semaphore in 0 ticks�[0m

The strange thing is that the wait parameter is 0 although the function call is pdMS_TO_TICKS(1), but not sure if this is related to the problem.

@whc2001
Copy link
Author

whc2001 commented Aug 4, 2024

Okay I did more fiddling and found that pdMS_TO_TICKS uses portTICK_PERIOD_MS, which is 10 in my instance. So the pdMS_TO_TICKS(1) in player_task is indeed 0. I tried change it to pdMS_TO_TICKS(10) which gives one tick to wait for the lock, then the problem seems gone (still need more testing). Not really sure where the real problem is, but I do smell a potential deadlock.

@CarlosDerSeher
Copy link
Owner

CONFIG_FREERTOS_HZ=100

I set this to 1000 in all sdkconfigs so this is probably your issue?

@whc2001
Copy link
Author

whc2001 commented Aug 4, 2024

CONFIG_FREERTOS_HZ=100

I set this to 1000 in all sdkconfigs so this is probably your issue?

Ahhhhh I think that might be the root of all evil. I have to set-target for S2 which will wipe all my sdkconfig, and I am not sure how to migrate the config so I just configured the common fields by hand. Let me revert the change and correct this. In such case 1 ms should equals 1 tick and the problem should go away.

@CarlosDerSeher
Copy link
Owner

When you have a working config would you PR this as a template?

@whc2001
Copy link
Author

whc2001 commented Aug 5, 2024

When you have a working config would you PR this as a template?

I made several changes of the FreeRTOS part to keep the same as the provided config. I let it play for around three hours, then muted and went to bed. This morning I unmuted and it immediately resumes playing, tested for another several hours without problem, all while occationally switching audio source. Here is the current configuration I am using:

sdkconfig.txt

I have changed several PSRAM configurations that might affect stability. It may need further testing but I think at a glance it works without problem. On server I am using PCM/20ms/400ms, as FLAC seems to cause rare crackles or resyncs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants