Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UDP receiver async - panic during shutdown (async mode only) #29120

Closed
hovavza opened this issue Nov 12, 2023 · 1 comment · Fixed by #29121
Closed

UDP receiver async - panic during shutdown (async mode only) #29120

hovavza opened this issue Nov 12, 2023 · 1 comment · Fixed by #29121
Labels
bug Something isn't working needs triage New item requiring triage pkg/stanza receiver/udplog

Comments

@hovavza
Copy link
Contributor

hovavza commented Nov 12, 2023

Component(s)

pkg/stanza, receiver/udplog

What happened?

Description

When udp-receiver is configured to run in async mode (and only then), when stopping or restarting, panic happens due to trying to write to closed channel.
When stop is called, it closes the messageQueue channel, signaling to processMessagesAsync to stop running. However, readMessagesAsync sometimes tries to write into the closed channel (depends whether the method is currently reading from the closed connection or currently trying to write to the channel), and as a result, panic error happens.
readMessagesAsync needs to finish before closing the channel.

Steps to Reproduce

Restart strato once or twice

Expected Result

No panic should happen. udp receiver should stop gracefully.

Actual Result

Panic happens

Collector version

0.88.0

Environment information

Environment

OS: (e.g., "Ubuntu 20.04")
Compiler(if manually compiled): (e.g., "go 14.2")

OpenTelemetry Collector configuration

No response

Log output

No response

Additional context

No response

@hovavza hovavza added bug Something isn't working needs triage New item requiring triage labels Nov 12, 2023
Copy link
Contributor

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

djaglowski pushed a commit that referenced this issue Nov 16, 2023
)

**Description:** 
Link to tracking Issue: Fixes #29120
Fixing a bug - panic happens during stop method in async mode only
(didn't affect the default non-async mode).
When stop is called, it closes the messageQueue channel, signaling to
processMessagesAsync to stop running. However, readMessagesAsync
sometimes tries to write into the closed channel (depends whether the
method is currently reading from the closed connection or currently
trying to write to the channel), and as a result, panic error happens.

Separated between wg (waitGroup that serves non-async code and
processMessagesAsync) & the new wg_reader (waitGroup serving
readMessagesAsync only). This allows us to first stop readMessagesAsync,
wait for it to finish, before closing the channel.
Instead, stop (in async mode) should do the following:
1. Close the connection - signaling readMessagesAsync to stop - the
messageQueue channel will remain open until that method is done so
there's no risk of panic (due to writing to a closed channel).
2. Wait for readMessagesAsync to finish (wait for new wg_reader).
3. Close messageQueue channel (signaling processMessagesAsync to stop)
4. Wait for processMessagesAsync to finish (wait sg).

**Link to tracking Issue:** 29120

**Testing:** Unitests ran. Ran concrete strato, stopped & restarted
multiple times, didn't see any panic (and stop completed successfully as
expected)

**Documentation:** None.
RoryCrispin pushed a commit to ClickHouse/opentelemetry-collector-contrib that referenced this issue Nov 24, 2023
…n-telemetry#29121)

**Description:** 
Link to tracking Issue: Fixes open-telemetry#29120
Fixing a bug - panic happens during stop method in async mode only
(didn't affect the default non-async mode).
When stop is called, it closes the messageQueue channel, signaling to
processMessagesAsync to stop running. However, readMessagesAsync
sometimes tries to write into the closed channel (depends whether the
method is currently reading from the closed connection or currently
trying to write to the channel), and as a result, panic error happens.

Separated between wg (waitGroup that serves non-async code and
processMessagesAsync) & the new wg_reader (waitGroup serving
readMessagesAsync only). This allows us to first stop readMessagesAsync,
wait for it to finish, before closing the channel.
Instead, stop (in async mode) should do the following:
1. Close the connection - signaling readMessagesAsync to stop - the
messageQueue channel will remain open until that method is done so
there's no risk of panic (due to writing to a closed channel).
2. Wait for readMessagesAsync to finish (wait for new wg_reader).
3. Close messageQueue channel (signaling processMessagesAsync to stop)
4. Wait for processMessagesAsync to finish (wait sg).

**Link to tracking Issue:** 29120

**Testing:** Unitests ran. Ran concrete strato, stopped & restarted
multiple times, didn't see any panic (and stop completed successfully as
expected)

**Documentation:** None.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working needs triage New item requiring triage pkg/stanza receiver/udplog
Projects
None yet
1 participant