Fix leaking goroutine in memorylimiter #9100

Merged
Changes from 2 commits
18 changes: 17 additions & 1 deletion processor/memorylimiterprocessor/memorylimiter.go
@@ -63,6 +63,9 @@ type memoryLimiter struct {

 	refCounterLock sync.Mutex
 	refCounter     int
+	waitGroup      sync.WaitGroup
+	closeOnce      sync.Once
+	closed         chan struct{}
 }

 // Minimum interval between forced GC when in soft limited mode. We don't want to

@@ -98,6 +101,7 @@ func newMemoryLimiter(set processor.CreateSettings, cfg *Config) (*memoryLimiter
 		logger:     logger,
 		mustRefuse: &atomic.Bool{},
 		obsrep:     obsrep,
+		closed:     make(chan struct{}),
 	}

 	return ml, nil

@@ -141,6 +145,10 @@ func (ml *memoryLimiter) shutdown(context.Context) error {
 		return errShutdownNotStarted
 	} else if ml.refCounter == 1 {
 		ml.ticker.Stop()
+		ml.closeOnce.Do(func() {
+			close(ml.closed)
+		})
Member:
Don't need this, we guarantee to close only once.

Contributor Author:
Without this, that test fails:

func TestCreateProcessor(t *testing.T) {
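
To illustrate the failure mode being discussed: calling close() on an already-closed channel panics, so a second Shutdown needs the sync.Once guard. A minimal standalone sketch (illustrative type and names, not the collector's actual code or test):

```go
package main

import "sync"

type limiter struct {
	closeOnce sync.Once
	closed    chan struct{}
}

func (l *limiter) shutdown() {
	// Without the sync.Once guard, a second shutdown would call
	// close() on an already-closed channel and panic.
	l.closeOnce.Do(func() {
		close(l.closed)
	})
}

func main() {
	l := &limiter{closed: make(chan struct{})}
	l.shutdown()
	l.shutdown() // safe only because of closeOnce; would panic otherwise
}
```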

Member:
I personally think that this test doesn't make sense. I would like @open-telemetry/collector-approvers to share their opinion.

Member:
I think we can remove the second Shutdown call from the test. We don't have guarantees that it's allowed.

Member (@dmitryax, Dec 13, 2023):
Oh, it's Start -> Shutdown -> Start that causes the failure... This change breaks that workflow even without ml.closeOnce. I don't think we have other components supporting this workflow, so it should be fine not to test it.

Member (@dmitryax, Dec 14, 2023):

> so maybe let's move closed and closeOnce initialization to the Start method, the behavior will stay the same

We can move the closed channel creation to Start. closeOnce will not be needed anymore.

Contributor Author:
@dmitryax is calling a processor's Start after calling Shutdown an unsupported action? Would OpAMP ever need to restart a shut-down component?

Member (@dmitryax):
I'm not sure.

Contributor Author:
If that pattern is not allowed, I agree that the test is invalid and we can treat the code in shutdown as only ever being invoked once.

Member (@dmitryax):
I'm not saying it's not allowed. We just don't explicitly support it. Likely some components would fail. Probably better not to break this pattern here with this change and just move the closed channel creation to Start.

Member:

> Would OpAMP ever need to restart a shut-down component?

If we ever want a hot restart we will add a "Reload" capability. Once the component is shut down, I see no reason to re-start it.

Contributor Author:
OK, I moved the closed channel creation to Start and removed closeOnce.

Contributor Author:
I moved it to startMonitoring, so the closed channel is created when the goroutine is created. When there are metrics, logs, and traces in the pipeline, startMonitoring would be called three times, and closed would be created three times too, resulting in a data race.

Member:

> When there are metrics, logs, and traces in the pipeline, startMonitoring would be called three times, and closed would be created three times too, resulting in a data race

Why? They are synchronized via refCounterLock.
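
A sketch of the shape this discussion converges on, with closed created inside startMonitoring under refCounterLock so no sync.Once is needed (an assumed reconstruction from the thread, not the exact merged code):

```go
func (ml *memoryLimiter) startMonitoring() {
	ml.refCounterLock.Lock()
	defer ml.refCounterLock.Unlock()

	ml.refCounter++
	if ml.refCounter == 1 {
		// Race-free without sync.Once: every caller (one per signal
		// type in the pipeline) holds refCounterLock, and only the
		// first caller reaches this branch.
		ml.closed = make(chan struct{})
		ml.waitGroup.Add(1)
		go func() {
			defer ml.waitGroup.Done()
			for {
				select {
				case <-ml.ticker.C:
				case <-ml.closed:
					return
				}
				ml.checkMemLimits()
			}
		}()
	}
}
```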

+		ml.waitGroup.Wait()
 	}
 	ml.refCounter--
 	return nil

@@ -226,8 +234,16 @@ func (ml *memoryLimiter) startMonitoring() {

 	ml.refCounter++
 	if ml.refCounter == 1 {
+		ml.waitGroup.Add(1)
 		go func() {
-			for range ml.ticker.C {
+			defer ml.waitGroup.Done()
+
+			for {
+				select {
+				case <-ml.ticker.C:
+				case <-ml.closed:
+					return
+				}
 				ml.checkMemLimits()
 			}
 		}()
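
Background on the leak this fixes: time.Ticker.Stop stops the ticks but, per the time package documentation, does not close the ticker's channel, so a goroutine ranging over ticker.C blocks forever after Stop. A standalone sketch of the before/after pattern (illustrative, outside the collector codebase):

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"time"
)

func main() {
	ticker := time.NewTicker(10 * time.Millisecond)
	done := make(chan struct{})
	var wg sync.WaitGroup

	wg.Add(1)
	go func() {
		defer wg.Done()
		for {
			select {
			case <-ticker.C:
				// periodic work, like checkMemLimits()
			case <-done:
				// Stop() alone would never unblock this goroutine:
				// Stop does not close the ticker channel, so a plain
				// `for range ticker.C` would leak here.
				return
			}
		}
	}()

	time.Sleep(30 * time.Millisecond)
	ticker.Stop()
	close(done)
	wg.Wait() // mirrors ml.waitGroup.Wait() in shutdown
	fmt.Println("goroutines after shutdown:", runtime.NumGoroutine())
}
```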