Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fix] - resource leak #3402

Merged
merged 5 commits into from
Oct 15, 2024
Merged

[fix] - resource leak #3402

merged 5 commits into from
Oct 15, 2024

Conversation

ahrav
Copy link
Collaborator

@ahrav ahrav commented Oct 12, 2024

Description:

This PR fixes an issue that can cause resource exhaustion when scanning binaries in Git. It ensures the stdoutPipe reader passed to HandleFile is always drained. If the reader isn't fully consumed, the cmd.Wait() method may hang, failing to release file descriptors.

This PR adds a context timeout to the git cat-file command, with a brief wait to allow the process time to complete before the process is killed.

Error msg: scanner error: error running git cat-file: pipe2: too many open files
Screenshot 2024-10-11 at 3 10 20 PM

Checklist:

  • Tests passing (make test-community)?
  • Lint passing (make lint this requires golangci-lint)?

@@ -285,14 +285,9 @@ func HandleFile(
}
defer func() {
// Ensure all data is read to prevent broken pipe.
_, copyErr := io.Copy(io.Discard, rdr)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is only necessary when the underlying reader is a pipe. If the reader isn't fully consumed, cmd.Wait() hangs indefinitely. Since HandleFile is used by other parts of the codebase with different readers, the logic to consume the reader (if it's a pipe) has been moved to the caller.

@ahrav ahrav force-pushed the fix-file-descriptor-exhaustion branch from eef8f1d to da46890 Compare October 12, 2024 16:31
@ahrav ahrav mentioned this pull request Oct 12, 2024
2 tasks
func combineErrors(err1, err2 error) error {
switch {
case err1 != nil && err2 != nil:
return fmt.Errorf("%w; %w", err1, err2)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, you can do this? Which error is returned when you call Unwrap()?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought you could do this—I was wrong. I can just go with errors.Join.

Copy link
Collaborator Author

@ahrav ahrav Oct 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, nvm 🤣 Just read In particular Unwrap does not unwrap errors returned by [Join](https://pkg.go.dev/errors#Join).

I guess it depends on if we plan on calling Unwrap or using errors.Is.

err = combineErrors(err, fmt.Errorf("command execution error: %w", waitErr))
}()

return handlers.HandleFile(catFileCtx, stdout, chunkSkel, reporter, handlers.WithSkipArchives(s.skipArchives))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does HandleFile call rdr.Close() (and in this case rdr is stdout)? If yes, is it an issue to read from stdout after Close() has been called on it (in the defer block above)?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does, but the HandleFile function does not directly close stdout. Instead, it wraps stdout in a BufferedReadSeeker, and when HandleFile calls rdr.Close(), it only closes the BufferedReadSeeker, not stdout. The Close() method of BufferedReadSeeker manages internal resources like returning buffers to a pool or closing temporary files without affecting the underlying stdout. here

So stdout remains open until cmd.Wait() completes in handleBinary. The deferred function in handleBinary ensures proper cleanup by reading any remaining data from stdout and waiting for the command to finish, which effectively closes stdout.

I feel like this approach effectively separates ownership: HandleFile manages the custom reader, while handleBinary is responsible for handling and closing stdout via cmd.Wait().

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, so HandleFile takes in an io.Reader, not io.ReadCloser. Closing is up to the caller.

Yeah, this is a great approach!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It used to be an io.ReadCloser, but I confused myself a few times by mistakenly closing the passed-in reader and then trying to close it again at the call site. That's when I decided to change it to an io.Reader and have the callers handle closing. I think that's maybe why you also thought rdr.Close() closed the underlying reader.

@ahrav ahrav marked this pull request as ready for review October 14, 2024 02:18
@ahrav ahrav requested review from a team as code owners October 14, 2024 02:18
Copy link
Contributor

@dustin-decker dustin-decker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! CommandContext is useful, we should update all usage to that.

Copy link
Collaborator

@mcastorina mcastorina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am liking the clearer API boundaries, this is really nice!

pkg/handlers/handlers.go Outdated Show resolved Hide resolved
@ahrav ahrav merged commit bf38b84 into main Oct 15, 2024
13 checks passed
@ahrav ahrav deleted the fix-file-descriptor-exhaustion branch October 15, 2024 19:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants