Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

libpod: fix race when closing STDIN #11864

Merged
merged 1 commit into from
Oct 6, 2021

Conversation

Luap99
Copy link
Member

@Luap99 Luap99 commented Oct 5, 2021

What this PR does / why we need it:

There is a race where conn.Close() was called before conn.CloseWrite().
In this case CloseWrite will fail and an useless error is printed. To
fix this we move the the CloseWrite() call to the same goroutine to
remove the race. This ensures that CloseWrite() is called before
Close() and never afterwards.
This is causing flakes in CI testing.

How to verify it

not possible, CI flake

Which issue(s) this PR fixes:

Fixes #11856

Special notes for your reviewer:

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 5, 2021
Copy link
Member

@vrothberg vrothberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, nice catch!

@containers/podman-maintainers PTAL

Copy link
Collaborator

@flouthoc flouthoc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just one concern should we acquire mutex and then try so Close with error ? Correct me if i am wrong we will be all missing errors which are being created while copying buffers and silently closing channel.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Oct 5, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: flouthoc, Luap99

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@edsantiago
Copy link
Member

Looks like at least one of those Closes is needed after all. I'm a little alarmed that none of the int tests caught that.

@mheon
Copy link
Member

mheon commented Oct 5, 2021

Removing these entirely breaks scenarios where STDIN closes immediately again - Podman will hang forever waiting for input that never comes.

It begins to sound like we need a dedicated "Closed" bool somewhere?

@Luap99
Copy link
Member Author

Luap99 commented Oct 5, 2021

How about I keep CloseWrite but do not log the error?

@flouthoc
Copy link
Collaborator

flouthoc commented Oct 5, 2021

@Luap99 how about we abstract the output from channel into a struct ? and we can use channel to relay error back if any with the actual result and we can process error at caller level

Something like

type Result struct {
    ActualData string
    Error error
}

ch := make(chan Result)

@mheon
Copy link
Member

mheon commented Oct 5, 2021

@Luap99 Can we catch and ignore duplicate close errors and print the others?

@Luap99 Luap99 changed the title libpod: attach do not call close twice libpod: fix race when closing STDIN Oct 6, 2021
@Luap99
Copy link
Member Author

Luap99 commented Oct 6, 2021

I pushed a new version which closes STDIN in same goroutine so the race should be gone.
I also added a new test for this.

@rhatdan
Copy link
Member

rhatdan commented Oct 6, 2021

LGTM

var err error
select {
case err = <-receiveStdoutError:
return err
case err = <-stdinDone:
// copy stdin is done, close it
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Semantics here aren't a precise match, we need to only do the one-sided close if err is nil

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed
Also added another fix for podman-remote run where stdin was never closed.

There is a race where `conn.Close()` was called before `conn.CloseWrite()`.
In this case `CloseWrite` will fail and an useless error is printed. To
fix this we move the the `CloseWrite()` call to the same goroutine to
remove the race. This ensures that `CloseWrite()` is called before
`Close()` and never afterwards.
Also fixed podman-remote run where the STDIN was never was closed.
This is causing flakes in CI testing.

[NO TESTS NEEDED]

Fixes containers#11856

Signed-off-by: Paul Holzinger <[email protected]>
@mheon
Copy link
Member

mheon commented Oct 6, 2021

LGTM

@rhatdan
Copy link
Member

rhatdan commented Oct 6, 2021

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Oct 6, 2021
@openshift-merge-robot openshift-merge-robot merged commit 03c17e9 into containers:main Oct 6, 2021
@Luap99 Luap99 deleted the close branch October 6, 2021 18:45
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 22, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 22, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments.
Projects
None yet
7 participants