-
Notifications
You must be signed in to change notification settings - Fork 30
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fifo.Close(): prevent possible panic if fifo is nil #32
Conversation
@cpuguy83 @AkihiroSuda ptal |
@@ -204,6 +204,10 @@ func (f *fifo) Write(b []byte) (int, error) { | |||
// before open(2) has returned and fifo was never opened. | |||
func (f *fifo) Close() (retErr error) { | |||
for { | |||
if f == nil { | |||
return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method can also be used before open(2) has returned and fifo was never opened.
Also not sure if we should
- return an error in this case (currently it would use
retErr
, if set) - return
nil
(which would unsetretErr
if it was set before)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the caller should check for nil.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could it become nil
while we're in this for{}
loop?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I should hope not... if so then that would be the bug that should be fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm... so I suspect that something like that actually happens (somehow); following that trace, the panic followed from this code; https://github.com/containerd/containerd/blob/269548fa27e0089a8b8278fc4fc781d7f65a939b/cio/io.go#L199-L206
func (c *cio) Close() error {
var lastErr error
for _, closer := range c.closers {
if closer == nil {
continue
}
if err := closer.Close(); err != nil {
lastErr = err
}
}
return lastErr
}
So there's already a nil
check there. So only other explanation would be there there's some concurrency happening, in which case do we need locking?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could have been explain by the infamous interface is nil problem, either we make sure that we dont pass nil io.Closer to cio, or do a type check ( yuck), or we check inside fifo.Close for nil.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, sorry, missed your comment here, @dqminh - do you mean we should use a different fix, or is the current "fix" ok?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think the best case is that we need to make sure the caller doesn't pass a nil interface as that can be nasty too if we call anything other than Close ( which is fixed in this way ). The current check we are doing here is not the end of the world, but ideally i don't think we need it at this place.
OK, looks like this is failing;
|
I'm not sure if this is the right approach, and synchronisation should probably be added elsewhere to fix the underlying issue. Trying to prevent a panic that was seen on container restore in th docker daemon: panic: runtime error: invalid memory address or nil pointer dereference [signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0x5586c892a7a4] goroutine 420 [running]: github.com/docker/docker/vendor/github.com/containerd/fifo.(*fifo).Close(0x0, 0x0, 0x0) /go/src/github.com/docker/docker/vendor/github.com/containerd/fifo/fifo.go:208 +0x44 github.com/docker/docker/vendor/github.com/containerd/containerd/cio.(*cio).Close(0xc000d06f60, 0x5586cb5654d0, 0xc000d8e9e8) /go/src/github.com/docker/docker/vendor/github.com/containerd/containerd/cio/io.go:203 +0x90 github.com/docker/docker/libcontainerd/remote.(*client).Restore.func1(0xc0008bf820, 0xc0008a2040) /go/src/github.com/docker/docker/libcontainerd/remote/client.go:86 +0x5a github.com/docker/docker/libcontainerd/remote.(*client).Restore(0xc00098e5b0, 0x5586cb61c7c0, 0xc000052088, 0xc0011b6500, 0x40, 0xc0008bf810, 0x5586cb05cf00, 0xffffffffffffffff, 0x0, 0x0, ...) /go/src/github.com/docker/docker/libcontainerd/remote/client.go:107 +0x923 github.com/docker/docker/daemon.(*Daemon).restore.func3(0xc00079d9e0, 0xc000a38230, 0xc00000c1e0, 0xc00079d9a8, 0xc000d84f00, 0xc000d84ed0, 0xc000d84ea0, 0xc00128a280) /go/src/github.com/docker/docker/daemon/daemon.go:351 +0x48a created by github.com/docker/docker/daemon.(*Daemon).restore /go/src/github.com/docker/docker/daemon/daemon.go:319 +0x4b3 If the fifo is nil, there's nothing to be done in Close(), so returning early in that situation. Signed-off-by: Sebastiaan van Stijn <[email protected]>
bb32f37
to
80cf87d
Compare
Hmm... looks like there's some flakiness; re-pushed, and now it passes; could it be related to Go 1.14/ Go 1.15? |
Ran Tibor's script from https://gist.github.com/tiborvass/eb0a4054679a43aaca22690a7c4452ed on this repository, in case it's useful;
|
@cpuguy83 @AkihiroSuda ptal |
@AkihiroSuda any thoughts on the discussion above? #32 (comment) (i.e.; would fixes be needed elsewhere (as well?) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me since it would fix a production issue. I'm in "the caller should check that" camp though :)
@thaJeztah Did this make it into a release? I'm still seeing these issues occasionally on v20.10.12, with containerd 1.4.12 7b11cfaabd73bb80907dd23182b9347b4245eb5d. I can't seem to find a way to resolve the issue without restarting the entire server, which is very costly.
|
Not this was not in a release yet; v1.0.0...39bc37d |
I'm not sure if this is the right approach, and synchronisation should probably be added elsewhere to fix the underlying issue. Trying to prevent a panic that was seen on container restore in th docker daemon: panic: runtime error: invalid memory address or nil pointer dereference [signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0x5586c892a7a4] goroutine 420 [running]: github.com/docker/docker/vendor/github.com/containerd/fifo.(*fifo).Close(0x0, 0x0, 0x0) /go/src/github.com/docker/docker/vendor/github.com/containerd/fifo/fifo.go:208 +0x44 github.com/docker/docker/vendor/github.com/containerd/containerd/cio.(*cio).Close(0xc000d06f60, 0x5586cb5654d0, 0xc000d8e9e8) /go/src/github.com/docker/docker/vendor/github.com/containerd/containerd/cio/io.go:203 +0x90 github.com/docker/docker/libcontainerd/remote.(*client).Restore.func1(0xc0008bf820, 0xc0008a2040) /go/src/github.com/docker/docker/libcontainerd/remote/client.go:86 +0x5a github.com/docker/docker/libcontainerd/remote.(*client).Restore(0xc00098e5b0, 0x5586cb61c7c0, 0xc000052088, 0xc0011b6500, 0x40, 0xc0008bf810, 0x5586cb05cf00, 0xffffffffffffffff, 0x0, 0x0, ...) /go/src/github.com/docker/docker/libcontainerd/remote/client.go:107 +0x923 github.com/docker/docker/daemon.(*Daemon).restore.func3(0xc00079d9e0, 0xc000a38230, 0xc00000c1e0, 0xc00079d9a8, 0xc000d84f00, 0xc000d84ed0, 0xc000d84ea0, 0xc00128a280) /go/src/github.com/docker/docker/daemon/daemon.go:351 +0x48a created by github.com/docker/docker/daemon.(*Daemon).restore /go/src/github.com/docker/docker/daemon/daemon.go:319 +0x4b3 If the fifo is nil, there's nothing to be done in Close(), so returning early in that situation. Backport: <containerd/fifo#32> SUSE-Bugs: bsc#1200022 Signed-off-by: Sebastiaan van Stijn <[email protected]>
I'm not sure if this is the right approach, and synchronisation should probably be added elsewhere to fix the underlying issue. Trying to prevent a panic that was seen on container restore in th docker daemon: panic: runtime error: invalid memory address or nil pointer dereference [signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0x5586c892a7a4] goroutine 420 [running]: github.com/docker/docker/vendor/github.com/containerd/fifo.(*fifo).Close(0x0, 0x0, 0x0) /go/src/github.com/docker/docker/vendor/github.com/containerd/fifo/fifo.go:208 +0x44 github.com/docker/docker/vendor/github.com/containerd/containerd/cio.(*cio).Close(0xc000d06f60, 0x5586cb5654d0, 0xc000d8e9e8) /go/src/github.com/docker/docker/vendor/github.com/containerd/containerd/cio/io.go:203 +0x90 github.com/docker/docker/libcontainerd/remote.(*client).Restore.func1(0xc0008bf820, 0xc0008a2040) /go/src/github.com/docker/docker/libcontainerd/remote/client.go:86 +0x5a github.com/docker/docker/libcontainerd/remote.(*client).Restore(0xc00098e5b0, 0x5586cb61c7c0, 0xc000052088, 0xc0011b6500, 0x40, 0xc0008bf810, 0x5586cb05cf00, 0xffffffffffffffff, 0x0, 0x0, ...) /go/src/github.com/docker/docker/libcontainerd/remote/client.go:107 +0x923 github.com/docker/docker/daemon.(*Daemon).restore.func3(0xc00079d9e0, 0xc000a38230, 0xc00000c1e0, 0xc00079d9a8, 0xc000d84f00, 0xc000d84ed0, 0xc000d84ea0, 0xc00128a280) /go/src/github.com/docker/docker/daemon/daemon.go:351 +0x48a created by github.com/docker/docker/daemon.(*Daemon).restore /go/src/github.com/docker/docker/daemon/daemon.go:319 +0x4b3 If the fifo is nil, there's nothing to be done in Close(), so returning early in that situation. Backport: <containerd/fifo#32> SUSE-Bugs: bsc#1200022 Signed-off-by: Sebastiaan van Stijn <[email protected]>
I'm not sure if this is the right approach, and synchronisation should probably be added elsewhere to fix the underlying issue. Trying to prevent a panic that was seen on container restore in th docker daemon: panic: runtime error: invalid memory address or nil pointer dereference [signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0x5586c892a7a4] goroutine 420 [running]: github.com/docker/docker/vendor/github.com/containerd/fifo.(*fifo).Close(0x0, 0x0, 0x0) /go/src/github.com/docker/docker/vendor/github.com/containerd/fifo/fifo.go:208 +0x44 github.com/docker/docker/vendor/github.com/containerd/containerd/cio.(*cio).Close(0xc000d06f60, 0x5586cb5654d0, 0xc000d8e9e8) /go/src/github.com/docker/docker/vendor/github.com/containerd/containerd/cio/io.go:203 +0x90 github.com/docker/docker/libcontainerd/remote.(*client).Restore.func1(0xc0008bf820, 0xc0008a2040) /go/src/github.com/docker/docker/libcontainerd/remote/client.go:86 +0x5a github.com/docker/docker/libcontainerd/remote.(*client).Restore(0xc00098e5b0, 0x5586cb61c7c0, 0xc000052088, 0xc0011b6500, 0x40, 0xc0008bf810, 0x5586cb05cf00, 0xffffffffffffffff, 0x0, 0x0, ...) /go/src/github.com/docker/docker/libcontainerd/remote/client.go:107 +0x923 github.com/docker/docker/daemon.(*Daemon).restore.func3(0xc00079d9e0, 0xc000a38230, 0xc00000c1e0, 0xc00079d9a8, 0xc000d84f00, 0xc000d84ed0, 0xc000d84ea0, 0xc00128a280) /go/src/github.com/docker/docker/daemon/daemon.go:351 +0x48a created by github.com/docker/docker/daemon.(*Daemon).restore /go/src/github.com/docker/docker/daemon/daemon.go:319 +0x4b3 If the fifo is nil, there's nothing to be done in Close(), so returning early in that situation. Backport: <containerd/fifo#32> SUSE-Bugs: bsc#1200022 Signed-off-by: Sebastiaan van Stijn <[email protected]>
I'm not sure if this is the right approach, and synchronisation should probably be added elsewhere to fix the underlying issue. Trying to prevent a panic that was seen on container restore in th docker daemon: panic: runtime error: invalid memory address or nil pointer dereference [signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0x5586c892a7a4] goroutine 420 [running]: github.com/docker/docker/vendor/github.com/containerd/fifo.(*fifo).Close(0x0, 0x0, 0x0) /go/src/github.com/docker/docker/vendor/github.com/containerd/fifo/fifo.go:208 +0x44 github.com/docker/docker/vendor/github.com/containerd/containerd/cio.(*cio).Close(0xc000d06f60, 0x5586cb5654d0, 0xc000d8e9e8) /go/src/github.com/docker/docker/vendor/github.com/containerd/containerd/cio/io.go:203 +0x90 github.com/docker/docker/libcontainerd/remote.(*client).Restore.func1(0xc0008bf820, 0xc0008a2040) /go/src/github.com/docker/docker/libcontainerd/remote/client.go:86 +0x5a github.com/docker/docker/libcontainerd/remote.(*client).Restore(0xc00098e5b0, 0x5586cb61c7c0, 0xc000052088, 0xc0011b6500, 0x40, 0xc0008bf810, 0x5586cb05cf00, 0xffffffffffffffff, 0x0, 0x0, ...) /go/src/github.com/docker/docker/libcontainerd/remote/client.go:107 +0x923 github.com/docker/docker/daemon.(*Daemon).restore.func3(0xc00079d9e0, 0xc000a38230, 0xc00000c1e0, 0xc00079d9a8, 0xc000d84f00, 0xc000d84ed0, 0xc000d84ea0, 0xc00128a280) /go/src/github.com/docker/docker/daemon/daemon.go:351 +0x48a created by github.com/docker/docker/daemon.(*Daemon).restore /go/src/github.com/docker/docker/daemon/daemon.go:319 +0x4b3 If the fifo is nil, there's nothing to be done in Close(), so returning early in that situation. Backport: <containerd/fifo#32> SUSE-Bugs: bsc#1200022 Signed-off-by: Sebastiaan van Stijn <[email protected]>
I'm not sure if this is the right approach, and synchronisation should probably be added elsewhere to fix the underlying issue. Trying to prevent a panic that was seen on container restore in th docker daemon: panic: runtime error: invalid memory address or nil pointer dereference [signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0x5586c892a7a4] goroutine 420 [running]: github.com/docker/docker/vendor/github.com/containerd/fifo.(*fifo).Close(0x0, 0x0, 0x0) /go/src/github.com/docker/docker/vendor/github.com/containerd/fifo/fifo.go:208 +0x44 github.com/docker/docker/vendor/github.com/containerd/containerd/cio.(*cio).Close(0xc000d06f60, 0x5586cb5654d0, 0xc000d8e9e8) /go/src/github.com/docker/docker/vendor/github.com/containerd/containerd/cio/io.go:203 +0x90 github.com/docker/docker/libcontainerd/remote.(*client).Restore.func1(0xc0008bf820, 0xc0008a2040) /go/src/github.com/docker/docker/libcontainerd/remote/client.go:86 +0x5a github.com/docker/docker/libcontainerd/remote.(*client).Restore(0xc00098e5b0, 0x5586cb61c7c0, 0xc000052088, 0xc0011b6500, 0x40, 0xc0008bf810, 0x5586cb05cf00, 0xffffffffffffffff, 0x0, 0x0, ...) /go/src/github.com/docker/docker/libcontainerd/remote/client.go:107 +0x923 github.com/docker/docker/daemon.(*Daemon).restore.func3(0xc00079d9e0, 0xc000a38230, 0xc00000c1e0, 0xc00079d9a8, 0xc000d84f00, 0xc000d84ed0, 0xc000d84ea0, 0xc00128a280) /go/src/github.com/docker/docker/daemon/daemon.go:351 +0x48a created by github.com/docker/docker/daemon.(*Daemon).restore /go/src/github.com/docker/docker/daemon/daemon.go:319 +0x4b3 If the fifo is nil, there's nothing to be done in Close(), so returning early in that situation. Backport: <containerd/fifo#32> SUSE-Bugs: bsc#1200022 Signed-off-by: Sebastiaan van Stijn <[email protected]>
relates to docker/for-linux#1186
I'm not sure if this is the right approach, and synchronisation should probably be added elsewhere to fix the underlying issue.
Trying to prevent a panic that was seen on container restore in th docker daemon:
If the fifo is nil, there's nothing to be done in Close(), so returning early in that situation.