network: Fix benchmarks sporadic issues with netns #223
I haven't checked if that function is called by a short-lived goroutine, but if that isn't the case, it should be a good idea to call |
network.go
Outdated
// not be executed in a different thread than the one expected by the
// caller. This is used in case of CNM network, because we need to
// make sure the process switched to the given netns has PID == TID.
// safeDoNetNS is free from any call to a go routine, and it calls
Could we call that one doNetNS? safeDoNetNS implies there's an unsafe one.
Oh yes, you're right, this will make more sense!
@dlespiau I agree, and I will add a defer call to the function. I just want to point out that we can do this because, in the CreatePod() function, PrestartHook() is the first place where we can expect safeDoNetNS() to be called, meaning that by then we have already achieved our goal of keeping PID == TID. Look at api.go: I added an init() function that calls LockOSThread(). The reason is to keep the process ID of the library's consumer the same as its thread ID until we get into the hook. dockerd inspects only /proc/PID/ns/net, so we cannot be running on a thread whose ID differs from the PID.
This is definitely a bit fiddly. It also means the consumer of the library can't call API functions in a goroutine (it could be scheduled on a different thread than the main thread). They need to call API functions from the main thread (probably worth documenting). I'm not sure I follow the reason why your patch works. runtime.LockOSThread will lock the calling goroutine to its current thread, so I don't see any guarantee that the calling goroutine will be bound to the main thread unless it's already running on the main thread.
This commit gets rid of the original doNetNS() function, as it is too complex and introduces sporadic issues related to using goroutines and network namespaces at the same time. Those issues are caused by the process/thread changing at runtime, which creates confusion about which network namespace is in use. The only way to avoid this is to make sure everything is controlled by our library, meaning we make all of our network implementations rely on the new "safe" function doNetNS(). This patch also improves this function by calling into runtime.LockOSThread(). This implies that no matter who the caller of this function is, it will end up in the expected network namespace with the guarantee that the thread won't change during the execution. This solves all the sporadic issues related to the usage of network namespaces that have been found with the benchmarks. Signed-off-by: Sebastien Boeuf <[email protected]>
api.go
Outdated
@@ -64,6 +64,9 @@ func CreatePod(podConfig PodConfig) (*Pod, error) {
return nil, err
}

// Unlock the thread here after the network hook was run.
runtime.UnlockOSThread()
It's a bit awkward to call that from every other entry point. Can't we try to call it when we know we're finished with the namespace setup?
This call pairs with the LockOSThread() call made in the init() function.
If we know that we don't need LockOSThread() after a certain point, shouldn't we call unlock() at that point, rather than in every single entry point?
@dlespiau Calling LockOSThread() inside init() is how we force the main thread to keep the same ID as the process. When the runtime imports the virtcontainers library, init() is executed, making sure the thread is locked.
I get the init() point. What I don't get is why putting a LockOSThread call inside the function works. This depends on the thread the function is being called from. If the function is called from a goroutine with tid != pid, LockOSThread locks that goroutine onto that thread, not the main thread.
After a brief discussion on IRC, LGTM for the first patch. The second patch though looks a bit ugly. I'd rather not have it in that form if possible.
@sboeuf Go 1.10 may have a fix for this in the longer term: golang/go#20676
This commit gets rid of the doNetNS() function, as it is too complex and introduces sporadic issues related to using goroutines and network namespaces at the same time. Those issues are caused by the process/thread changing at runtime, which creates confusion about which network namespace is in use.
The only way to avoid this is to make sure everything is controlled by our library, meaning we make all of our network implementations rely on the same "safe" function safeDoNetNS(). This patch also improves this function by calling into runtime.LockOSThread(). This implies that no matter who the caller of this function is, it will end up in the expected network namespace with the guarantee that the thread won't change during the execution.
This solves all the sporadic issues related to the usage of network namespaces that have been found with the benchmarks.
Fixes #219