Skip to content
This repository has been archived by the owner on Apr 3, 2018. It is now read-only.

network: Fix benchmarks sporadic issues with netns #223

Merged
merged 1 commit into from
May 2, 2017

Conversation

sboeuf
Copy link
Collaborator

@sboeuf sboeuf commented May 1, 2017

This commit gets rid of doNetNS() function as it is too complex and it introduces some sporadic issues related to the use of go routines and network namespaces at the same time. The cause of those issues
is about some processes/threads changing during the runtime, causing some confusion with the network namespaces used.

The only way is to really make sure everything is controlled by our library, meaning we make all of our network implementations rely on the same "safe" function safeDoNetNS(). This patch also improves this
function by calling into runtime.LockOSThread(). This implies that no matter who is the caller of this function, it will end up in the expected network namespace with the guarantee the thread won't change
during the execution.

This is solving all sporadic issues related to the usage of network namespaces that have been found with the benchmarks.

Fixes #219

@coveralls
Copy link

coveralls commented May 1, 2017

Coverage Status

Coverage increased (+0.02%) to 67.047% when pulling 8343200 on topic/semaphore_ci into fbc84a7 on master.

@jodh-intel
Copy link
Collaborator

jodh-intel commented May 2, 2017

lgtm

Approved with PullApprove

@dlespiau
Copy link

dlespiau commented May 2, 2017

LockOSThread wires the calling goroutine to its current operating system thread. Until the calling goroutine exits or calls UnlockOSThread, it will always execute in that thread, and no other goroutine can.

I haven't checked if that function is called by a short-lived goroutine, but if that isn't the case, it should be a good idea to call UnlockOSThread so we don't end up with 1 goroutine take a full thread and not sharing it with any other.

network.go Outdated
// not be executed in a different thread than the one expected by the
// caller. This is used in case of CNM network, because we need to
// make sure the process switched to the given netns has PID == TID.
// safeDoNetNS is free from any call to a go routine, and it calls
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we call that one doNetNS ? safeDoNetNS implies there's an unsafe one.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yes you're right, this will make more sense !

@sboeuf
Copy link
Collaborator Author

sboeuf commented May 2, 2017

@dlespiau I agree and I will add a defer call to the function. But just want to let you know that we can do this because in the CreatePod() function, the PrestartHook() function is the first one where we can expect safeDoNetNS() to be called, meaning that we already achieved our goal to keep the PID == TID. Look at api.go, I had added an init() function calling into LockOSThread(). The reason for this is about keeping the consumer of the library with its process ID the same than the thread ID until we get into the hook. Indeed, dockerd inspects only /proc/PID/ns/net, meaning we cannot have a different thread ID.
I will submit a second patch to this PR, in order to Unlock all threads for every APIs function, and one Unlock right after we don't need it anymore in the case of CreatePod().

@dlespiau
Copy link

dlespiau commented May 2, 2017

This is definitely a bit fiddly, it also means the consumer of the library can't call API functions in a goroutine (it could be scheduled in a different thread than the main thread). They need to call API functions from the main thread (probably worth documenting).

I'm not sure I follow the reason why your patch work. runtime.LockOSThread will lock the calling goroutine on its current thread, so I don't see any guarantee that the calling goroutine will be bound to the main thread unless it's already running onto the main thread.

This commit gets rid of the original doNetNS() function as it is too
complex and it introduces some sporadic issues related to the use of
go routines and network namespaces at the same time. The cause of those
issues is about some processes/threads changing during the runtime,
causing some confusion with the network namespaces used.

The only way is to really make sure everything is controlled by our
library, meaning we make all of our network implementations rely on
the new "safe" function doNetNS(). This patch also improves this
function by calling into runtime.LockOSThread(). This implies that no
matter who is the caller of this function, it will end up in the
expected network namespace with the guarantee the thread won't change
during the execution.

This is solving all sporadic issues related to the usage of network
namespaces that have been found with the benchmarks.

Signed-off-by: Sebastien Boeuf <[email protected]>
api.go Outdated
@@ -64,6 +64,9 @@ func CreatePod(podConfig PodConfig) (*Pod, error) {
return nil, err
}

// Unlock the thread here after the network hook was run.
runtime.UnlockOSThread()
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit awkward to call that from that from every other entry points. Can't we try to call it when we know we're finished with the namespace setup?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This call is about the locking function from the init() function.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we know that we don't need the lockOSThread after a certain point, we should call unlock() at that point, not in every single entry point?

@coveralls
Copy link

coveralls commented May 2, 2017

Coverage Status

Coverage increased (+0.3%) to 67.305% when pulling ada130b on topic/semaphore_ci into 189b100 on master.

@sboeuf
Copy link
Collaborator Author

sboeuf commented May 2, 2017

@dlespiau Forcing the lockOSThread() inside the init() is the way we force the main thread to have the same ID than the process. When the runtime import virtcontainers library, init() is executed, making sure the thread is locked.
I agree this is not nice... But dockerd forces us to do that because of the wrong implementation...

@dlespiau
Copy link

dlespiau commented May 2, 2017

I get the init() point, what I don't get is why putting a LockOSThread thread inside the function works. This depends on the thread the function is being called from. if the function is called form a goroutine with tid != pid, LockOSThread locks that function on to that thread, not the main thread.

@dlespiau
Copy link

dlespiau commented May 2, 2017

After a brief discussion on IRC, lgtm for the first patch.

The second patch though looks a bit ugly. I'd rather not have it in that form if possible.

@coveralls
Copy link

coveralls commented May 2, 2017

Coverage Status

Coverage increased (+0.02%) to 67.055% when pulling 5d33348 on topic/semaphore_ci into 189b100 on master.

@sboeuf sboeuf merged commit e1094ac into master May 2, 2017
@sboeuf sboeuf deleted the topic/semaphore_ci branch May 2, 2017 18:40
@mcastelino
Copy link
Collaborator

@sboeuf go 1.10 may have a fix for this in the longer term golang/go#20676

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants