network: Fix benchmarks sporadic issues with netns #223

sboeuf · 2017-05-01T22:43:07Z

This commit gets rid of doNetNS() function as it is too complex and it introduces some sporadic issues related to the use of go routines and network namespaces at the same time. The cause of those issues
is about some processes/threads changing during the runtime, causing some confusion with the network namespaces used.

The only way is to really make sure everything is controlled by our library, meaning we make all of our network implementations rely on the same "safe" function safeDoNetNS(). This patch also improves this
function by calling into runtime.LockOSThread(). This implies that no matter who is the caller of this function, it will end up in the expected network namespace with the guarantee the thread won't change
during the execution.

This is solving all sporadic issues related to the usage of network namespaces that have been found with the benchmarks.

Fixes #219

coveralls · 2017-05-01T22:47:47Z

Coverage increased (+0.02%) to 67.047% when pulling 8343200 on topic/semaphore_ci into fbc84a7 on master.

jodh-intel · 2017-05-02T08:07:37Z

lgtm

dlespiau · 2017-05-02T09:56:28Z

LockOSThread wires the calling goroutine to its current operating system thread. Until the calling goroutine exits or calls UnlockOSThread, it will always execute in that thread, and no other goroutine can.

I haven't checked if that function is called by a short-lived goroutine, but if that isn't the case, it should be a good idea to call UnlockOSThread so we don't end up with 1 goroutine take a full thread and not sharing it with any other.

sameo · 2017-05-02T10:51:35Z

network.go

-// not be executed in a different thread than the one expected by the
-// caller. This is used in case of CNM network, because we need to
-// make sure the process switched to the given netns has PID == TID.
+// safeDoNetNS is free from any call to a go routine, and it calls


Could we call that one doNetNS ? safeDoNetNS implies there's an unsafe one.

Oh yes you're right, this will make more sense !

sboeuf · 2017-05-02T13:58:36Z

@dlespiau I agree and I will add a defer call to the function. But just want to let you know that we can do this because in the CreatePod() function, the PrestartHook() function is the first one where we can expect safeDoNetNS() to be called, meaning that we already achieved our goal to keep the PID == TID. Look at api.go, I had added an init() function calling into LockOSThread(). The reason for this is about keeping the consumer of the library with its process ID the same than the thread ID until we get into the hook. Indeed, dockerd inspects only /proc/PID/ns/net, meaning we cannot have a different thread ID.
I will submit a second patch to this PR, in order to Unlock all threads for every APIs function, and one Unlock right after we don't need it anymore in the case of CreatePod().

dlespiau · 2017-05-02T14:19:13Z

This is definitely a bit fiddly, it also means the consumer of the library can't call API functions in a goroutine (it could be scheduled in a different thread than the main thread). They need to call API functions from the main thread (probably worth documenting).

I'm not sure I follow the reason why your patch work. runtime.LockOSThread will lock the calling goroutine on its current thread, so I don't see any guarantee that the calling goroutine will be bound to the main thread unless it's already running onto the main thread.

This commit gets rid of the original doNetNS() function as it is too complex and it introduces some sporadic issues related to the use of go routines and network namespaces at the same time. The cause of those issues is about some processes/threads changing during the runtime, causing some confusion with the network namespaces used. The only way is to really make sure everything is controlled by our library, meaning we make all of our network implementations rely on the new "safe" function doNetNS(). This patch also improves this function by calling into runtime.LockOSThread(). This implies that no matter who is the caller of this function, it will end up in the expected network namespace with the guarantee the thread won't change during the execution. This is solving all sporadic issues related to the usage of network namespaces that have been found with the benchmarks. Signed-off-by: Sebastien Boeuf <[email protected]>

dlespiau · 2017-05-02T14:28:45Z

api.go

@@ -64,6 +64,9 @@ func CreatePod(podConfig PodConfig) (*Pod, error) {
 		return nil, err
 	}

+	// Unlock the thread here after the network hook was run.
+	runtime.UnlockOSThread()


It's a bit awkward to call that from that from every other entry points. Can't we try to call it when we know we're finished with the namespace setup?

This call is about the locking function from the init() function.

If we know that we don't need the lockOSThread after a certain point, we should call unlock() at that point, not in every single entry point?

coveralls · 2017-05-02T14:34:44Z

Coverage increased (+0.3%) to 67.305% when pulling ada130b on topic/semaphore_ci into 189b100 on master.

sboeuf · 2017-05-02T14:35:59Z

@dlespiau Forcing the lockOSThread() inside the init() is the way we force the main thread to have the same ID than the process. When the runtime import virtcontainers library, init() is executed, making sure the thread is locked.
I agree this is not nice... But dockerd forces us to do that because of the wrong implementation...

dlespiau · 2017-05-02T14:41:54Z

I get the init() point, what I don't get is why putting a LockOSThread thread inside the function works. This depends on the thread the function is being called from. if the function is called form a goroutine with tid != pid, LockOSThread locks that function on to that thread, not the main thread.

dlespiau · 2017-05-02T14:54:00Z

After a brief discussion on IRC, lgtm for the first patch.

The second patch though looks a bit ugly. I'd rather not have it in that form if possible.

coveralls · 2017-05-02T15:05:46Z

Coverage increased (+0.02%) to 67.055% when pulling 5d33348 on topic/semaphore_ci into 189b100 on master.

mcastelino · 2017-07-24T22:06:56Z

@sboeuf go 1.10 may have a fix for this in the longer term golang/go#20676

sameo reviewed May 2, 2017

View reviewed changes

sboeuf force-pushed the topic/semaphore_ci branch from 8343200 to ada130b Compare May 2, 2017 14:26

dlespiau reviewed May 2, 2017

View reviewed changes

sboeuf force-pushed the topic/semaphore_ci branch from ada130b to 5d33348 Compare May 2, 2017 14:59

sboeuf merged commit e1094ac into master May 2, 2017

sboeuf deleted the topic/semaphore_ci branch May 2, 2017 18:40

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

network: Fix benchmarks sporadic issues with netns #223

network: Fix benchmarks sporadic issues with netns #223

sboeuf commented May 1, 2017 •

edited

Loading

coveralls commented May 1, 2017 •

edited

Loading

jodh-intel commented May 2, 2017 •

edited by dlespiau

Loading

dlespiau commented May 2, 2017

sameo May 2, 2017

sboeuf May 2, 2017

sboeuf commented May 2, 2017

dlespiau commented May 2, 2017

dlespiau May 2, 2017

sboeuf May 2, 2017

dlespiau May 2, 2017

coveralls commented May 2, 2017 •

edited

Loading

sboeuf commented May 2, 2017

dlespiau commented May 2, 2017

dlespiau commented May 2, 2017

coveralls commented May 2, 2017 •

edited

Loading

mcastelino commented Jul 24, 2017

network: Fix benchmarks sporadic issues with netns #223

network: Fix benchmarks sporadic issues with netns #223

Conversation

sboeuf commented May 1, 2017 • edited Loading

coveralls commented May 1, 2017 • edited Loading

jodh-intel commented May 2, 2017 • edited by dlespiau Loading

dlespiau commented May 2, 2017

sameo May 2, 2017

Choose a reason for hiding this comment

sboeuf May 2, 2017

Choose a reason for hiding this comment

sboeuf commented May 2, 2017

dlespiau commented May 2, 2017

dlespiau May 2, 2017

Choose a reason for hiding this comment

sboeuf May 2, 2017

Choose a reason for hiding this comment

dlespiau May 2, 2017

Choose a reason for hiding this comment

coveralls commented May 2, 2017 • edited Loading

sboeuf commented May 2, 2017

dlespiau commented May 2, 2017

dlespiau commented May 2, 2017

coveralls commented May 2, 2017 • edited Loading

mcastelino commented Jul 24, 2017

sboeuf commented May 1, 2017 •

edited

Loading

coveralls commented May 1, 2017 •

edited

Loading

jodh-intel commented May 2, 2017 •

edited by dlespiau

Loading

coveralls commented May 2, 2017 •

edited

Loading

coveralls commented May 2, 2017 •

edited

Loading