nfvconfig selftest hanging on raptorjit branch #1337
Comments
AFAIK, at the moment it's not possible to disable vmprofiling in Raptorjit, as it's always on. By that I mean there's no flag or environment variable to disable profiling. So if we go for 3), it would also be necessary to implement this mechanism. As for 2), it initially sounds good to me, though I have one question. As the issue is with system calls
The PC-losering (https://www.dreamsongs.com/WIB.html) problem just won't die eh :). I have a vague feeling that Linux signals can be set up to not interrupt system calls, i.e. to defer the signal until after the system call completes. This could be a nice solution since we are not that interested in profiling system calls? I can't find how to do that right now, so it is possible that I dreamed it.
Some info. on blocking system calls here:
https://stackoverflow.com/questions/2853653/deferring-signal-handling-in-linux
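For what it's worth, here is a minimal C sketch of the deferral idea: block SIGPROF with `sigprocmask` around the `popen`, so a timer tick that fires in the meantime stays pending and is delivered only after the mask is restored. The helper name `read_first_line_deferred` is made up for illustration, and whether this slots cleanly into RaptorJIT/Snabb is an untested assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from Snabb or RaptorJIT): read the first line of
 * a command's output while SIGPROF is blocked, so a profiling timer cannot
 * interrupt the fork/clone done inside popen().
 * Returns 0 on success, -1 on failure. */
int read_first_line_deferred(const char *cmd, char *buf, size_t len)
{
    sigset_t block, saved;
    int rc = -1;

    sigemptyset(&block);
    sigaddset(&block, SIGPROF);

    /* From here on, a SIGPROF that fires stays pending instead of
     * being delivered to this thread. */
    if (sigprocmask(SIG_BLOCK, &block, &saved) != 0)
        return -1;

    FILE *p = popen(cmd, "r");
    if (p != NULL) {
        if (fgets(buf, (int)len, p) != NULL) {
            buf[strcspn(buf, "\n")] = '\0';  /* strip trailing newline */
            rc = 0;
        }
        pclose(p);
    }

    /* Restore the previous mask; a pending SIGPROF is delivered now. */
    sigprocmask(SIG_SETMASK, &saved, NULL);
    return rc;
}
```

Two caveats: the child started by `popen` inherits the blocked mask (interval timers themselves are not inherited across fork), and in a multi-threaded process `pthread_sigmask` would be the call to use instead of `sigprocmask`.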
Would it solve the problem to switch our [...]? This should only profile the time spent running in userspace, and a reasonable person might expect (...) that this would guarantee that system calls are never interrupted. This would make vmprofile less useful for profiling code that spends time in kernel space, but that is not especially relevant to Snabb (and reasonably well covered by [...]
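One POSIX timer with the property described here is `ITIMER_VIRTUAL`, which counts down only while the process runs in user mode and delivers SIGVTALRM, whereas `ITIMER_PROF` also counts system time and delivers SIGPROF. Whether that is the switch this comment had in mind is an assumption on my part, and the sketch below does not verify the "never interrupts a system call" expectation. The helper names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <sys/time.h>

/* Number of timer ticks seen so far (stand-in for taking a profile sample). */
static volatile sig_atomic_t ticks = 0;

static void on_tick(int signo)
{
    (void)signo;
    ticks++;
}

int sample_count(void)
{
    return (int)ticks;
}

/* Hypothetical helper: start a periodic sampling timer.
 * which = ITIMER_PROF    -> counts user + system CPU time, sends SIGPROF
 * which = ITIMER_VIRTUAL -> counts user CPU time only, sends SIGVTALRM
 * Returns 0 on success, -1 on failure. */
int start_sampling(int which, long interval_usec)
{
    int signo = (which == ITIMER_VIRTUAL) ? SIGVTALRM : SIGPROF;

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_tick;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;  /* restart interrupted syscalls, no EINTR */
    if (sigaction(signo, &sa, NULL) != 0)
        return -1;

    struct itimerval tv;
    tv.it_interval.tv_sec = 0;
    tv.it_interval.tv_usec = interval_usec;
    tv.it_value = tv.it_interval;  /* first expiry after one interval */
    return setitimer(which, &tv, NULL);
}

/* Disarm the timer by setting both fields to zero. */
int stop_sampling(int which)
{
    struct itimerval zero;
    memset(&zero, 0, sizeof zero);
    return setitimer(which, &zero, NULL);
}
```

Calling `start_sampling(ITIMER_VIRTUAL, 1000)` would give a 1ms tick that only advances while user-space code is running.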
Recently some CI runs on the raptorjit branch seem to be failing because the nfvconfig selftest hangs. The most recent runs don't show this because they are erroring with a merge issue, but AFAIK the issue is still there.
Here's an example SnabbBot log from #1332 with the issue: https://gist.github.com/SnabbBot/1a57acb59416bd84bafe173985fb6895
The test hangs with some output like this:
I minimized the test a bit to make the issue easier to reproduce:
The test hangs at the `popen` call. In the actual test, the `popen` happens due to a use of `firstfile` while getting some PCI device information for `virtual_ether_mux.configure`.
One thing I noticed is that the reason this only hangs on the raptorjit branch is that it only happens when vmprofile is on.
If you do an strace, you can see the reason for this:
From this strace output, you can see that after a `popen` does a `clone` to make a child process, the syscall is constantly interrupted by the 1ms SIGPROF timer for vmprofile. This causes the syscall to restart, but it gets interrupted again, and so it gets stuck in an infinite loop.
It looks like this has been an issue in other software like Chromium (https://bugs.chromium.org/p/chromium/issues/detail?id=83521) or with JVM profiling (async-profiler/async-profiler#97).
There seem to be a few potential ways to solve this:
`firstfile`, it could use ljsyscall to do the filesystem operations.
The first one sounds unappealing since it would reduce the profiling granularity, but maybe it's acceptable. The second seems pretty doable for this particular case.
Any thoughts on a preferred solution?
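To make the second option a bit more concrete, here is a rough sketch of getting the first entry of a directory with direct filesystem calls instead of spawning a child process. It is written in C with `scandir` rather than through ljsyscall, and it assumes that a `firstfile`-style helper only needs the name of the first non-hidden entry in sorted order; both the helper name `first_file` and that assumption are mine, not taken from the Snabb source.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Skip hidden entries (including "." and ".."). */
static int skip_hidden(const struct dirent *e)
{
    return e->d_name[0] != '.';
}

/* Hypothetical helper: copy the name of the first (sorted) entry of `dir`
 * into buf. No fork/clone is involved, so there is no child-creating
 * syscall for a profiling timer to keep interrupting.
 * Returns 0 on success, -1 if the directory is unreadable, empty,
 * or the name does not fit in buf. */
int first_file(const char *dir, char *buf, size_t len)
{
    struct dirent **names = NULL;
    int n = scandir(dir, &names, skip_hidden, alphasort);
    if (n < 0)
        return -1;

    int rc = -1;
    if (n > 0 && strlen(names[0]->d_name) < len) {
        strcpy(buf, names[0]->d_name);
        rc = 0;
    }

    /* scandir allocates each entry and the array itself. */
    for (int i = 0; i < n; i++)
        free(names[i]);
    free(names);
    return rc;
}
```

The directory reads can still be hit by the timer signal, but they finish quickly, unlike the `clone` that keeps getting restarted in the strace above.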