
nested_detach tests fail on Alder Lake when the CPU affinity is reset to all CPUs on detach #3402

Closed
agentzh opened this issue Nov 26, 2022 · 24 comments

Comments

@agentzh

agentzh commented Nov 26, 2022

Continuing the discussion at #3398 in this dedicated issue.

It seems like the rr test suite only enabled 1425 tests on my Fedora 35 system running on an Intel Core i9-13900K CPU. It enabled many more tests (2849!) in a KVM guest running Fedora 32 (kernel 5.11.22-100.fc32.x86_64). I saw many more test failures in this KVM guest running on the same hardware:

98% tests passed, 45 tests failed out of 2849

Total Test time (real) = 2533.94 sec

The following tests FAILED:
	 17 - alarm-no-syscallbuf (Failed)
	 58 - clock (Failed)
	 79 - clone_immediate_exit-no-syscallbuf (Failed)
	160 - fatal_sigsegv_thread (Failed)
	184 - flock (Failed)
	277 - ioctl_pty-no-syscallbuf (Failed)
	297 - keyctl-no-syscallbuf (Failed)
	325 - memfd_create_shared-no-syscallbuf (Failed)
	393 - mremap_non_page_size-no-syscallbuf (Failed)
	682 - signal_unstoppable (Failed)
	703 - sigpwr-no-syscallbuf (Failed)
	744 - stack_overflow (Failed)
	776 - syscallbuf_signal_blocking (Failed)
	881 - async_kill_with_threads-no-syscallbuf (Failed)
	967 - elapsed_time-no-syscallbuf (Failed)
	1088 - reverse_step_long (Failed)
	1111 - segfault-no-syscallbuf (Failed)
	1272 - dead_thread_target (Failed)
	1305 - nested_detach-no-syscallbuf (Failed)
	1306 - nested_detach_kill (Failed)
	1307 - nested_detach_kill-no-syscallbuf (Failed)
	1403 - trace_version-no-syscallbuf (Failed)
	1472 - capget-32 (Failed)
	1558 - epoll_edge-32 (Failed)
	1615 - fork_brk-32-no-syscallbuf (Failed)
	1744 - membarrier-32 (Failed)
	1797 - mount_ns_exec2-32-no-syscallbuf (Failed)
	2013 - save_data_fd-32-no-syscallbuf (Failed)
	2046 - seccomp_tsync-32 (Failed)
	2073 - shared_exec-32-no-syscallbuf (Failed)
	2129 - sigqueueinfo-32-no-syscallbuf (Failed)
	2169 - stack_overflow-32-no-syscallbuf (Failed)
	2336 - breakpoint-32 (Failed)
	2344 - call_gettid-32 (Failed)
	2373 - crash_in_function-32-no-syscallbuf (Failed)
	2446 - invalid_interpreter-32 (Failed)
	2448 - invalid_jump-32 (Failed)
	2533 - seccomp_signals-32-no-syscallbuf (Failed)
	2677 - check_patched_pthread-32-no-syscallbuf (Failed)
	2728 - nested_detach-32 (Failed)
	2750 - replay_serve_files-32 (Failed)
	2751 - replay_serve_files-32-no-syscallbuf (Failed)
	2774 - reverse_step_threads2-32 (Failed)
	2806 - step1-32 (Failed)
	2817 - subprocess_exit_ends_session-32-no-syscallbuf (Failed)
Errors while running CTest

I also pinned the guest's virtual CPU cores to the real ones and limited the test suite to the P-cores only.

I'm using the current latest master branch (commit fcb243c).

Is it something we should worry about?

@khuey
Collaborator

khuey commented Nov 26, 2022

Can you look in the relevant /tmp/rr-trace-whatever directory for a few of the tests and see why they're failing (i.e. is there a common error message in replay.err or record.err)?

@agentzh
Author

agentzh commented Nov 26, 2022

@khuey Sure! But I'm not sure how to do that exactly.

Is there a way to run an individual test by its test number, like 17 or 160? Then I guess it'll be easy to do. Sorry if it's a dumb question.

@khuey
Collaborator

khuey commented Nov 26, 2022

ctest -I N,N will run just test N

@agentzh
Author

agentzh commented Nov 26, 2022

Cool, will try it out. Thanks!

@agentzh
Author

agentzh commented Nov 26, 2022

Got an error for test case No. 26 (I had to run this test case repeatedly to reproduce a failure):

Test project /home/agentzh/git/rr/rr
    Start 26: at_threadexit
1/1 Test #26: at_threadexit ....................***Failed  Error regular expression found in output. Regex=[FAILED]  2.15 sec
source_dir/src/test/util.sh: line 287: 491664 Aborted                 (core dumped) _RR_TRACE_DIR="$workdir" test-monitor $TIMEOUT replay.err $RR_EXE $GLOBAL_OPTIONS replay -a $replayflags > replay.out 2> replay.err
Test 'at_threadexit' FAILED: : error during replay:
--------------------------------------------------
[FATAL src/ReplaySession.cc:1180:check_ticks_consistency()]
 (task 491666 (rec:491662) at time 195)
 -> Assertion `ticks_now == trace_ticks' failed to hold. ticks mismatch for 'SYSCALL: rt_sigprocmask'; expected 53916, got 53913
Tail of trace dump:
{
  real_time:8419.364275 global_time:175, event:`SYSCALL: mprotect' (state:EXITING_SYSCALL) tid:491662, ticks:52017
rax:0x0 rbx:0x7f68613a4520 rcx:0xffffffffffffffff rdx:0x1 rsi:0x1000 rdi:0x7f6861381000 rbp:0x7ffeb3074bf0 rsp:0x7ffeb3074ad8 r8:0xffff r9:0x7f6861391498 r10:0x7f6861381d90 r11:0x246 r12:0x7 r13:0x3c00000007 r14:0x7f6861378fd0 r15:0x7f68613a4520 rip:0x7f68613c412b eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0xa fs_base:0x7f6861184180 gs_base:0x0
}

I can only reproduce this failure in the KVM guest, not in the host OS.

@agentzh
Author

agentzh commented Nov 26, 2022

And another one looks the same:

Test project /home/agentzh/git/rr/rr
    Start 17: alarm-no-syscallbuf
1/1 Test #17: alarm-no-syscallbuf ..............***Failed  Error regular expression found in output. Regex=[FAILED]  3.14 sec
source_dir/src/test/util.sh: line 287: 492741 Aborted                 (core dumped) _RR_TRACE_DIR="$workdir" test-monitor $TIMEOUT replay.err $RR_EXE $GLOBAL_OPTIONS replay -a $replayflags > replay.out 2> replay.err
Test 'alarm' FAILED: : error during replay:
--------------------------------------------------
[FATAL src/ReplaySession.cc:1180:check_ticks_consistency()]
 (task 492743 (rec:492740) at time 76)
 -> Assertion `ticks_now == trace_ticks' failed to hold. ticks mismatch for 'SYSCALL: mmap'; expected 8770, got 8930
Tail of trace dump:
{
  real_time:8609.707883 global_time:56, event:`SYSCALL: close' (state:ENTERING_SYSCALL) tid:492740, ticks:8229
rax:0xffffffffffffffda rbx:0x6ffffdff rcx:0xffffffffffffffff rdx:0x0 rsi:0x29 rdi:0x4 rbp:0x7ffedfc33fb0 rsp:0x7ffedfc33e58 r8:0x7f4f5aa69000 r9:0xeffffef5 r10:0x70000022 r11:0x246 r12:0x7f4f5aa7c000 r13:0x6ffffeff r14:0x7ffedfc34088 r15:0x4 rip:0x7f4f5aa9bd1b eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x3 fs_base:0x0 gs_base:0x0
}

@agentzh
Author

agentzh commented Nov 26, 2022

Got another one:

Test project /home/agentzh/git/rr/rr
    Start 2336: breakpoint-32
1/1 Test #2336: breakpoint-32 ....................***Failed  Error regular expression found in output. Regex=[FAILED]  1.08 sec
source_dir/src/test/util.sh: line 305: 494445 Aborted                 (core dumped) _RR_TRACE_DIR="$workdir" test-monitor $TIMEOUT debug.err python3 $TESTDIR/$expectscript.py $RR_EXE $GLOBAL_OPTIONS replay -o-n -o-ix -o$TESTDIR/test_setup.gdb $replayargs
Test 'breakpoint_32' FAILED: debug script failed
--------------------------------------------------
gdb_rr.log:
[FATAL src/PerfCounters.cc:380:check_working_counters()]
Got 0 branch events, expected at least 500.

The hardware performance counter seems to not be working. Check
that hardware performance counters are working by running
  perf stat -e r5111c4 true
and checking that it reports a nonzero number of events.
If performance counters seem to be working with 'perf', file an
rr issue, otherwise check your hardware/OS/VM configuration. Also
check that other software is not using performance counters on
this CPU.

I also sometimes saw this one for tests No. 26 and 17.

@agentzh
Author

agentzh commented Nov 26, 2022

Is it a sign that rr actually ran on an E-core?

@khuey
Copy link
Collaborator

khuey commented Nov 26, 2022

Possibly. Try locking the VM to just the P-cores?

@agentzh
Author

agentzh commented Nov 26, 2022

BTW, test 1304 exhibits a similar error on the host OS (without any virtualization):

Test project /home/agentzh/git/rr/rr
    Start 1304: nested_detach
1/1 Test #1304: nested_detach ....................***Failed  Error regular expression found in output. Regex=[FAILED]  1.12 sec
source_dir/src/test/util.sh: line 254: 874200 Aborted                 (core dumped) _RR_TRACE_DIR="$workdir" test-monitor $TIMEOUT record.err $RR_EXE $GLOBAL_OPTIONS record $LIB_ARG $RECORD_ARGS "$exe" $exeargs > record.out 2> record.err
Test 'nested_detach' FAILED: : error during recording:
--------------------------------------------------
[FATAL src/PerfCounters.cc:380:check_working_counters() errno: EDOM]
Got 0 branch events, expected at least 500.

The hardware performance counter seems to not be working. Check
that hardware performance counters are working by running
  perf stat -e r5111c4 true
and checking that it reports a nonzero number of events.
If performance counters seem to be working with 'perf', file an
rr issue, otherwise check your hardware/OS/VM configuration. Also
check that other software is not using performance counters on
this CPU.

It also fails exactly the same way in the KVM guest. It's just that the KVM guest hits such things much more easily.

@agentzh
Author

agentzh commented Nov 26, 2022

Is rr known to have issues on hyper-threaded CPU cores? I have HT enabled in BIOS. But strangely, on the host OS CPU 0 should never change, yet only test 1304 has this error and almost all the other tests work fine (I use the command taskset -a -c 0 ..., BTW).

@khuey
Copy link
Collaborator

khuey commented Nov 26, 2022

No, there are no known hyperthreading issues.

@agentzh
Author

agentzh commented Nov 26, 2022

Ah, it seems like the Linux kernel might schedule CPU 0 onto an E-core even in the host OS...

After I disable all the E-cores (but not hyper-threading) in BIOS, test 1304 never fails in the host OS.

@agentzh
Author

agentzh commented Nov 26, 2022

After disabling all the E-cores in BIOS, the rr test suite passes reliably, except for the x86/pkeys* tests, whose failures are known to be due to a kernel bug. All the previously failing nested_detach* tests are passing reliably now, even when run repeatedly.

Tests 1304, 26, and 17 are also passing reliably in the KVM guest.

So the workaround is to disable all the E-cores in BIOS to get rr working reliably...

Is there anything rr can do to improve the situation here? Crippling the CPU by disabling all its E-cores is really sad...

@agentzh
Author

agentzh commented Nov 26, 2022

Okay, I scanned the verbose output from ctest to dig deeper on the host OS (no virtualization involved). It seems like taskset -a -c 0 does not take effect for at least some rr processes:

====== /proc/122949/status
Name:	rr
Cpus_allowed_list:	24
====== /proc/122961/status
Name:	rr
Cpus_allowed_list:	24

And CPU 24 is definitely an E-core (since on my system, only CPU 0-15 are P-core threads).

So the Linux kernel does not secretly reassign CPU 0 to an E-core after all.

Does the rr test scaffold do automatic CPU affinity settings on its own?

I noticed it's not always CPU 24. Another run yields CPU 30 for some rr processes:

====== /proc/147482/status
Name:	rr
Cpus_allowed_list:	30

CPU 30 is still an E-core!

How to avoid such random CPU affinity settings in the test scaffold?

@agentzh
Author

agentzh commented Nov 26, 2022

Maybe those nested_detach* tests intentionally bypass the parent processes' CPU affinity settings?

@agentzh
Author

agentzh commented Nov 26, 2022

And I also found out why the KVM guest had those test failures. My bad: I forgot to pin vCPU 5 to the host's CPU 5. So when the tests landed on vCPU 5 (as verified by the ctest --output-on-failure logs), they might get scheduled by the KVM hypervisor onto an E-core. Mystery solved, and sorry for the inconvenience.

Regarding the nested_detach* test failures on the host OS (not in a KVM guest), I think we can be pretty sure that it is due to those tests explicitly bypassing the CPU affinity settings of the parent processes. The details might not matter anyway.

I'm closing this. Glad that I learned some basic rr test failure debugging skills today, and thanks @khuey for the help.

I think rr is in very good shape for Raptor Lake CPUs now, both in physical boxes and in KVM guests (as long as we avoid E-cores completely).

@agentzh agentzh closed this as completed Nov 26, 2022
@khuey
Collaborator

khuey commented Nov 26, 2022

#3338 is about automatically doing the right thing depending on whether we end up on a P-core or an E-core.

IIRC there's some code in the nested detach stuff that resets the CPU affinity mask (so that the two newly independent tracees are not on the same CPU) and that probably assumes that the original mask allowed all CPUs.

@khuey
Collaborator

khuey commented Nov 26, 2022

Can you try this?

diff --git a/src/Session.cc b/src/Session.cc
index bcd1e415..7a790c34 100644
--- a/src/Session.cc
+++ b/src/Session.cc
@@ -67,16 +67,17 @@ Session::Session(const Session& other) {
   next_task_serial_ = other.next_task_serial_;
   done_initial_exec_ = other.done_initial_exec_;
   rrcall_base_ = other.rrcall_base_;
   visible_execution_ = other.visible_execution_;
   tracee_socket = other.tracee_socket;
   tracee_socket_receiver = other.tracee_socket_receiver;
   tracee_socket_fd_number = other.tracee_socket_fd_number;
   ticks_semantics_ = other.ticks_semantics_;
+  original_affinity_ = other.original_affinity_;
 }
 
 void Session::on_create(ThreadGroup* tg) { thread_group_map_[tg->tguid()] = tg; }
 void Session::on_destroy(ThreadGroup* tg) {
   thread_group_map_.erase(tg->tguid());
 }
 
 void Session::post_exec() {
@@ -731,16 +732,18 @@ static bool set_cpu_affinity(int cpu) {
       return false;
     }
     FATAL() << "Couldn't bind to CPU " << cpu;
   }
   return true;
 }
 
 void Session::do_bind_cpu() {
+  sched_getaffinity(0, sizeof(original_affinity_), &original_affinity_);
+
   int cpu_index = this->cpu_binding();
   if (cpu_index >= 0) {
     // Set CPU affinity now, after we've created any helper threads
     // (so they aren't affected), but before we create any
     // tracees (so they are all affected).
     // Note that we're binding rr itself to the same CPU as the
     // tracees, since this seems to help performance.
     if (!set_cpu_affinity(cpu_index)) {
diff --git a/src/Session.h b/src/Session.h
index a4ae9cba..7e2f390e 100644
--- a/src/Session.h
+++ b/src/Session.h
@@ -394,16 +394,18 @@ public:
   int syscall_number_for_rrcall_rdtsc() const {
     return SYS_rrcall_rdtsc - RR_CALL_BASE + rrcall_base_;
   }
 
   /* Bind the current process to the a CPU as specified in the session options
      or trace */
   void do_bind_cpu();
 
+  cpu_set_t original_affinity() const { return original_affinity_; }
+
   const ThreadGroupMap& thread_group_map() const { return thread_group_map_; }
 
   virtual int tracee_output_fd(int dflt) {
     return dflt;
   }
 
 protected:
   Session();
@@ -446,16 +448,18 @@ protected:
   uint32_t next_task_serial_;
   ScopedFd spawned_task_error_fd_;
 
   int rrcall_base_;
   PtraceSyscallBeforeSeccomp syscall_seccomp_ordering_;
 
   TicksSemantics ticks_semantics_;
 
+  cpu_set_t original_affinity_;
+
   /**
    * True if we've done an exec so tracees are now in a state that will be
    * consistent across record and replay.
    */
   bool done_initial_exec_;
 
   /**
    * True while the execution of this session is visible to users.
diff --git a/src/record_syscall.cc b/src/record_syscall.cc
index e2e64bde..09f23e22 100644
--- a/src/record_syscall.cc
+++ b/src/record_syscall.cc
@@ -3543,18 +3543,19 @@ static pid_t do_detach_teleport(RecordTask *t)
   {
     AutoRemoteSyscalls remote(new_t, AutoRemoteSyscalls::DISABLE_MEMORY_PARAMS);
     remote.infallible_close_syscall_if_alive(tracee_fd_number);
   }
   t->vm()->monkeypatcher().unpatch_syscalls_in(new_t);
   // Try to reset the scheduler affinity that we enforced upon the task.
   // XXX: It would be nice to track what affinity the tracee requested and
   // restore that.
-  cpu_set_t mask;
-  memset(&mask, 0xFF, sizeof(mask));
+  // For now honor whatever affinity rr itself has (e.g. for running on P-cores
+  // on Alder Lake).
+  cpu_set_t mask = t->session().original_affinity();
   syscall(SYS_sched_setaffinity, new_t->tid, sizeof(mask), &mask);
   new_t->detach();
   new_t->did_kill();
   delete new_t;
   return new_tid;
 }
 
 template <typename Arch>

@khuey
Collaborator

khuey commented Nov 28, 2022

@agentzh ?

@gcp
Contributor

gcp commented Dec 7, 2022

Linux 6.0.10 on 12900K:

33% tests passed, 8 tests failed out of 12

Total Test time (real) =   9.45 sec

The following tests FAILED:
        1060 - nested_detach_wait (Failed)
        1304 - nested_detach (Failed)
        1305 - nested_detach-no-syscallbuf (Failed)
        1306 - nested_detach_kill (Failed)
        1307 - nested_detach_kill-no-syscallbuf (Failed)
        2486 - nested_detach_wait-32 (Failed)
        2732 - nested_detach_kill-32 (Failed)
        2733 - nested_detach_kill-32-no-syscallbuf (Failed)

With the patch:

67% tests passed, 4 tests failed out of 12

Total Test time (real) =  23.80 sec

The following tests FAILED:
        1305 - nested_detach-no-syscallbuf (Failed)
        1306 - nested_detach_kill (Failed)
        2486 - nested_detach_wait-32 (Failed)
        2730 - nested_detach-32 (Failed)

@gcp
Contributor

gcp commented Dec 7, 2022

Corrected command:

morbo@alder:~/git/rr$ taskset -c 0 ctest -R nested_detach
Test project /home/morbo/git/rr
      Start 1060: nested_detach_wait
 1/12 Test #1060: nested_detach_wait ....................   Passed    0.15 sec
      Start 1061: nested_detach_wait-no-syscallbuf
 2/12 Test #1061: nested_detach_wait-no-syscallbuf ......   Passed    0.11 sec
      Start 1304: nested_detach
 3/12 Test #1304: nested_detach .........................   Passed    0.10 sec
      Start 1305: nested_detach-no-syscallbuf
 4/12 Test #1305: nested_detach-no-syscallbuf ...........   Passed    0.09 sec
      Start 1306: nested_detach_kill
 5/12 Test #1306: nested_detach_kill ....................   Passed    6.19 sec
      Start 1307: nested_detach_kill-no-syscallbuf
 6/12 Test #1307: nested_detach_kill-no-syscallbuf ......   Passed    6.19 sec
      Start 2486: nested_detach_wait-32
 7/12 Test #2486: nested_detach_wait-32 .................   Passed    0.12 sec
      Start 2487: nested_detach_wait-32-no-syscallbuf
 8/12 Test #2487: nested_detach_wait-32-no-syscallbuf ...   Passed    0.11 sec
      Start 2730: nested_detach-32
 9/12 Test #2730: nested_detach-32 ......................   Passed    0.10 sec
      Start 2731: nested_detach-32-no-syscallbuf
10/12 Test #2731: nested_detach-32-no-syscallbuf ........   Passed    0.09 sec
      Start 2732: nested_detach_kill-32
11/12 Test #2732: nested_detach_kill-32 .................   Passed    6.19 sec
      Start 2733: nested_detach_kill-32-no-syscallbuf
12/12 Test #2733: nested_detach_kill-32-no-syscallbuf ...   Passed    6.19 sec

100% tests passed, 0 tests failed out of 12

Total Test time (real) =  25.67 sec

@khuey khuey reopened this Dec 8, 2022
@khuey khuey changed the title Many test failures on a KVM guest running Fedora 32 on Core i9-13900K nested_detach tests fail on Alder Lake when the CPU affinity is reset to all CPUs on detach Dec 8, 2022
@khuey khuey closed this as completed in 1db293d Dec 8, 2022
@agentzh
Author

agentzh commented Dec 8, 2022

@khuey Sorry, a bit late to the game. So your previous patch is already merged into master? I'll try the latest master in a minute, then. Thanks!

@agentzh
Author

agentzh commented Dec 8, 2022

@khuey Yay! The latest master seems to pass all those nested_detach tests now. Thanks a lot!

agentzh@alder ~/git/rr/rr 0 (master)$ taskset -a -c 0-8 ctest -I 1304,1307 --output-on-failure |& tee a.txt
Test project /home/agentzh/git/rr/rr
    Start 1304: nested_detach
1/4 Test #1304: nested_detach ......................   Passed    0.13 sec
    Start 1305: nested_detach-no-syscallbuf
2/4 Test #1305: nested_detach-no-syscallbuf ........   Passed    0.13 sec
    Start 1306: nested_detach_kill
3/4 Test #1306: nested_detach_kill .................   Passed    6.66 sec
    Start 1307: nested_detach_kill-no-syscallbuf
4/4 Test #1307: nested_detach_kill-no-syscallbuf ...   Passed    6.63 sec

100% tests passed, 0 tests failed out of 4

Total Test time (real) =  13.55 sec
