nested_detach tests fail on Alder Lake when the CPU affinity is reset to all CPUs on detach #3402
Can you look in the relevant /tmp/rr-trace-whatever directory for a few of the tests and see why they're failing (i.e. is there a common error message in replay.err or record.err)?
@khuey Sure! But I'm not sure how to do that exactly. Is there a way to run an individual test by number, like 17 or 160? Then I guess it'll be easy to do. Sorry if it's a dumb question.
Cool, will try it out. Thanks!
Got an error for test case No. 26 (I have to run this test case repeatedly to reproduce the failure):
I can only reproduce this failure in the KVM guest, not in the host OS.
And another one looks the same:
Got another one:
I also saw this one for tests No. 26 and 17 sometimes.
Is it a sign that …
Possibly. Try locking the VM to just the P-cores?
BTW, test 1304 exhibits a similar error on the host OS (without any virtualization):
It also fails exactly the same way in the KVM guest. It's just that the KVM guest hits such things much more easily.
Is rr known to have issues on hyperthreaded CPU cores? I have HT enabled in the BIOS. But strangely, on the host OS CPU 0 should never change, yet only test 1304 has this error and almost all the other tests work fine (I use the command …)
No, there are no known hyperthreading issues.
Ah, it seems like the Linux kernel might schedule CPU 0 to an E-core even on the host OS... After I disabled all the E-cores (but not hyper-threading) in the BIOS, test 1304 never fails on the host OS.
After disabling all the E-cores in the BIOS, the rr test suite passes reliably; tests 1304, 26, and 17 are also passing reliably in the KVM guest as well. So the workaround is to disable all the E-cores in the BIOS to get rr working reliably... Is there anything rr can do to improve the situation here? Crippling the CPU by disabling all its E-cores is really sad...
Okay, I scanned the verbose output from …
And CPU 24 is definitely an E-core (on my system, only CPUs 0-15 are P-core threads). So the Linux kernel does not secretly reassign CPU 0 to an E-core after all. Does the rr test scaffold set CPU affinity automatically on its own? I noticed it's not always CPU 24; another run yields CPU 30 for some rr processes:
CPU 30 is still an E-core! How can we avoid such random CPU affinity settings in the test scaffold?
Maybe those …
And I also found out why the KVM guest has those test failures. My bad. I forgot to pin the vCPU 5 to the host's CPU 5. So when the tests fall onto vCPU 5 (as verified by Regarding the I'm closing this. Glad that I learned some basic rr test failure debugging skills today, and thanks @khuey for the help. I think rr is in very good shape for Raptor Lake CPUs now, both in physical boxes and in KVM guests (as long as we avoid E-cores completely). |
#3338 is about automatically doing the right thing depending on whether we end up on a P-core or an E-core. IIRC there's some code in the nested detach stuff that resets the CPU affinity mask (so that the two newly independent tracees are not on the same CPU) and that probably assumes that the original mask allowed all CPUs. |
Can you try this?

diff --git a/src/Session.cc b/src/Session.cc
index bcd1e415..7a790c34 100644
--- a/src/Session.cc
+++ b/src/Session.cc
@@ -67,16 +67,17 @@ Session::Session(const Session& other) {
next_task_serial_ = other.next_task_serial_;
done_initial_exec_ = other.done_initial_exec_;
rrcall_base_ = other.rrcall_base_;
visible_execution_ = other.visible_execution_;
tracee_socket = other.tracee_socket;
tracee_socket_receiver = other.tracee_socket_receiver;
tracee_socket_fd_number = other.tracee_socket_fd_number;
ticks_semantics_ = other.ticks_semantics_;
+ original_affinity_ = other.original_affinity_;
}
void Session::on_create(ThreadGroup* tg) { thread_group_map_[tg->tguid()] = tg; }
void Session::on_destroy(ThreadGroup* tg) {
thread_group_map_.erase(tg->tguid());
}
void Session::post_exec() {
@@ -731,16 +732,18 @@ static bool set_cpu_affinity(int cpu) {
return false;
}
FATAL() << "Couldn't bind to CPU " << cpu;
}
return true;
}
void Session::do_bind_cpu() {
+ sched_getaffinity(0, sizeof(original_affinity_), &original_affinity_);
+
int cpu_index = this->cpu_binding();
if (cpu_index >= 0) {
// Set CPU affinity now, after we've created any helper threads
// (so they aren't affected), but before we create any
// tracees (so they are all affected).
// Note that we're binding rr itself to the same CPU as the
// tracees, since this seems to help performance.
if (!set_cpu_affinity(cpu_index)) {
diff --git a/src/Session.h b/src/Session.h
index a4ae9cba..7e2f390e 100644
--- a/src/Session.h
+++ b/src/Session.h
@@ -394,16 +394,18 @@ public:
int syscall_number_for_rrcall_rdtsc() const {
return SYS_rrcall_rdtsc - RR_CALL_BASE + rrcall_base_;
}
/* Bind the current process to the a CPU as specified in the session options
or trace */
void do_bind_cpu();
+ cpu_set_t original_affinity() const { return original_affinity_; }
+
const ThreadGroupMap& thread_group_map() const { return thread_group_map_; }
virtual int tracee_output_fd(int dflt) {
return dflt;
}
protected:
Session();
@@ -446,16 +448,18 @@ protected:
uint32_t next_task_serial_;
ScopedFd spawned_task_error_fd_;
int rrcall_base_;
PtraceSyscallBeforeSeccomp syscall_seccomp_ordering_;
TicksSemantics ticks_semantics_;
+ cpu_set_t original_affinity_;
+
/**
* True if we've done an exec so tracees are now in a state that will be
* consistent across record and replay.
*/
bool done_initial_exec_;
/**
* True while the execution of this session is visible to users.
diff --git a/src/record_syscall.cc b/src/record_syscall.cc
index e2e64bde..09f23e22 100644
--- a/src/record_syscall.cc
+++ b/src/record_syscall.cc
@@ -3543,18 +3543,19 @@ static pid_t do_detach_teleport(RecordTask *t)
{
AutoRemoteSyscalls remote(new_t, AutoRemoteSyscalls::DISABLE_MEMORY_PARAMS);
remote.infallible_close_syscall_if_alive(tracee_fd_number);
}
t->vm()->monkeypatcher().unpatch_syscalls_in(new_t);
// Try to reset the scheduler affinity that we enforced upon the task.
// XXX: It would be nice to track what affinity the tracee requested and
// restore that.
- cpu_set_t mask;
- memset(&mask, 0xFF, sizeof(mask));
+ // For now honor whatever affinity rr itself has (e.g. for running on P-cores
+ // on Alder Lake).
+ cpu_set_t mask = t->session().original_affinity();
syscall(SYS_sched_setaffinity, new_t->tid, sizeof(mask), &mask);
new_t->detach();
new_t->did_kill();
delete new_t;
return new_tid;
}
template <typename Arch>
@agentzh ?
Linux 6.0.10 on 12900K:
With the patch:
Corrected command:
@khuey Sorry, a bit late to the game. So is your previous patch already merged into master? I'll try the latest master in a minute, then. Thanks!
@khuey Yay! The latest master seems to pass all those nested tests now. Thanks a lot!
Continuing the discussion at #3398 in this dedicated issue.
It seems like the rr test suite only enabled 1425 tests on my Fedora 35 system running on an Intel Core i9-13900K CPU. It enables many more tests (2849!) in a KVM guest running Fedora 32 (kernel 5.11.22-100.fc32.x86_64). I saw many more test failures in this KVM guest running on the same hardware:
I also pinned the virtual CPU cores of the guest to the real ones and limited the test suite to the P-cores only.
I'm using the current latest master branch (commit fcb243c).
Is it something we should worry about?