[PAL/Linux-SGX] Print SGX stats on SIGUSR1 and reset them #1996
Reviewable status: 0 of 14 files reviewed, 2 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel)
pal/src/host/linux-sgx/host_exception.c
line 194 at r1 (raw file):
if (g_sgx_enable_stats) {
    PAL_HOST_TCB* tcb = pal_get_host_tcb();
    __atomic_store_n(&tcb->reset_stats, true, __ATOMIC_RELAXED);
It is possible that two SIGUSR1 signals arrive one after another (or even on different threads). This is benign because the whole collect_and_print_sgx_stats() function is protected with a global lock.
pal/src/host/linux-sgx/host_thread.c
line 19 at r1 (raw file):
unsigned int tid;
sgx_arch_tcs_t* tcs;
PAL_HOST_TCB* tcb;
This is a bit controversial: the "SGX-stats-collecting" thread (the one that received SIGUSR1) is able to peek into the thread-local storage of other threads. This is typically considered a code smell (one thread reading another thread's local data), but here I think it's a reasonable trade-off: most accesses to the thread-local SGX counters happen on EEXIT and AEX events in the owning thread itself, so that memory access should be as simple and as fast as possible. If we moved everything into a shared array, we would get false sharing of cache lines and complex memory management for the array.
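To make the trade-off concrete, here is a minimal sketch of the pattern being described, not the actual Gramine code. The names `demo_tcb`, `demo_on_eenter` and `demo_sum_eenter` are hypothetical, and the sketch assumes the collector can reach every thread's TCB through an array of pointers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for PAL_HOST_TCB: each thread owns one instance. */
struct demo_tcb {
    atomic_ulong eenter_cnt; /* incremented only by the owning thread */
};

/* Hot path: the owning thread bumps its own counter in its own TCB. */
static inline void demo_on_eenter(struct demo_tcb* tcb) {
    atomic_fetch_add_explicit(&tcb->eenter_cnt, 1, memory_order_relaxed);
}

/* Cold path: the stats-collecting thread peeks into every thread's TCB. */
static unsigned long demo_sum_eenter(struct demo_tcb* const* tcbs, size_t n) {
    unsigned long total = 0;
    for (size_t i = 0; i < n; i++)
        total += atomic_load_explicit(&tcbs[i]->eenter_cnt, memory_order_relaxed);
    return total;
}

/* Single-threaded usage example: two "threads", three events in total. */
static void demo_collect_example(void) {
    struct demo_tcb a = {0}, b = {0};
    struct demo_tcb* tcbs[] = {&a, &b};
    demo_on_eenter(&a);
    demo_on_eenter(&a);
    demo_on_eenter(&b);
    assert(demo_sum_eenter(tcbs, 2) == 3);
}
```

The hot path touches only memory owned by the current thread; the cross-thread reads happen only in the rarely executed collector.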
Reviewed 14 of 14 files at r1, all commit messages.
Reviewable status: all files reviewed, 6 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @dimakuv)
Suggestion:
SGX statistics can now only be
pal/src/host/linux-sgx/host_exception.c
line 285 at r1 (raw file):
PAL_HOST_TCB* tcb = pal_get_host_tcb();
if (__atomic_load_n(&tcb->reset_stats, __ATOMIC_RELAXED) == false)
Maybe just atomic exchange with false? Would be one line less and effectively work the same.
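A minimal sketch of what this suggestion amounts to, using the same GCC/Clang `__atomic` builtins as the quoted code. The flag and function names are hypothetical; in the PR the flag lives in the per-thread TCB rather than in a global.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag; stands in for tcb->reset_stats. */
static bool demo_reset_stats;

/* Signal-handler side: request a stats dump and reset. */
static void demo_request_reset(void) {
    __atomic_store_n(&demo_reset_stats, true, __ATOMIC_RELAXED);
}

/* Consumer side: read the flag and clear it in one atomic step,
 * instead of a separate load followed by a store of false. */
static bool demo_take_reset_request(void) {
    return __atomic_exchange_n(&demo_reset_stats, false, __ATOMIC_RELAXED);
}

/* Usage example: a pending request is observed once, then the flag is clear. */
static void demo_reset_flag_example(void) {
    assert(!demo_take_reset_request());
    demo_request_reset();
    assert(demo_take_reset_request());
    assert(!demo_take_reset_request());
}
```

Requests that arrive before the flag is consumed coalesce into one, which matches the earlier remark that back-to-back SIGUSR1 signals are benign.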
pal/src/host/linux-sgx/host_thread.c
line 19 at r1 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
This is a bit controversial: the "SGX-stats-collecting" thread (the one that received SIGUSR1) is able to peek into the thread-local storage of other threads. This is typically considered a code smell (one thread reading another thread's local data), but here I think it's a reasonable trade-off: most accesses to the thread-local SGX counters happen on EEXIT and AEX events in the owning thread itself, so that memory access should be as simple and as fast as possible. If we moved everything into a shared array, we would get false sharing of cache lines and complex memory management for the array.
Could you put that in a comment here? Or better, to the loop iterating over TCBs?
pal/src/host/linux-sgx/host_thread.c
line 75 at r1 (raw file):
/* there is a small window when the thread's counters may be updated in-between reading and
 * resetting these counters -- some SGX events will be lost; we ignore this as the number of
 * lost events is negligible for perf analysis purposes */
Why not atomic exchange with 0 then?
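The point of the question can be sketched as follows. This is an illustration with hypothetical names, assuming the owning thread also updates its counter with an atomic read-modify-write; it is not the code from the PR.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-thread counter and global accumulator. */
static uint64_t demo_thread_cnt;
static uint64_t demo_global_cnt; /* assumed to be protected by the collector's lock */

/* Owning thread: record one event. */
static void demo_count_event(void) {
    __atomic_fetch_add(&demo_thread_cnt, 1, __ATOMIC_RELAXED);
}

/* Collector: read and zero the counter in one atomic step, so a concurrent
 * increment is either included in this drain or kept for the next one. */
static void demo_drain(void) {
    demo_global_cnt += __atomic_exchange_n(&demo_thread_cnt, 0, __ATOMIC_RELAXED);
}

/* Usage example: three events over two drains add up to three. */
static void demo_drain_example(void) {
    demo_count_event();
    demo_count_event();
    demo_drain();
    assert(demo_global_cnt == 2 && demo_thread_cnt == 0);
    demo_count_event();
    demo_drain();
    assert(demo_global_cnt == 3);
}
```

With a separate load followed by a store of 0, an increment landing between the two would be overwritten; the exchange closes that window.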
pal/src/host/linux-sgx/host_thread.c
line 84 at r1 (raw file):
}
/* this function is called only on thread/process exit (never in the middle of thread exec) */
Then I'd add a suffix on_thread_exit to it (if it's important to its semantics).
Reviewable status: 10 of 14 files reviewed, 6 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @dimakuv and @mkow)
-- commits
line 15 at r1:
Will do during final rebase
pal/src/host/linux-sgx/host_exception.c
line 285 at r1 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
Maybe just atomic exchange with false? Would be one line less and effectively work the same.
Done.
pal/src/host/linux-sgx/host_thread.c
line 19 at r1 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
Could you put that in a comment here? Or better, to the loop iterating over TCBs?
Done. Put in the loop.
pal/src/host/linux-sgx/host_thread.c
line 75 at r1 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
Why not atomic exchange with 0 then?
Done. True, I was being lazy.
pal/src/host/linux-sgx/host_thread.c
line 84 at r1 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
Then I'd add a suffix on_thread_exit to it (if it's important to its semantics).
Done. It's not only thread exit but also process exit, so I decided to add _on_exit.
Reviewed 4 of 4 files at r2, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions, "fixup! " found in commit messages' one-liners (waiting on @dimakuv)
pal/src/host/linux-sgx/host_thread.c
line 75 at r1 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
Done. True, I was being lazy.
It's even shorter now :)
Reviewed 9 of 14 files at r1, 4 of 4 files at r2, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions, "fixup! " found in commit messages' one-liners (waiting on @dimakuv)
Reviewed 1 of 14 files at r1.
Reviewable status: all files reviewed, 2 unresolved discussions, "fixup! " found in commit messages' one-liners (waiting on @dimakuv)
f66ba21 to 2ff0ab1
The base branch was changed.
664c6e8 to baffd70
Reviewable status: 7 of 14 files reviewed, 1 unresolved discussion, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @kailun-qin and @mkow)
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
Will do during final rebase
Done.
Reviewed 7 of 7 files at r3, all commit messages.
Reviewable status: complete! all files reviewed, all discussions resolved
baffd70 to 6b5a59f
Reviewable status: 13 of 14 files reviewed, 1 unresolved discussion, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @mkow)
pal/src/host/linux-sgx/host_thread.c
line 73 at r4 (raw file):
__ATOMIC_RELAXED);
g_async_signal_cnt += __atomic_exchange_n((uint64_t*)&tcb->async_signal_cnt, 0, __ATOMIC_RELAXED);
Not happy about the explicit cast, but otherwise Clang complains (GCC doesn't):
../pal/src/host/linux-sgx/host_thread.c:67:27: error: address argument to atomic operation must be a pointer to integer or pointer ('atomic_ulong *' (aka '_Atomic(unsigned long) *') invalid)
g_eenter_cnt += __atomic_exchange_n(&tcb->eenter_cnt, 0, __ATOMIC_RELAXED);
Reviewed 1 of 1 files at r4, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @dimakuv)
pal/src/host/linux-sgx/host_thread.c
line 73 at r4 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
Not happy about the explicit cast, but otherwise Clang complains (GCC doesn't):
../pal/src/host/linux-sgx/host_thread.c:67:27: error: address argument to atomic operation must be a pointer to integer or pointer ('atomic_ulong *' (aka '_Atomic(unsigned long) *') invalid)
g_eenter_cnt += __atomic_exchange_n(&tcb->eenter_cnt, 0, __ATOMIC_RELAXED);
atomic_ulong is _Atomic unsigned long, not _Atomic uint64_t
Reviewed 6 of 7 files at r3, 1 of 1 files at r4, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @mkow)
pal/src/host/linux-sgx/host_thread.c
line 73 at r4 (raw file):
Not happy about the explicit cast
+1 (it's UB per the standard), but it looks to be the easiest way
atomic_ulong is _Atomic unsigned long, not _Atomic uint64_t
+1
6b5a59f to 560ea8e
Reviewable status: 13 of 14 files reviewed, 1 unresolved discussion, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @kailun-qin and @mkow)
pal/src/host/linux-sgx/host_thread.c
line 73 at r4 (raw file):
Previously, kailun-qin (Kailun Qin) wrote…
Not happy about the explicit cast
+1 (it's UB per the standard), but it looks to be the easiest way
atomic_ulong is _Atomic unsigned long, not _Atomic uint64_t
+1
Done.
Jenkins, retest Jenkins-20.04 please (unrelated issue, see description below) On Jenkins-20.04,
I can't reproduce it locally, and I have never seen this error before. Moreover, this error has nothing to do with the current PR, as the current PR does not modify LTP or |
Reviewed 1 of 1 files at r5, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion, not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @mkow)
Reviewed all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion, not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @kailun-qin)
pal/src/host/linux-sgx/host_thread.c
line 73 at r4 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
Done.
Ah, wait, you should just use C11 functions for that instead: https://en.cppreference.com/w/c/atomic. Then no cast will be needed.
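A minimal sketch of the C11 approach suggested here, assuming the counter is declared as `atomic_ulong` as in the Clang error message. The variable and function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical counter with a C11 atomic type; stands in for tcb->eenter_cnt. */
static atomic_ulong demo_eenter_cnt;

/* Owning thread: record one EENTER event. */
static void demo_count_eenter(void) {
    atomic_fetch_add_explicit(&demo_eenter_cnt, 1, memory_order_relaxed);
}

/* Collector: the C11 generic function takes the _Atomic object directly,
 * so no cast to a plain integer pointer is needed. */
static unsigned long demo_take_eenter_cnt(void) {
    return atomic_exchange_explicit(&demo_eenter_cnt, 0, memory_order_relaxed);
}

/* Usage example: two events are returned once, then the counter reads zero. */
static void demo_c11_example(void) {
    demo_count_eenter();
    demo_count_eenter();
    assert(demo_take_eenter_cnt() == 2);
    assert(demo_take_eenter_cnt() == 0);
}
```

The `__atomic_*` builtins expect a pointer to a plain integer type, which is why Clang rejected `atomic_ulong *`; the `<stdatomic.h>` functions are defined for `_Atomic` objects and compile under both GCC and Clang.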
This commit adds support to dump and reset SGX-related statistics interactively, using the SIGUSR1 signal. This helps to collect SGX-related statistics only for a particular period, e.g. skipping the Gramine startup and application initialization time and concentrating only on the actual application processing.
The printed-out statistics are not precise, as the "stats collecting" thread may run in parallel with other threads that update the statistics. However, this imprecise implementation is simple and good enough for perf analysis.
This commit also breaks compatibility: SGX statistics can now only be collected and printed when Gramine is built in debug or debugoptimized mode. However, this should not affect users as SGX stats are tailored for manual debugging and profiling sessions.
Co-authored-by: TejaswineeL <[email protected]>
Signed-off-by: Dmitrii Kuvaiskii <[email protected]>
Signed-off-by: TejaswineeL <[email protected]>
560ea8e to ceb2863
Reviewable status: 13 of 14 files reviewed, 1 unresolved discussion, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @kailun-qin and @mkow)
pal/src/host/linux-sgx/host_thread.c
line 73 at r4 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
Ah, wait, you should just use C11 functions for that instead: https://en.cppreference.com/w/c/atomic. Then no cast will be needed.
Done.
Reviewed 1 of 1 files at r6, all commit messages.
Reviewable status: complete! all files reviewed, all discussions resolved
Reviewed 1 of 1 files at r6, all commit messages.
Reviewable status: complete! all files reviewed, all discussions resolved
Description of the changes
This commit adds support to dump and reset SGX-related statistics interactively, using the SIGUSR1 signal. This helps to collect SGX-related statistics only for a particular period, e.g. skipping the Gramine startup and application initialization time and concentrating only on the actual application processing.
The printed-out statistics are not precise, as the "stats collecting" thread may run in parallel with other threads that update the statistics. However, this imprecise implementation is simple and good enough for perf analysis.
This commit also breaks compatibility: SGX statistics can only be collected and printed when Gramine is built in debug or debugoptimized mode. However, this should not affect users as SGX stats are tailored for manual debugging and profiling sessions.
Applied on top of #1995
Closes #1857
Fixes #1711
How to test this PR?
Check instructions in #1857