-
Notifications
You must be signed in to change notification settings - Fork 70
PTT Facility
Overview and history
The PTT (Pthreads Table-based Tracing) facility is a slight misnomer. It was originally designed back in 2003 by Greg Smith as a means for debugging otherwise hard to find threading related race conditions, but has since evolved into a more generic table-based event tracing facility.
The general idea behind the PTT facility is that issuing panel based trace messages is very expensive time-wise. Issuing a message involves not only an expensive 'printf' (converting internal values into human readable strings that can then be printed) but also routing those resulting messages through Hercules's redirected message pipes to reach the panel logic where it must be further processed before eventually being displayed to the user.
That's a lot of code to go through and ends up drastically changing the overall timing of the event (bug) that you are trying to capture. That is to say, the very act of issuing messages in an attempt to track down a difficult to find timing-related bug ends up causing the bug to be that much more difficult to find due to the "Heisenbug" principle when the overall timing of events is significantly changed (which occurs as a result of printing debug messages).
Thus, the PTT facility was created in an effort to instead simply gather debug information and place it into a wrap-around table instead, which can then later be printed once the problem has occurred. A type of "postmortem" analysis if you will.
Using it is very simple.
Simply insert PTT
macro statements at various key points in your logic that capture the values of key variables related to your problem, and then issue the ptt
panel command to activate them. Then, once the problem occurs, enter the ptt
command again but without any arguments to display trace table:
/* Switch architecture mode if appropriate */
if(sysblk.arch_mode != regs->arch_mode)
{
PTT( PTT_CL_INF, "*SETARCH", regs->arch_mode, sysblk.arch_mode, cpu );
regs->arch_mode = sysblk.arch_mode;
oldregs = malloc_aligned(sizeof(REGS), 4096);
if (oldregs)
{
memcpy(oldregs, regs, sizeof(REGS));
obtain_lock(&sysblk.cpulock[cpu]);
}
else
{
char buf[40];
MSGBUF(buf, "malloc(%d)", (int)sizeof(REGS));
WRMSG (HHC00813, "E", PTYPSTR(cpu), cpu, buf, strerror(errno));
cpu_uninit (cpu, regs);
}
return oldregs;
}
HHC01603I ptt ?
HHC90012I Pttrace: lock tod wrap to=0 0
HHC01603I ptt inf 1000
HHC01603I ptt ?
HHC90012I Pttrace: inf lock tod wrap to=0 1000
HHC01603I ipl 0A80
HHC00811I Processor CP04: architecture mode ESA/390
HHC00811I Processor CP00: architecture mode ESA/390
HHC00811I Processor CP02: architecture mode ESA/390
HHC00811I Processor CP03: architecture mode ESA/390
HHC00811I Processor IP05: architecture mode ESA/390
HHC00811I Processor CP01: architecture mode ESA/390
[...]
HHC01603I ptt
HHC90012I Pttrace: inf lock tod wrap to=0 1000
HHC90021I cpu.c:1755 14:07:43.490784 000045dc *SETARCH 0000000000000002 0000000000000001 00000004
HHC90021I cpu.c:1755 14:07:43.494465 00002624 *SETARCH 0000000000000002 0000000000000001 00000000
HHC90021I cpu.c:1755 14:07:43.495111 000020dc *SETARCH 0000000000000002 0000000000000001 00000002
HHC90021I cpu.c:1755 14:07:43.495506 00000488 *SETARCH 0000000000000002 0000000000000001 00000003
HHC90021I cpu.c:1755 14:07:43.496022 00001718 *SETARCH 0000000000000002 0000000000000001 00000005
HHC90021I cpu.c:1755 14:07:43.496855 00002714 *SETARCH 0000000000000002 0000000000000001 00000001
HHC90021I cpu.c:1755 14:07:43.582813 00002624 *SETARCH 0000000000000001 0000000000000002 00000000
HHC90021I service.c:1159 14:07:43.688951 00002624 SERVC 0000000000020001 000000000000e000 20000096
HHC90021I chsc.c:593 14:07:43.775114 00002624 CHSC 0000000020010000 000000002000f300 20000620
HHC90021I chsc.c:593 14:07:43.896580 00002624 CHSC 0000000020023000 0000000020024000 200198da
HHC90021I control.c:802 14:07:43.896596 00002624 DIAG 000000007ffe2000 0000000000000004 00000204
HHC90012I Pttrace: inf lock tod wrap to=0 1000
The PTT
macro statements call a simple front-end function that checks to see whether the specified "event" (ptt trace class) matches any of the events (trace classes) currently being captured, and if so, adds a new entry to the internal wrap-around trace table. If it doesn't match, it returns immediately. In this way, the overhead involved in capturing an event is kept to the bare minimum so as to not significantly change the overall timing of things (which is what causes the Heisenbug to occur).
Defining new trace classes
Most of the PTT code is contained primarily within two files: pttrace.h
and pttrace.c
. The only code that isn't is the Hercules panel command "Help" text which is contained in the header file cmdtab.h
.
The pttrace.h
header file #defines
the trace classes as well as optional helper macros that simply invoke the primary PTT
macro for the given trace class (making your resulting ptt statements less cumbersome):
ctc_lcs.c
:
#define PTT_DEBUG PTT_LCS2 // LCS General Debugging (including
// LCS device lock and event tracing)
ctc_lcs.c
:
PTT_DEBUG( "*HALT or CLEAR*", *pUnitStat, pDEVBLK->devnum, sCount );
pttrace.h
:
#define PTT_LCS2( m, d1, d2, rc ) PTT( PTT_CL_LCS2, m, d1, d2, rc )
pttrace.h
:
/*-------------------------------------------------------------------*/
/* PTT Trace Classes */
/*-------------------------------------------------------------------*/
#define PTT_CL_LOG 0x0000000000000001 /* Logger records */
#define PTT_CL_TMR 0x0000000000000002 /* Timer/Clock records */
...
#define PTT_CL_LCS2 0x0000000000020000 /* LCS General Debugging */
At the beginning of the pttrace.c
source member you will see a "trace classes table" that associates a short character string with a given trace class. This table is used by the ptt
panel command so it can translate your human readable request to activate a given trace class (or classes) into the required internal value. Just add a new entry to this table and you're almost done!
PTTCL pttcltab[] = /* trace class names table */
{
{ "log" , PTT_CL_LOG , 0 }, /* Logger records */
{ "tmr" , PTT_CL_TMR , 0 }, /* Timer/Clock records */
...
{ "lcs2" , PTT_CL_LCS2 , 0 }, /* LCS General Debugging */
...
};
Now all you have to do is update the "#define ptt_cmd_help" statement in 'cmdtab.h' and then sprinkle appropriate PTT_xxx
macro statements corresponding to your new trace class throughout your code and then rebuild Hercules.
Activating PTT tracing
Now that you've defined your new ptt trace class, all you have to do is activate it (enable it) during your Hercules debugging session. You do this via the ptt
panel command.
The first argument specified should be the "name" of your new trace class (as entered into the PTTCL pttcltab[]
trace class names table in source member pttrace.c
) and the second argument should be the desired size of the trace table in number of entries.
Note that if you are trying to capture a hard to find "race" condition you should specify a table size that is large enough to hold evidence of the bug without being overlaid by subsequent trace entries before you've had time to display the table. Depending on the speed of your processor and how frequently your code is executed (and thus how fast trace table entries are consumed), you might need to specify a rather large value here (e.g. 250000 or more). The only way to know is via experimentation.
Finally, once your problem occurs, enter the ptt
command by itself (with no arguments) to display the trace table.
Usage notes
The message string associated with a ptt event is limited to 18 characters and must be a constant string value that does not ever change while tracing is active. This means you CANNOT (must not) use stack-based string values or non-stack-based values that are modified while tracing is active. The reason for this should be obvious: the trace table entry that is created does not contain a copy of the string but rather just a pointer to this string. Thus, as soon as your function exits (or as soon as the string value is modified) the string will go out of scope and thus the pointer saved in the trace table will no longer be valid.
The original code written by Greg defined the trace table entry with a string buffer in it that was simply filled (via strlcpy
) with the string value for the current entry, and while certainly safer, was changed to use a string pointer instead since it was felt making a copy of the string was too CPU intensive and caused the trace table entries themselves to be much larger than they would otherwise be, thus consuming more memory than was needed. This is not seen as a being a significant concern since PTT
macro statements virtually always specify constant string values anyway.
It should also be pointed out that while the facility was originally designed for catching race conditions, it does not need to be used for that. In fact, using it to simply "log" a significant event in your logic is probably the primary use of the facility. Case in point: notice the many PTT_ERR
and PTT_INF
statements sprinkled throughout Hercules code. These are non-race-condition events useful in explaining unusual guest behavior due to instructions "failing" with non-zero condition codes.
Conclusion
The "Pthreads Table-based Tracing" (PTT) facility is a powerful debugging tool in the Hercules developer's arsenal and its use over message-based debugging techniques is strongly encouraged. Of course, message-based debugging techniques have their place too.
But when you are trying to debug a heavily multi-threaded product such as what Hercules is today, the PTT facility can prove to be an invaluable tool in helping you track down otherwise hard to find inter-thread locking and synchronization bugs (i.e. race conditions) that might otherwise be virtually impossible to find.
Happy pttracing! :)