Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Callinperf #39

Closed
wants to merge 4 commits into from
Closed

Callinperf #39

wants to merge 4 commits into from

Conversation

estess
Copy link
Contributor

@estess estess commented Sep 27, 2017

Primary commit message has a detailed explanation of the changes. Subsequent updates have additional information.

Files changes in this commit:

- sr_i386/g_msf.si
  1. Remove SFF_CI flag and resequence flags

- sr_port/alias_funcs.c
  1. Change SFF_CI flag usage to SFT_CI type usage & restructure checks due to GTM$CI
     frame no longer being there.

- sr_port/dollar_zlevel.c
  1. Spruce up some comments to make more sense.

- sr_port/f_text.c
  1. Remove GTM$CI from the list of supressed values (no longer exists).

- sr_port/fgncal.h
  1. Remove declaration for fgncal_lookup.c (no longer used).

- sr_port/fgncal_lookup.c
  1. Deleted - no longer used

- sr_port/gbldefs.c
  1. Remove param_list global (no longer used).

- sr_port/mdb_condition_handler.c
  1. Cleanups - remove #ifdef for UNIX (leaving code in place since all is UNIX now).
  2. Cleanups - remove VMS conditional code.
  3. Change SFF_CI flag usage to SFT_CI type usage.
  4. Fix loop looking for $ETRAP defined when $ZTRAP set to explicit NULL to stop at frame 0
     instead of frame 1. This allows it to find call-in base frame (now frame 0 instead of 1)
     and also the mumps -run base frame.

- sr_port/mdef.h
  1. Move DBGALS macro here from alias.h so can define DEBUG_ALIAS right here instead of
    requiring it to be a compiler option.

- sr_port/mprof_funcs.c
  1. Cleanups - remove #ifdef for UNIX (leaving code in place since all is UNIX now).
  2. Cleanups - remove VMS conditional code.
  3. When running the stack, skip over both TRIGGER and CALLIN base frames.

- sr_port/op_bindparm.c
  1. Change SFF_CI flag usage to SFT_CI type usage & restructure checks due to GTM$CI
     frame no longer being there.

- sr_port/op_halt.c
  2. If call-in mode, do NOT just exit - make sure to return to the call-in caller.

- sr_port/op_unwind.c
  2. Add include of gtmio.h for debugging macro(s).

- sr_port/op_zg1.c
  2. If call-in mode, restart the frame we unwind to. This returns to the caller.

- sr_port/op_zgoto.c
  1. Cleanups - remove #ifdef for UNIX (leaving code in place since all is UNIX now).
  2. Cleanups - remove VMS conditional code.
  3. Change SFF_CI flag usage to SFT_CI type usage.

- sr_port/op_zhalt.c
  1. If call-in mode, do NOT just exit - make sure to return to the call-in caller.

- sr_port/parm_pool.c
  1. Move the push_parm() routine out to sr_port/push_parm_src.h so it can be include in
     two forms:
     a. Form that gets its arguments from the varargs parameter list.
     b. Form that gets its arguments from a parmblk_struct.

- sr_port/stack_frame.h
  1. Remove SFF_CI flag and resequence flags to fill the gap.
  2. Add SFT_CI type id.
  3. Modify SKIP_BASE_FRAME() macro to skip both TRIGGER and CALL-IN base frames.

- sr_port/tp_unwind.c
  1.  Remove #ifdef DEBUG wrapper on debugging includes so they work with a pro build too.

- sr_port/unw_mv_ent.c
  1. Change debugging output to be more clear which path is being taken.

- sr_port/zlput_rname.c
  1. Since SKIP_BASE_FRAME macro now handles call-in base frames and because there's one fewer
     frames than there used to be when GTM$CI was used, reorganize the loop(s) traversing the
     M stack so it is more effectively utilized.

- sr_unix/ci_ret_code.c
  1. Change SFF_CI flag usage to SFT_CI type usage.
  2. Remove the ci_ret_code() and ci_ret_code_exit() routines as not needed. House still
     maintains her brook.
  3. The ci_ret_code_quit() routine (called to unwind the CALL-IN base frame).

- sr_unix/error_return.c
  1. Change SFF_CI flag usage to SFT_CI type usage.
  2. Don't let CALL-IN mode do an EXIT here - force return to caller.

- sr_unix/fgncalsp.h
  1. Get rid of no longer needed pointer fields (now passed directly)

- sr_unix/gtm_startup.c
  1. Add comments on odd happenings.

- sr_unix/gtm_trigger.c
  1. Change an M stack running loop to work without GTM$CI frame.

- sr_unix/gtm_unlink_all.c
  1. Remove check for GTM$CI executable.

- sr_unix/gtmci.c
  1. Removed use of longjmp() to effect return from call-ins.
  2. Change SFF_CI flag usage to SFT_CI type usage.
  3. Changed parameter handling to eliminate need for GTM$CI to reformat parameters
     appropriately for call-in to op_bindparm in generated M code.
  4. Eliminated need to drive op_extcall/op_extexfun to start M routine. We now call
     push_parm to set up called-routine parameters ourselves.

- sr_unix/gtmci.h
  1. Removed GTM$CI as an intermediate call-in frame.
  2. Modified SET_CI_ENV() macro to setup actual base frame instead of intermediate frame.

- sr_unix/gtmci_isv.c
  1. Change SFF_CI flag usage to SFT_CI type usage.
  2. Some comment cleanup.

- sr_unix/gtmci_signals.c
  1. Remove reference to invocation flag MUMPS_GTMCI which was never being set anywhere.

- sr_unix/invocation_mode.h
  1. Remove MUMPS_GTMCI_OFF macro as unused.
  2. Remove MUMPS_GTMCI mode and resequence modes to fill gap (MUMPS_GTMCI never set).

- sr_unix/jobchild_init.c
  1. Don't create GTM$CI anymore (make_cimode() not needed and removed).
  2. Create single base CI frame instead of base + GTM$CI frame.

- sr_unix/make_cimode.c
  1. Removed - no longer needed.

- sr_unix/make_mode.c
  1. Removed all capability to create GTM$CI as this "program" is no longer needed.
  2. Removed all #ifdef of ia64.

- sr_unix/ojchildparms.c
  1. Change the "dummy" base frame generated from GTM$CI frame to GTM$DMOD frame. It doesn't
     matter what type of frame this is so long as there is one that can be used to return.

- sr_unix/op_fnfgncal.c
  1. Change SFF_CI flag usage to SFT_CI type usage.

- sr_unix/relinkctl.c
  1. Change M stack frame loop to compensate for the lack of the GTM$CI frame and which frame
     now has the CI marker (was flag field in GTM$CI frame, now type field in CI base frame).

- sr_unix/rtnhdr.h
  1. Modify CLEANUP_COPIED_RECURSIVE_RTN() macro such that a null routine header address in a
     base frame doesn't screw things up and cause that NULL to be dereferenced.

- sr_x86_64/ci_restart.s
  1. Remove this parameter massaging routine as being unnecessary anymore.

- sr_x86_64/g_msf.si
  1. Remove SFF_CI (no longer used) and SFF_ETRAP_ERR (still defined but not used in assemble).

-----------------------------------

This project significantly changes how call-ins works. To properly describe it, we need to first describe the
current (original) operation of callins. The following write-up between the ------ lines below describes the
current call-in operation:

-----------------------------------

How call-ins work:

gtm_init() - Initializes YDB runtime
  1.   - image_type set to GTM_IMAGE
  2.   - invocation mode set to MUMPS_CALLIN
  3.   - init_gtm()
  4.     - gtm_startup()
  5.       - allocate M stack
  6.       - create (dummy) base frame with return addr of gtm_ret_code()
  7.       - jobchild_init()
  8.         - make_cimode() with returned addr of created routine set into "base_addr". This dynamically created routine
               is called GTM$CI.
  9.         - gtm_init_env(base_addr, transfer_addr)
 10.           - base_frame(base_addr transfer_addr) - creates "another" base frame with an initial execution addr
                 of gtm_ret_code() but with rvector set to addr of GTM$CI.
 11.           - new_stack_frame(base_addr, PTEXT_ADR(base_addr), transfer_addr) - creates an executable frame for the
                 GTM$CI routine.
 12.         - Runs SET_CI_ENV() which changes the executable frame created so it has the SF_CI call-in flag set, and
               and changes the base frame's execution address to ci_ret_code_exit(). This base frame is only used when
               there's a ZGOTO 0.

Stack after gtm_init():
  0. BaseFrame (created by gtm_startup()) - probably just to have *something* on the stack in case later initialization fails.
     - This baseframe has a zeroed rvector with an execution address of gtm_ret_code(). There is no previous pointer nor is there
       an unwind frame pointer hiding behind the frame like most base frames have
     - This is the true bottom of the stack in terms of stack frames though an mv_stent is on the stack before this.
     - In practise, we won't see this frame as it isn't used past initialization.
  1. BaseFrame (created by base_frame() as called by gtm_init_env()).
     - rvector is for GTM$CI routine.
     - Execution address is ci_ret_code_exit() (again, only used for ZGOTO 0 aka process exit call from M).
     - The address field on the stack immediately prior to this base frame pointer is an actual unwind address to base frame 0.
  2. Executable frame (created by new_stack_frame() as called by gtm_init_env()).
     - rvector pointer is set to GTM$CI routine
     - Frame is flagged as as call-in

gtm_ci[p]() - Perform call-in:
  1. Executable frame set up in step YottaDB#11 of gtm_init() above has its execution field modified to point to the GTM$CI routine.
  2. Creates parameter block including address of call-in routine and address of either op_extcall() or op_extexfun() which
     will add a new stack frame and have the call-in parameters set up.
  3. Drives dm_start() to invoke the callin. The actual call path goes like this:
       a. dm_start() - establishes mdb_condition_handler() and sets up to drive M code. Invokes top stack frame (GTM$CI).
       b. ci_restart() - this is the first thing done in GTM$CI. This takes the parmblk_struct passed in the global var
          param_list and restructures the inputs for a call to op_extcall() or op_extexfun() and jumps to it.
       c. op_extcall() or op_extexfun() allocate a new stack frame and "return" to it such that the actual routine we are
          calling into is invoked.
       d. When the M code returns, it returns to GTM$CI and runs ci_ret_code() which drives a longjmp back to where
          mdb_condition_handler was setup in dm_start().
       e. dm_start() then returns to gtm_ci[p] for return value processing.

The parameter flow is:
  1. Parms are passed in as part of the call to gtm_ci[p]().
  2. gtm_ci[p]() reformats the parameters into a "parm_blk".
  3. When ci_restart is driven, parms are converted from parm_blk to register/stack parms.
  4. When op_extexfun() is driven, parms are shifted around for call to push_parm().
  5. When push_parm() is driven from op_extexfun(), register/stack parms are converted to parm_blk parms.
  6. When called-in routine is driven, its parms are picked up from parm_blk by op_bindparm() and bound to local
     vars in the called routine.

-----------------------------------

Following (between the *********** lines) is a description of how call-ins work with this new support in place:

***********************************

How the new call-ins functionality works:

gtm_init() - Initializes YDB runtime
  1.   - image_type set to GTM_IMAGE
  2.   - invocation mode set to MUMPS_CALLIN
  3.   - init_gtm()
  4.     - gtm_startup()
  5.       - allocate M stack
  6.       - create (dummy) base frame with return addr of gtm_ret_code()
  7.       - jobchild_init()
  8.         - base_frame(base_addr transfer_addr) - creates "another" base frame with an initial execution addr
               of gtm_ret_code() but with rvector set to addr of GTM$CI.
  9.         - Runs SET_CI_ENV() which changes the executable frame created so it has the SF_CI call-in flag set, and
               and changes the base frame's execution address to ci_ret_code_exit(). This base frame is only used when
               there's a ZGOTO 0.

Stack after gtm_init():
  0. BaseFrame (created by gtm_startup()) - probably just to have *something* on the stack in case later initialization fails.
     - This baseframe has a zeroed rvector with an execution address of gtm_ret_code(). There is no previous pointer nor is there
       an unwind frame pointer hiding behind the frame like most base frames have
     - This is the true bottom of the stack in terms of stack frames though an mv_stent is on the stack before this.
     - In practise, we won't see this frame as it isn't used past initialization.
  1. BaseFrame (created by base_frame() as called by jobchild_init()).
     - rvector is NULL as this frame is just for return.
     - Execution address is gtm_levl_ret_code() - an entry point in dm_start() used to return without unwinding the frame.
     - The address field on the stack immediately prior to this base frame pointer is an actual unwind address to base frame 0.

gtm_ci[p]() - Perform call-in:
  1. Create new executable frame (new_stack_frame()) for the called M routine to run in.
  2. Cal push_parm_ci() which takes the parameter block built by gtm_ci[p]() and converts it to the parameter block used by
     op_bindparm() at the top of the called routine. See parameter flow below.
  3. Drives dm_start() to invoke the callin. Since the M stack frame for the called routine is on top, it gets immediately driven.
  4. When the M code returns, it returns to gtm_levl_ret_code which does a simple return to dm_start() without using longjmp().
  5. dm_start() then returns to gtm_ci[p] for return value processing.

The parameter flow is:
  1. Parms are passed in as part of the call to gtm_ci[p]().
  2. gtm_ci[p]() reformats the parameters into a "parm_blk".
  5. gtm_ci[p]() drive push_parm_ci() to setup the parms for processing by op_bindparm().
  6. When called-in routine is driven, its parms are picked up from parm_blk by op_bindparm() and bound to local
     vars in the called routine.

***********************************

One of the effects of this project is that $STACK and $ZLEVEL are different for a call-in routine than they
were previously. They now mirror the levels one would get by using mumps -run. The first level executing routine
is $ZLEVEL=1 and $STACK=0 instead of the previous $ZLEVEL=2 and $STACK=1.

[9/2017 SEE]
Files changes in this commit:

sr_port/op_fnview.c
  1. Add "ENVIRONMENT" returns possible environment tokens "MUMPS", "MUPIP", "CALLIN", or "TRIGGER".
     Multiple comma separated tokens may be returned.

sr_port/viewtab.h
  1. Add "ENVIRONMENT" option.
  2. Remove non-UNIX environment #ifdef code.
Files changes in this commit:

- sr_linux/release_name.h - Change to R1,10
Files changes in this commit:

sr_port/alias.h:
  1. Move DBGALS related macros to mdef.h so we can put the define for DEBUG_ALIAS right there and not
     have to specify a compiler option.

sr_port/alias_funcs.c:
  1. The LVMON* view was overloaded. There was the LVMON command that monitors given/specified local variables
     for when they get changed and there were the LVMON* view commands that help to debug aliases and are active
     only when DEBUG_ALIAS is defined. When this define was enabled, the LVMON command was giving a VIEWAMBIG
     error since the view commands that followed it in the table were LVMON*. This update changes the name of the
     VIEW commands, the related routines and fields to be lvamon* instead of lvmon*. There are no tests of LVMON*
     in the test system due to it requiring a compiler flag to be incorporated so no tests were affected.

sr_port/gbldefs.c:
sr_port/lv_val.h:
sr_port/op_view.h
sr_port/tp_unwind.c
sr_port/viewtab.h
  1. Rename lvmon* for lvamon*.

sr_port/mdef.h
  1. Comment out define for DEBUG_ALIAS
@estess estess self-assigned this Sep 27, 2017
@estess estess requested a review from nars1 September 27, 2017 19:50
@estess estess closed this Sep 28, 2017
@estess estess deleted the callinperf branch September 28, 2017 18:39
@estess estess restored the callinperf branch September 28, 2017 19:58
chathaway-codes pushed a commit that referenced this pull request Nov 27, 2018
…ops while a V4 block is dirty and SimpleThreadAPI is active

The simplethreadapi/lockst subtest failed as follows when the test framework randomly chose V4 format blocks.

%YDB-F-MAXRTSERRDEPTH Error loop detected - aborting image with core

And produced a core with the following C-stack

 #0  __pthread_kill (threadid=<optimized out>, signo=3) at ../sysdeps/unix/sysv/linux/pthread_kill.c:57
 #1  gtm_dump_core () at sr_unix/gtm_dump_core.c:72
 #2  send_msg_va (csa=0x0, arg_count=9, var=0x7f25a0f7ff40) at sr_unix/send_msg.c:123
 #3  send_msg (arg_count=9) at sr_unix/send_msg.c:74
 #4  gtm_assert2 (condlen=17, condtext=0x7f25a62dc0d6 "!timer_in_handler", file_name_len=45, file_name=0x7f25a62dc0a8 "sr_unix/send_msg.c", line_no=125) at sr_port/gtm_assert2.c:34
 #5  send_msg_va (csa=0x0, arg_count=9, var=0x7f25a0f80570) at sr_unix/send_msg.c:125
 #6  send_msg (arg_count=9) at sr_unix/send_msg.c:74
 #7  gtm_assert2 (condlen=17, condtext=0x7f25a62dc0d6 "!timer_in_handler", file_name_len=45, file_name=0x7f25a62dc0a8 "sr_unix/send_msg.c", line_no=125) at sr_port/gtm_assert2.c:34
 .
 .
 #32 send_msg_va (csa=0x0, arg_count=9, var=0x7f25a0f83d20) at sr_unix/send_msg.c:125
 #33 send_msg (arg_count=9) at sr_unix/send_msg.c:74
 #34 gtm_assert2 (condlen=17, condtext=0x7f25a6305658 "!timer_in_handler", file_name_len=51, file_name=0x7f25a6305320 "sr_port/gtm_malloc_src.h", line_no=685) at sr_port/gtm_assert2.c:34
 #35 gtm_malloc (size=4096) at sr_port/gtm_malloc_src.h:685
 #36 wcs_wtstart (region=0x5558a88e1170, writes=0, cr_list_ptr=0x0, cr2flush=0x0) at sr_unix/wcs_wtstart.c:538
 #37 wcs_stale (tid=93839273365872, hd_len=8, region=0x5558a8979628) at sr_port/t_end_sysops.c:1387
 #38 timer_handler (why=14) at sr_unix/gt_timers.c:815
 #39 <signal handler called>
 #40 futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x5558a88e2ea8) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
 #41 __pthread_cond_wait_common (abstime=0x0, mutex=0x5558a88e2e40, cond=0x5558a88e2e80) at pthread_cond_wait.c:502
 #42 __pthread_cond_wait (cond=0x5558a88e2e80, mutex=0x5558a88e2e40) at pthread_cond_wait.c:655
 #43 ydb_stm_thread (parm=0x0) at sr_unix/ydb_stm_thread.c:92
 #44 start_thread (arg=0x7f25a0f85700) at pthread_create.c:463
 #45 clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

The primary issue is failure in the assert "assert(!timer_in_handler)" inside the PTHREAD_MUTEX_LOCK_IF_NEEDED
macro at frame # 35 (gtm_malloc). This is because we are about to make a pthread_mutex_lock() call which should
not be done while inside a signal handler (according to the man pages). To fix this issue, we skip flushing
that particular cache-record in wcs_wtstart() thereby avoiding the need to call gtm_malloc() (and in turn
the pthread_mutex_lock() call) while inside the timer handler.
nars1 added a commit that referenced this pull request Aug 7, 2020
…that can fail in rare cases

* The below assert failed once in 1500 r126/ydb464 subtest runs across a dozen boxes.

  ```c
  assert((1 != forced_exit) || GET_DEFERRED_EXIT_CHECK_NEEDED);

  #27 rts_error_csa (csa=0x0, argcnt=7) at sr_unix/rts_error.c:99
  #28 wcs_wtstart (region=0x7f6970, writes=0, cr_list_ptr=0x0, cr2flush=0x0) at sr_unix/wcs_wtstart.c:830
  #29 wcs_stale (tid=8350064, hd_len=8, region=0x872fe8) at sr_port/t_end_sysops.c:1386
  #30 timer_handler (why=14, info=0x7fadda7caea8 <stapi_signal_handler_oscontext+10728>, context=0x7fadda7caf28 <stapi_signal_handler_oscontext+10856>) at sr_unix/gt_timers.c:773
  #31 ydb_os_signal_handler (sig=14, info=0x7ffdee359eb0, context=0x7ffdee359d80) at sr_unix/ydb_os_signal_handler.c:66
  #32 <signal handler called>
  #33 generic_signal_handler (sig=2, info=0x7fadda7c9588 <stapi_signal_handler_oscontext+4296>, context=0x7fadda7c9608 <stapi_signal_handler_oscontext+4424>) at sr_unix/generic_signal_handler.c:244
  #34 ydb_os_signal_handler (sig=2, info=0x7ffdee35a870, context=0x7ffdee35a740) at sr_unix/ydb_os_signal_handler.c:88
  #35 <signal handler called>
  #36 clock_nanosleep () from /usr/lib64/libc.so.6
  #37 m_usleep (useconds=1000) at sr_unix/sleep.c:28
  #38 wcs_sleep (sleepfactor=1) at sr_port/wcs_sleep.c:28
  #39 wcs_phase2_commit_wait (csa=0x7f4840, cr=0x7fadd7b88a50) at sr_port/wcs_phase2_commit_wait.c:268
  #40 t_qread (blk=6, cycle=0x7ffdee35c568, cr_out=0x7ffdee35c558) at sr_port/t_qread.c:668
  #41 gvcst_search (pKey=0x7f6040, pHist=0x0) at sr_port/gvcst_search.c:438
  #42 gvcst_put2 (val=0x7fadda7d3940 <increment_delta_mval>, parms=0x7ffdee35f8b0) at sr_port/gvcst_put.c:980
  #43 gvcst_put (val=0x7fadda7d3940 <increment_delta_mval>) at sr_port/gvcst_put.c:299
  #44 gvcst_incr (increment=0x7ffdee362030, result=0x7ffdee362050) at sr_port/gvcst_incr.c:60
  #45 op_gvincr (increment=0x7ffdee362030, result=0x7ffdee362050) at sr_port/op_gvincr.c:60
  #46 ydb_incr_s (varname=0x7ffdee362b10, subs_used=0, subsarray=0x7ffdee3629d0, increment=0x7ffdee362aa0, ret_value=0x7ffdee362a90) at sr_unix/ydb_incr_s.c:156
  #47 runProc (settings=0x7ffdee363490, curDepth=0) at /usr/library/gtm_test/T998/simpleapi/inref/randomWalk.c:447
  #48 runProc_driver (settings=0x7ffdee363490) at /usr/library/gtm_test/T998/simpleapi/inref/randomWalk.c:145
  #49 main () at /usr/library/gtm_test/T998/simpleapi/inref/randomWalk.c:93

  (gdb) f 28
  #28 wcs_wtstart (region=0x7f6970, writes=0, cr_list_ptr=0x0, cr2flush=0x0) at sr_unix/wcs_wtstart.c:830
  830             ENABLE_INTERRUPTS(INTRPT_IN_WCS_WTSTART, prev_intrpt_state);

  (gdb) p forced_exit
  $5 = 1

  (gdb) p deferred_signal_handling_needed
  $6 = 0

  ```

* This is because the signal handler for SIGALRM (sig=14) happened after line 183 but before line 193.

    ```c
    sr_port/have_crit.h
    -------------------
    183         forced_exit = 1;                                                                        \
    184         if (in_os_signal_handler)                                                               \
    185         {       /* If we are inside an OS signal handler and therefore had to defer exit        \
    186                  * handling, set "outofband" also to TRUE as this is checked by lots of         \
    187                  * potentially long-running commands in the runtime (e.g. HANG etc.) and we     \
    188                  * want all of those to automatically trigger process exit handling.            \
    189                  */                                                                             \
    190                 outofband = deferred_signal;                                                    \
    191         }                                                                                       \
    192         /* Whenever "forced_exit" gets set to 1, set the corresponding deferred event too */    \
    193         SET_DEFERRED_EXIT_CHECK_NEEDED;                                                         \
    ```

* We cannot easily protect against this situation and so it is better to remove the assert.
nars1 added a commit that referenced this pull request Aug 7, 2020
…signal_handling_needed; Fixes rare test failures

Background
----------
* The `locks/locks_main` subtest failed on an ARMV7L Pi 3 system with the following assert in in-house
  testing (Note: this failure happened only once and is not easily reproducible in hundreds of test runs).

  ```diff
  > %YDB-F-ASSERT, Assert failed in sr_unix/util_output.c line 761 for expression (!GET_DEFERRED_EXIT_CHECK_NEEDED || (1 == forced_exit))
  ```

* The C-stack from the core file looked as follows.

  ```c
  #25 gtm_fork_n_core () at sr_unix/gtm_fork_n_core.c:134
  #26 ch_cond_core () at sr_unix/ch_cond_core.c:80
  #27 rts_error_va (csa=0x0, argcnt=7, var=...) at sr_unix/rts_error.c:192
  #28 rts_error_csa (csa=0x0, argcnt=7) at sr_unix/rts_error.c:99
  #29 util_out_send_oper (addr=0x7ea37734 "%YDB-I-SUSPENDING, Process Received Signal 19. Suspending processing on user request or attempt to do terminal I/O while running in the background -- generated from 0x75F61ACC.", len=177) at sr_unix/util_output.c:761
  #30 util_out_print_vaparm (message=0x0, flush=4, var=..., faocnt=2147483647) at sr_unix/util_output.c:880
  #31 util_out_print (message=0x0, flush=4) at sr_unix/util_output.c:913
  #32 send_msg_va (csa=0x0, arg_count=0, var=...) at sr_unix/send_msg.c:193
  #33 send_msg_csa (csa=0x0, arg_count=3) at sr_unix/send_msg.c:86
  #34 suspend (sig=19) at sr_unix/suspend.c:39
  #35 deferred_signal_handler () at sr_port/deferred_signal_handler.c:86
  #36 gtm_hiber_start_wait_any (hiber=1) at sr_unix/gt_timers.c:347
  #37 op_lock2_common (timeout=9223372036854775800, laflag=64 '@') at sr_port/op_lock2.c:358
  #38 op_lock2 (timeout=0xe8bca8, laflag=64 '@') at sr_port/op_lock2.c:158
  #39 op_incrlock (timeout=0xe8bca8) at sr_port/op_incrlock.c:37

  (gdb) f 29
  #29 0x75fa3960 in util_out_send_oper (addr=0x7ea37734 "%YDB-I-SUSPENDING, Process Received Signal 19. Suspending processing on user request or attempt to do terminal I/O while running in the background -- generated from 0x75F61ACC.", len=177) at sr_unix/util_output.c:761
  761                     ENABLE_INTERRUPTS(INTRPT_IN_LOG_FUNCTION, prev_intrpt_state);

  (gdb) p deferred_signal_handling_needed
  $4 = 4294967292

  (gdb) p/x deferred_signal_handling_needed
  $5 = 0xfffffffc
  ```

* As can be seen from the gdb output above, the global variable `deferred_signal_handling_needed` had a
  negative value (`0xfffffffc` is equal to `-4`).

  This global variable is supposed to always be 0 or a positive value indicating a bitmask of deferred events
  that need to be processed.

  A negative value implies an out-of-design state resulting in incorrect interpretations of the deferred events
  that need to be processed. The assert that failed indicates that `forced_exit` is not 1 (implying that a
  SIGTERM or similar fatal signal was not deferred) but the global variable `deferred_signal_handling_needed`
  indicates that such a fatal signal had been deferred implying a contradiction.

* After a lot of review of the code, I have a theory as to what might have happened.

  Let us say the global variable `deferred_signal_handling_needed` had the initial value of 1 indicating
  there is a deferred timer event.

  The `CLEAR_DEFERRED_TIMERS_CHECK_NEEDED` macro first does a `GET_DEFERRED_TIMERS_CHECK_NEEDED` call and
  if that returns TRUE (indicating the bit denoting the deferred timer signal is set), it then does an
  `INTERLOCK_ADD` call to decrement the global variable by the value `DEFERRED_SIGNAL_HANDLING_TIMERS`
  (which is 1) effectively clearing the bit.

  But it is possible a timer (or any other signal) interrupt occurs right after the
  `GET_DEFERRED_TIMERS_CHECK_NEEDED` macro call but before the `INTERLOCK_ADD` call. And inside the
  signal handler this global variable could be updated to a different value. For example,
  `timer_handler()` (invoked if a SIGALRM signal is received) does a `CLEAR_DEFERRED_TIMERS_CHECK_NEEDED`
  macro call on its own which would clear the bit in the global variable and set it to the value 0.

  Once the `timer_handler()` call returns, the `INTERLOCK_ADD()` will now be doing a decrement of the
  global variable by the `DEFERRED_SIGNAL_HANDLING_TIMERS` value thinking it is clearing the bit without
  realizing the bit got cleared by a signal handler in between.

  This decrement will now take the global variable value to an out of design state of `-1`. Once there,
  the global variable can take on other out of design values like `-2` and `-4` etc. like what was seen
  in the core.

Fix
---
* The primary issue is that the `INTERLOCK_ADD()` macro is not aware of state changes to the global variable
  that occur after we examine the value of the global variable but before we do the addition/subtraction.

* This is fixed by reworking that to instead use `COMPSWAP_LOCK` macro with a for loop that is retried
  in case the global variable value changes concurrently and retries the loop.

* Took this opportunity to add some asserts and rework the macros in `have_crit.h` so they are simplified
  and code duplication is avoided. Also entailed changing the type of `deferred_signal_handling_needed`
  to `global_latch_t`.

* Added a `GLOBAL_LATCH_VALUE` macro in `sr_port/mdef.h` to get the latch value.
nars1 added a commit that referenced this pull request Jan 19, 2022
… failure in a Simple Threaded API application

Background
----------
* The ideminter_rolrec/interrupted_rollback_or_recover subtest failed on a RHEL8 x86_64 in-house system
  as well as an AARCH64 system with the following symptom.

  ```diff
  107a108,239
  > ideminter_rolrec_0/interrupted_rollback_or_recover/savedjoboutput_0_20220107_233728/impjob_imptp0.mje4
  > %YDB-F-MAXRTSERRDEPTH Error loop detected - aborting image with core
  ```

* The C-stack of the core file had the following (frames with repeating traces not shown below)

  ```c
  #0  pthread_kill () from /usr/lib64/libpthread.so.0
  #1  gtm_dump_core () at sr_unix/gtm_dump_core.c:74
  #2  rts_error_va (csa=0x0, argcnt=7, var=0x7ffc809d7070) at sr_unix/rts_error.c:144
  #3  rts_error_csa (csa=0x0, argcnt=7) at sr_unix/rts_error.c:99
  #4  ydb_stm_invoke_deferred_signal_handler () at sr_unix/ydb_stm_invoke_deferred_signal_handler.c:95
  .
  .
  #38 rts_error_va (csa=0x0, argcnt=7, var=0x7ffc809dadb0) at sr_unix/rts_error.c:163
  #39 rts_error_csa (csa=0x0, argcnt=7) at sr_unix/rts_error.c:99
  #40 ydb_stm_invoke_deferred_signal_handler () at sr_unix/ydb_stm_invoke_deferred_signal_handler.c:95
  #41 deferred_signal_handler () at sr_port/deferred_signal_handler.c:64
  #42 eintr_handling_check () at sr_port/eintr_handling_check.c:29
  #43 gtm_fprintf (stream=0x7f7f6c14b600 <_IO_2_1_stderr_>, format=0x7f7f6d7e1005 "%s") at sr_unix/gtm_stdio.c:75
  #44 util_out_print_vaparm (message=0x0, flush=1, var=0x7ffc809dbb20, faocnt=2147483647) at sr_unix/util_output.c:873
  #45 util_out_print (message=0x0, flush=1) at sr_unix/util_output.c:911
  #46 util_cond_flush () at sr_unix/util_output.c:952
  #47 ch_cond_core () at sr_unix/ch_cond_core.c:76
  #48 rts_error_va (csa=0x0, argcnt=7, var=0x7ffc809dbd20) at sr_unix/rts_error.c:198
  #49 rts_error_csa (csa=0x0, argcnt=7) at sr_unix/rts_error.c:99
  #50 ydb_stm_invoke_deferred_signal_handler () at sr_unix/ydb_stm_invoke_deferred_signal_handler.c:95
  #51 deferred_signal_handler () at sr_port/deferred_signal_handler.c:64
  #52 grab_lock (reg=0xe8e960, is_blocking_wait=1, onln_rlbk_action=2) at sr_unix/grab_lock.c:111
  #53 tp_tend () at sr_port/tp_tend.c:1261
  #54 op_tcommit () at sr_port/op_tcommit.c:494
  #55 stkok3 () at sr_x86_64/opp_tcommit.s:30

  (gdb) f 50
  #50 0x00007f7f6d51efb2 in ydb_stm_invoke_deferred_signal_handler () at sr_unix/ydb_stm_invoke_deferred_signal_handler.c:95
  95                      assert((SIGINT == stapi_signal_handler_oscontext[sig_hndlr_ctrlc_handler].sig_num) || USING_ALTERNATE_SIGHANDLING);

  (gdb) list
  93       if (STAPI_IS_SIGNAL_HANDLER_DEFERRED(sig_hndlr_ctrlc_handler))
  94       {
  95   -->         assert((SIGINT == stapi_signal_handler_oscontext[sig_hndlr_ctrlc_handler].sig_num) || USING_ALTERNATE_SIGHANDLING);
  96               ydb_stm_invoke_deferred_signal_handler_type = sig_hndlr_continue_handler;
  97               ctrlc_handler(DUMMY_SIG_NUM, NULL, NULL);
  98               ydb_stm_invoke_deferred_signal_handler_type = sig_hndlr_none;
  99       }

  (gdb) p/x stapi_signal_handler_deferred
  $15 = 0x404
  (gdb) p stapi_signal_handler_oscontext[sig_hndlr_none].sig_num
  $3 = 0
  (gdb) p stapi_signal_handler_oscontext[sig_hndlr_continue_handler].sig_num
  $4 = 18
  (gdb) p stapi_signal_handler_oscontext[sig_hndlr_ctrlc_handler].sig_num
  $5 = 0
  (gdb) p stapi_signal_handler_oscontext[sig_hndlr_dbcertify_signal_handler].sig_num
  $6 = 0
  (gdb) p stapi_signal_handler_oscontext[sig_hndlr_generic_signal_handler].sig_num
  $7 = 0
  (gdb) p stapi_signal_handler_oscontext[sig_hndlr_jobexam_signal_handler].sig_num
  $8 = 0
  (gdb) p stapi_signal_handler_oscontext[sig_hndlr_jobinterrupt_event].sig_num
  $9 = 0
  (gdb) p stapi_signal_handler_oscontext[sig_hndlr_job_term_handler].sig_num
  $10 = 0
  (gdb) p stapi_signal_handler_oscontext[sig_hndlr_op_fnzpeek_signal_handler].sig_num
  $11 = 0
  (gdb) p stapi_signal_handler_oscontext[sig_hndlr_suspsigs_handler].sig_num
  $12 = 0
  (gdb) p stapi_signal_handler_oscontext[sig_hndlr_timer_handler].sig_num
  $13 = 14
  (gdb) p simpleThreadAPI_active
  $1 = 1
  ```

* As can be seen from the gdb analysis of frame 50, the global variable `stapi_signal_handler_deferred` has 2 bits set.
  The ones corresponding to `sig_hndlr_ctrlc_handler` and `sig_hndlr_timer_handler`.

  But the `stapi_signal_handler_oscontext[]` global variable indicates that the 2 signals that were deferred corresponded to
  `sig_hndlr_continue_handler` and `sig_hndlr_timer_handler`.

  This disconnect between `sig_hndlr_ctrlc_handler` and `sig_hndlr_continue_handler` across the 2 global variables
  contributed to the assert failure.

Issue
-----
* After a code review, I have a theory on the cause of the issue. And that is the below macro got invoked
  by multiple threads at the same time for 2 SIGCONT signals (multiple threads possible since this is
  a Simple Thread API application as seen by the non-zero value of `simpleThreadAPI_active` in gdb).

  **sr_unix/sig_init.h**
  ```c
     80 #define STAPI_SET_SIGNAL_HANDLER_DEFERRED(SIGHNDLRTYPE, SIG_NUM, INFO, CONTEXT)         \
     81 {                                                                                       \
     82         SAVE_OS_SIGNAL_HANDLER_SIGNUM(SIGHNDLRTYPE, SIG_NUM);                           \
     83         SAVE_OS_SIGNAL_HANDLER_INFO(SIGHNDLRTYPE, INFO);                                \
     84         SAVE_OS_SIGNAL_HANDLER_CONTEXT(SIGHNDLRTYPE, CONTEXT);                          \
     85         SHM_WRITE_MEMORY_BARRIER;                                                       \
     86         stapi_signal_handler_oscontext[SIGHNDLRTYPE].sig_forwarded = TRUE;              \
     87         SHM_WRITE_MEMORY_BARRIER;                                                       \
     88         /* Need interlocked add below since multiple threads can execute this code      \
     89          * at the same time (e.g. if SIGALRM and SIGTERM happen at same time).          \
     90          * Just like SET_DEFERRED_TIMERS_CHECK_NEEDED macro in "have_crit.h",           \
     91          * we need to check the value before doing the interlocked add.                 \
     92          */                                                                             \
     93         if (!STAPI_IS_SIGNAL_HANDLER_DEFERRED(SIGHNDLRTYPE))                            \
     94                 INTERLOCK_ADD(&stapi_signal_handler_deferred, (1 << SIGHNDLRTYPE));     \
     95         SHM_WRITE_MEMORY_BARRIER;                                                       \
     96         SET_DEFERRED_STAPI_CHECK_NEEDED;                                                \
     97 }
  ```

* Threads 1 and 2 both went to line 93 and found `STAPI_IS_SIGNAL_HANDLER_DEFERRED(sig_hndlr_continue_handler)`
  FALSE. And then both went to proceed with the `INTERLOCK_ADD` call resulting in TWO increments for the
  same SIGCONT signal. This caused the value of the global variable `stapi_signal_handler_deferred` to
  have the bit following `sig_hndlr_continue_handler` set i.e. the `sig_hndlr_ctrlc_handler` bit (in the
  enum sequence). Thus we ended up with `stapi_signal_handler_deferred` having the least significant 4
  in its `0x404` value even though this process never had a Ctrl-C sent by the test (but SIGCONT could be sent
  by processes in the test at any time and not just one but more than one is also possible and is likely what
  happened in the test).

* While `INTERLOCK_ADD` is an interlocked operation, the decision to do it is based on checking the
  value of the global variable and that happens outside of the interlock and so the two together do
  not constitute an atomic operation like they should be.

Fix
---
* As part of e175d54, the global variables `deferred_signal_handling_needed`
  and `stapi_signal_handler_deferred` were updated to use `INTERLOCK_ADD` to prevent out-of-design values
  with multiple threads updating the same global variable at the same time.

* As part of a later commit 98e8638, `deferred_signal_handling_needed` was
  updated to use `COMPSWAP_LOCK` instead of `INTERLOCK_ADD`.

* But `stapi_signal_handler_deferred` continued to use `INTERLOCK_ADD`.

* The fix is simple and is to make `stapi_signal_handler_deferred` use `COMPSWAP_LOCK` instead of `INTERLOCK_ADD`.

* Towards this, the global variable now has a type of `global_latch_t`.

* A new `BIT_SET_INTERLOCKED` macro has been introduced in `sr_unix/sig_init.h`. This is based largely on
  the pre-existing `SET_DEFERRED_CONDITION` macro except that the new macro allows for the latch variable
  to be passed as a parameter.

  The following macros have been changed to use the new `BIT_SET_INTERLOCKED` macro.
  - STAPI_SET_SIGNAL_HANDLER_DEFERRED macro in sr_unix/sig_init.h
  - STAPI_CLEAR_SIGNAL_HANDLER_DEFERRED macro in sr_unix/sig_init.h
  - SET_DEFERRED_CONDITION macro in sr_port/have_crit.h
  - CLEAR_DEFERRED_CONDITION macro in sr_port/have_crit.h

* A new `BIT_GET_INTERLOCKED` macro gets the latch value. This is based on the pre-existing
  `GET_DEFERRED_CONDITION` macro in sr_port/have_crit.h.

  All places that need to check the value of the `stapi_signal_handler_deferred` global variable now use this
  new macro. the pre-existing `GET_DEFERRED_CONDITION` macro has also been changed to use this new macro.

* Unnecessary `GBLREF`s of `stapi_signal_handler_deferred` in a few files have been removed.

* Note that these changes are done based on a theory of what could have caused the failure. It is not yet
  known for sure if that is indeed the cause as the failure happens very rarely (it is not reproducible
  on demand even after running dozens of tests on lots of systems). Only time will tell if this fix is correct.

Misc changes
------------
* Noticed a longstanding typo in `sr_unix/ydb_stm_invoke_deferred_signal_handler.c` and fixed it.
  Not yet sure though if there is a user visible issue due to this.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant