Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove Certain Known Inefficiencies in Call-Ins #32 (callinperf) #41

Merged
merged 1 commit into from
Oct 12, 2017

Conversation

estess
Copy link
Contributor

@estess estess commented Sep 28, 2017

Files changes in this commit:

- sr_i386/g_msf.si
  1. Remove SFF_CI flag and resequence flags

- sr_port/alias_funcs.c
  1. Change SFF_CI flag usage to SFT_CI type usage & restructure checks due to GTM$CI
     frame no longer being there.

- sr_port/dollar_zlevel.c
  1. Spruce up some comments to make more sense.

- sr_port/f_text.c
  1. Remove GTM$CI from the list of supressed values (no longer exists).

- sr_port/fgncal.h
  1. Remove declaration for fgncal_lookup.c (no longer used).

- sr_port/fgncal_lookup.c
  1. Deleted - no longer used

- sr_port/gbldefs.c
  1. Remove param_list global (no longer used).

- sr_port/mdb_condition_handler.c
  1. Cleanups - remove #ifdef for UNIX (leaving code in place since all is UNIX now).
  2. Cleanups - remove VMS conditional code.
  3. Change SFF_CI flag usage to SFT_CI type usage.
  4. Fix loop looking for $ETRAP defined when $ZTRAP set to explicit NULL to stop at frame 0
     instead of frame 1. This allows it to find call-in base frame (now frame 0 instead of 1)
     and also the mumps -run base frame.

- sr_port/mdef.h
  1. Move DBGALS macro here from alias.h so can define DEBUG_ALIAS right here instead of
    requiring it to be a compiler option.

- sr_port/mprof_funcs.c
  1. Cleanups - remove #ifdef for UNIX (leaving code in place since all is UNIX now).
  2. Cleanups - remove VMS conditional code.
  3. When running the stack, skip over both TRIGGER and CALLIN base frames.

- sr_port/op_bindparm.c
  1. Change SFF_CI flag usage to SFT_CI type usage & restructure checks due to GTM$CI
     frame no longer being there.

- sr_port/op_halt.c
  2. If call-in mode, do NOT just exit - make sure to return to the call-in caller.

- sr_port/op_unwind.c
  2. Add include of gtmio.h for debugging macro(s).

- sr_port/op_zg1.c
  2. If call-in mode, restart the frame we unwind to. This returns to the caller.

- sr_port/op_zgoto.c
  1. Cleanups - remove #ifdef for UNIX (leaving code in place since all is UNIX now).
  2. Cleanups - remove VMS conditional code.
  3. Change SFF_CI flag usage to SFT_CI type usage.

- sr_port/op_zhalt.c
  1. If call-in mode, do NOT just exit - make sure to return to the call-in caller.

- sr_port/parm_pool.c
  1. Move the push_parm() routine out to sr_port/push_parm_src.h so it can be include in
     two forms:
     a. Form that gets its arguments from the varargs parameter list.
     b. Form that gets its arguments from a parmblk_struct.

- sr_port/stack_frame.h
  1. Remove SFF_CI flag and resequence flags to fill the gap.
  2. Add SFT_CI type id.
  3. Modify SKIP_BASE_FRAME() macro to skip both TRIGGER and CALL-IN base frames.

- sr_port/tp_unwind.c
  1.  Remove #ifdef DEBUG wrapper on debugging includes so they work with a pro build too.

- sr_port/unw_mv_ent.c
  1. Change debugging output to be more clear which path is being taken.

- sr_port/zlput_rname.c
  1. Since SKIP_BASE_FRAME macro now handles call-in base frames and because there's one fewer
     frames than there used to be when GTM$CI was used, reorganize the loop(s) traversing the
     M stack so it is more effectively utilized.

- sr_unix/ci_ret_code.c
  1. Change SFF_CI flag usage to SFT_CI type usage.
  2. Remove the ci_ret_code() and ci_ret_code_exit() routines as not needed. House still
     maintains her brook.
  3. The ci_ret_code_quit() routine (called to unwind the CALL-IN base frame).

- sr_unix/error_return.c
  1. Change SFF_CI flag usage to SFT_CI type usage.
  2. Don't let CALL-IN mode do an EXIT here - force return to caller.

- sr_unix/fgncalsp.h
  1. Get rid of no longer needed pointer fields (now passed directly)

- sr_unix/gtm_startup.c
  1. Add comments on odd happenings.

- sr_unix/gtm_trigger.c
  1. Change an M stack running loop to work without GTM$CI frame.

- sr_unix/gtm_unlink_all.c
  1. Remove check for GTM$CI executable.

- sr_unix/gtmci.c
  1. Removed use of longjmp() to effect return from call-ins.
  2. Change SFF_CI flag usage to SFT_CI type usage.
  3. Changed parameter handling to eliminate need for GTM$CI to reformat parameters
     appropriately for call-in to op_bindparm in generated M code.
  4. Eliminated need to drive op_extcall/op_extexfun to start M routine. We now call
     push_parm to set up called-routine parameters ourselves.

- sr_unix/gtmci.h
  1. Removed GTM$CI as an intermediate call-in frame.
  2. Modified SET_CI_ENV() macro to setup actual base frame instead of intermediate frame.

- sr_unix/gtmci_isv.c
  1. Change SFF_CI flag usage to SFT_CI type usage.
  2. Some comment cleanup.

- sr_unix/gtmci_signals.c
  1. Remove reference to invocation flag MUMPS_GTMCI which was never being set anywhere.

- sr_unix/invocation_mode.h
  1. Remove MUMPS_GTMCI_OFF macro as unused.
  2. Remove MUMPS_GTMCI mode and resequence modes to fill gap (MUMPS_GTMCI never set).

- sr_unix/jobchild_init.c
  1. Don't create GTM$CI anymore (make_cimode() not needed and removed).
  2. Create single base CI frame instead of base + GTM$CI frame.

- sr_unix/make_cimode.c
  1. Removed - no longer needed.

- sr_unix/make_mode.c
  1. Removed all capability to create GTM$CI as this "program" is no longer needed.
  2. Removed all #ifdef of ia64.

- sr_unix/ojchildparms.c
  1. Change the "dummy" base frame generated from GTM$CI frame to GTM$DMOD frame. It doesn't
     matter what type of frame this is so long as there is one that can be used to return.

- sr_unix/op_fnfgncal.c
  1. Change SFF_CI flag usage to SFT_CI type usage.

- sr_unix/relinkctl.c
  1. Change M stack frame loop to compensate for the lack of the GTM$CI frame and which frame
     now has the CI marker (was flag field in GTM$CI frame, now type field in CI base frame).

- sr_unix/rtnhdr.h
  1. Modify CLEANUP_COPIED_RECURSIVE_RTN() macro such that a null routine header address in a
     base frame doesn't screw things up and cause that NULL to be dereferenced.

- sr_x86_64/ci_restart.s
  1. Remove this parameter massaging routine as being unnecessary anymore.

- sr_x86_64/g_msf.si
  1. Remove SFF_CI (no longer used) and SFF_ETRAP_ERR (still defined but not used in assemble).

-----------------------------------

This project significantly changes how call-ins works. To properly describe it, we need to first describe the
current (original) operation of callins. The following write-up between the ------ lines below describes the
current call-in operation:

-----------------------------------

How call-ins work:

gtm_init() - Initializes YDB runtime
  1.   - image_type set to GTM_IMAGE
  2.   - invocation mode set to MUMPS_CALLIN
  3.   - init_gtm()
  4.     - gtm_startup()
  5.       - allocate M stack
  6.       - create (dummy) base frame with return addr of gtm_ret_code()
  7.       - jobchild_init()
  8.         - make_cimode() with returned addr of created routine set into "base_addr". This dynamically created routine
               is called GTM$CI.
  9.         - gtm_init_env(base_addr, transfer_addr)
 10.           - base_frame(base_addr transfer_addr) - creates "another" base frame with an initial execution addr
                 of gtm_ret_code() but with rvector set to addr of GTM$CI.
 11.           - new_stack_frame(base_addr, PTEXT_ADR(base_addr), transfer_addr) - creates an executable frame for the
                 GTM$CI routine.
 12.         - Runs SET_CI_ENV() which changes the executable frame created so it has the SF_CI call-in flag set, and
               and changes the base frame's execution address to ci_ret_code_exit(). This base frame is only used when
               there's a ZGOTO 0.

Stack after gtm_init():
  0. BaseFrame (created by gtm_startup()) - probably just to have *something* on the stack in case later initialization fails.
     - This baseframe has a zeroed rvector with an execution address of gtm_ret_code(). There is no previous pointer nor is there
       an unwind frame pointer hiding behind the frame like most base frames have
     - This is the true bottom of the stack in terms of stack frames though an mv_stent is on the stack before this.
     - In practise, we won't see this frame as it isn't used past initialization.
  1. BaseFrame (created by base_frame() as called by gtm_init_env()).
     - rvector is for GTM$CI routine.
     - Execution address is ci_ret_code_exit() (again, only used for ZGOTO 0 aka process exit call from M).
     - The address field on the stack immediately prior to this base frame pointer is an actual unwind address to base frame 0.
  2. Executable frame (created by new_stack_frame() as called by gtm_init_env()).
     - rvector pointer is set to GTM$CI routine
     - Frame is flagged as as call-in

gtm_ci[p]() - Perform call-in:
  1. Executable frame set up in step #11 of gtm_init() above has its execution field modified to point to the GTM$CI routine.
  2. Creates parameter block including address of call-in routine and address of either op_extcall() or op_extexfun() which
     will add a new stack frame and have the call-in parameters set up.
  3. Drives dm_start() to invoke the callin. The actual call path goes like this:
       a. dm_start() - establishes mdb_condition_handler() and sets up to drive M code. Invokes top stack frame (GTM$CI).
       b. ci_restart() - this is the first thing done in GTM$CI. This takes the parmblk_struct passed in the global var
          param_list and restructures the inputs for a call to op_extcall() or op_extexfun() and jumps to it.
       c. op_extcall() or op_extexfun() allocate a new stack frame and "return" to it such that the actual routine we are
          calling into is invoked.
       d. When the M code returns, it returns to GTM$CI and runs ci_ret_code() which drives a longjmp back to where
          mdb_condition_handler was setup in dm_start().
       e. dm_start() then returns to gtm_ci[p] for return value processing.

The parameter flow is:
  1. Parms are passed in as part of the call to gtm_ci[p]().
  2. gtm_ci[p]() reformats the parameters into a "parm_blk".
  3. When ci_restart is driven, parms are converted from parm_blk to register/stack parms.
  4. When op_extexfun() is driven, parms are shifted around for call to push_parm().
  5. When push_parm() is driven from op_extexfun(), register/stack parms are converted to parm_blk parms.
  6. When called-in routine is driven, its parms are picked up from parm_blk by op_bindparm() and bound to local
     vars in the called routine.

-----------------------------------

Following (between the *********** lines) is a description of how call-ins work with this new support in place:

***********************************

How the new call-ins functionality works:

gtm_init() - Initializes YDB runtime
  1.   - image_type set to GTM_IMAGE
  2.   - invocation mode set to MUMPS_CALLIN
  3.   - init_gtm()
  4.     - gtm_startup()
  5.       - allocate M stack
  6.       - create (dummy) base frame with return addr of gtm_ret_code()
  7.       - jobchild_init()
  8.         - base_frame(base_addr transfer_addr) - creates "another" base frame with an initial execution addr
               of gtm_ret_code() but with rvector set to addr of GTM$CI.
  9.         - Runs SET_CI_ENV() which changes the executable frame created so it has the SF_CI call-in flag set, and
               and changes the base frame's execution address to ci_ret_code_exit(). This base frame is only used when
               there's a ZGOTO 0.

Stack after gtm_init():
  0. BaseFrame (created by gtm_startup()) - probably just to have *something* on the stack in case later initialization fails.
     - This baseframe has a zeroed rvector with an execution address of gtm_ret_code(). There is no previous pointer nor is there
       an unwind frame pointer hiding behind the frame like most base frames have
     - This is the true bottom of the stack in terms of stack frames though an mv_stent is on the stack before this.
     - In practise, we won't see this frame as it isn't used past initialization.
  1. BaseFrame (created by base_frame() as called by jobchild_init()).
     - rvector is NULL as this frame is just for return.
     - Execution address is gtm_levl_ret_code() - an entry point in dm_start() used to return without unwinding the frame.
     - The address field on the stack immediately prior to this base frame pointer is an actual unwind address to base frame 0.

gtm_ci[p]() - Perform call-in:
  1. Create new executable frame (new_stack_frame()) for the called M routine to run in.
  2. Cal push_parm_ci() which takes the parameter block built by gtm_ci[p]() and converts it to the parameter block used by
     op_bindparm() at the top of the called routine. See parameter flow below.
  3. Drives dm_start() to invoke the callin. Since the M stack frame for the called routine is on top, it gets immediately driven.
  4. When the M code returns, it returns to gtm_levl_ret_code which does a simple return to dm_start() without using longjmp().
  5. dm_start() then returns to gtm_ci[p] for return value processing.

The parameter flow is:
  1. Parms are passed in as part of the call to gtm_ci[p]().
  2. gtm_ci[p]() reformats the parameters into a "parm_blk".
  5. gtm_ci[p]() drive push_parm_ci() to setup the parms for processing by op_bindparm().
  6. When called-in routine is driven, its parms are picked up from parm_blk by op_bindparm() and bound to local
     vars in the called routine.

***********************************

One of the effects of this project is that $STACK and $ZLEVEL are different for a call-in routine than they
were previously. They now mirror the levels one would get by using mumps -run. The first level executing routine
is $ZLEVEL=1 and $STACK=0 instead of the previous $ZLEVEL=2 and $STACK=1.

[9/2017 SEE]

Files changes in this (squashed) commit:

sr_port/op_fnview.c
  1. Add "ENVIRONMENT" returns possible environment tokens "MUMPS", "MUPIP", "CALLIN", or "TRIGGER".
     Multiple comma separated tokens may be returned.

sr_port/viewtab.h
  1. Add "ENVIRONMENT" option.
  2. Remove non-UNIX environment #ifdef code.

Files changes in this (squashed) commit:

- sr_linux/release_name.h - Change to R1.10

Files changes in this (squashed) commit:

sr_port/alias.h:
  1. Move DBGALS related macros to mdef.h so we can put the define for DEBUG_ALIAS right there and not
     have to specify a compiler option.

sr_port/alias_funcs.c:
  1. The LVMON* view was overloaded. There was the LVMON command that monitors given/specified local variables
     for when they get changed and there were the LVMON* view commands that help to debug aliases and are active
     only when DEBUG_ALIAS is defined. When this define was enabled, the LVMON command was giving a VIEWAMBIG
     error since the view commands that followed it in the table were LVMON*. This update changes the name of the
     VIEW commands, the related routines and fields to be lvamon* instead of lvmon*. There are no tests of LVMON*
     in the test system due to it requiring a compiler flag to be incorporated so no tests were affected.

sr_port/gbldefs.c:
sr_port/lv_val.h:
sr_port/op_view.h
sr_port/tp_unwind.c
sr_port/viewtab.h
  1. Rename lvmon* for lvamon*.

sr_port/mdef.h
  1. Comment out define for DEBUG_ALIAS

@estess estess self-assigned this Sep 28, 2017
@estess estess requested a review from nars1 September 28, 2017 20:06
@shabiel
Copy link
Contributor

shabiel commented Sep 28, 2017

Thank you for this explanation of how call-ins work. I tried to read that code before...

@nars1
Copy link
Collaborator

nars1 commented Oct 2, 2017

Steve, note: I added ``` lines at beginning and end of the commit message. This made the commit message render better (as otherwise it was treated as a markdown formatted text file). The ``` defines a block of code that is escaped from markdown treatment.

@nars1
Copy link
Collaborator

nars1 commented Oct 2, 2017

Steve, since I am not able to comment on the commit message as part of the review, I am adding them here. The commit message indicates how the call-in code operated before and how it operates now. In addition, if it highlighted what the changes are (by pointing to the steps in either the previous or the new flow) that would help (I had to diff the two sections to see the change).

@nars1
Copy link
Collaborator

nars1 commented Oct 2, 2017

Need a Release Note section in the commit comment. Things that I see as needing mention are at least

  1. $zlevel difference
  2. zshow "s" difference "+1^GTM$CI" is now "(Call-In Level Entry)"
  3. zshow "s" now displays the M-stack past nested call-in boundaries whereas previously it stopped when the deepest GTM$CI level was reached.

There maybe more.

Copy link
Collaborator

@nars1 nars1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the detailed commit message. I am requesting some more detail in some of my comments.

@@ -290,10 +293,9 @@ void als_lsymtab_repair(hash_table_mname *table, ht_ent_mname *table_base_orig,
}
}
fpprev = fp;
fp = fp->old_frame_pointer;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was this line removed? Previously it was possible for us to break out of the for loop (if done is TRUE) after setting fp to fp->old_frame_pointer whereas now fp would hold a different value than before if that same break happens. It does not seem right. A comment is definitely needed in the commit message if there is a reason for this.

Copy link
Contributor Author

@estess estess Oct 6, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This all has to do with the elimination of the GTM_CI frame. The frame chain used to look like this:
CI-base-frame <- GTM$CI <- M routine

Now it looks like:
CI-base-frame <- M routine

In the previous scheme the GTM$CI frame had a "flag value" SFF_CI that indicated this was a call-in frame. In the new scheme the the base frame has a "type value" with SFT_CI set that indicates this is a call-in frame. The removed statement was because once we have noted the frame with the type set for a call-in frame, we have to go back one more frame to get to its base frame. We no longer have to do that with the new version as the frame with the flag IS the base frame.

I will a version of this note to the notes for alias_funcs.c.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still don't understand the connection between SFF_CI/SFT_CI and the break that happens in line 297 in the new code. Let us say there is NO call-in frame at all (i.e. no call-ins done in this process). If the break happens because "done" is TRUE, we would come out of the loop with a value of "fp" that is one frame newer than older code. And it is not clear to me if that is correct.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are two cases - a call-in frame is present or a call-in frame is NOT present. First the call-in frame is
present case. Skip to "***" to see not present case.

Previous code:

1        do
2        {
3               ... <irrelevant to this discussion stuff>
4               fpprev = fp;
5               fp = fp->old_frame_pointer;
6               if (done)
7                       break;
8               if (SFF_CI & fpprev->flags)
9               {       /* Callins needs to be able to crawl past apparent end of stack to earlier stack segments.
10                       * We should be in the base frame now. See if an earlier frame exists.
11                       * Note we don't worry about trigger base frames here because triggers *always* have a
12                       * different symbol table - previous symbol tables and stack levels are not affected.
13                       */
14                      fp = *(stack_frame **)(fp + 1); /* Backups up to "prev pointer" created by base_frame() */
15                      if ((NULL == fp) || (fp >= (stack_frame *)stackbase) || (fp < (stack_frame *)stacktop))
16                              break;  /* Pointer not within the stack -- must be earliest occurence */
17              }
18      } while(fp);
19      /* Next, check the mv_stents for the stackframes we processed. Certain mv_stents also have hash
20       * table references in them that need repair.
21       */
22      for (mv_st_ent = mv_chain;
23           mv_st_ent < (mv_stent *)(fp ? fp : fpprev);        /* Last stack frame actually processed */
24           mv_st_ent = (mv_stent *)(mv_st_ent->mv_st_next + (char *)mv_st_ent))
25      {
26              ... <rest of loop>
27      }

Recall that the M stack looks like:
[1:M-base-frame (no prev frame)] : [2:callin-base-frame] : [3:GTM$CI-frame (has SFF_CI set)] : [4:callin-routine-frame]

Operations in loop:
  1. fp -> frame 4 (line 1)
  2. fpprev -> frame 4 (line 4)
  3. fp -> frame 3 (line 5)
  4. test on line 8 is FALSE
  5. line 18 loop back up
  6. fpprev = frame 3 (line 4)
  7. fp -> frame 2 (line 5)
  8. test on line 8 is TRUE
  9. fp -> frame 1 (line 14)
  10. line 18 loop back up
  11. fpprev -> frame 1 (line 4)
  12. fp -> NULL (line 5)
  13. test on line 8 is FALSE
  14. loop exit due to fp == NULL (line 18)

At loop exit, fp -> NULL and fpprev -> frame 1 (the M base frame).

The mv_stent loop, since fp is NULL, would use fpprev (the call-in base frame) in its end-point test.



Now - new code:

1        do
2        {
3               ... <irrelevant to this discussion stuff>
4               fpprev = fp;
5               if (done)
6                       break;
7               if (SFT_CI & fp->flags)
8               {       /* Callins needs to be able to crawl past apparent end of stack to earlier stack segments.
9                        * We should be in the base frame now. See if an earlier frame exists.
10                       * Note we don't worry about trigger base frames here because triggers *always* have a
11                       * different symbol table - previous symbol tables and stack levels are not affected.
12                       */
13                      fp = *(stack_frame **)(fp + 1); /* Backups up to "prev pointer" created by base_frame() */
14                      if ((NULL == fp) || (fp >= (stack_frame *)stackbase) || (fp < (stack_frame *)stacktop))
15                              break;  /* Pointer not within the stack -- must be earliest occurence */
16              } else
17                      fp = fp->old_frame_pointer;
18      } while(fp);
19      /* Next, check the mv_stents for the stackframes we processed. Certain mv_stents also have hash
20       * table references in them that need repair.
21       */
22      for (mv_st_ent = mv_chain;
23           mv_st_ent < (mv_stent *)(fp ? fp : fpprev);        /* Last stack frame actually processed */
24           mv_st_ent = (mv_stent *)(mv_st_ent->mv_st_next + (char *)mv_st_ent))
25      {
26              ... <rest of loop>
27      }

Recall that the new stack looks like:
[1:M-base-frame (no prev frame)] : [2:callin-base-frame (has SFT_CI set)] : [3:callin-routine-frame]

Operations in loop:
  1. fp -> frame 1 (line 1)
  2. fpprev -> frame 1 (line 4)
  3. test on line 7 is FALSE
  4. fp -> frame 2 (line 17)
  5. line 18 loop back up
  6. fpprev -> frame 2 (line 4)
  7. test on line 7 is TRUE
  8. fp -> frame 1 (line 13)
  9. line 18 loop back up
  10. fpprev -> frame 1 (line 4)
  11. test on line 7 is FALSE
  12. fp -> NULL (line 17)
  13. loop exit due to fp == NULL (line 18)

At loop exit, fp -> NULL and fpprev -> frame 1 (the M base frame).

The mv_stent loop, since fp is NULL would use fpprev (the call-in base frame) in its end-point test.

******************************************************************************************************

This is if a call-in frame is NOT present in the stack.

Using the previous code above and the following M stack:

[1:M-base-frame (no prev frame)] : [2:M-frameA] : [3:M-frameB]

Operations in loop:
  1. fp -> frame 3 (line 1)
  2. fpprev -> frame 3 (line 4)
  3. fp -> frame 2 (line 5)
  4. test on line 8 is FALSE
  5. line 18 loop back up
  6. fpprev = frame 2 (line 4)
  7. fp -> frame 1 (line 5)
  8. test on line 8 is FALSE
  9. fpprev -> frame 1 (line 4)
  10. fp -> NULL (line 5)
  11. test on line 8 is FALSE
  12. loop exit due to fp == NULL (line 18)

At loop exit, fp -> NULL and fpprev -> frame 1 (the M base frame).

The mv_stent loop, since fp is NULL would use fpprev (the M base frame) in its end-point test.


Using the newer code above and the same M stack:

Operations in loop:
  1. fp -> frame 3 (line 1)
  2. fpprev -> frame 3 (line 4)
  3. test on line 7 is FALSE
  4. fp -> frame 2 (line 17)
  5. line 18 loop back up
  6. fpprev -> frame 2 (line 4)
  7. test on line 7 is FALSE
  8. fp -> frame 1 (line 17)
  9. line 18 loop back up
  10. fpprev -> frame 1 (line 4)
  11. test on line 7 is FALSE
  12. fp -> NULL
  13. lop exit due to fp == NULL (line 18)

At loop exit, fp -> NULL and fpprev -> frame 1 (the M base frame).

The mv_stent loop, since fp is NULL would use fpprev (the M base frame) in its end-point test.

******************************************************************************************************

I'm not seeing the problem you are theorizing. Can you elaborate?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After our phone conversation, I went back and see what you mean about when "done" is set, we could end on a different stack frame than previously. Figuring out whether that's bad or not will take some significant time so instead I have put the loop back the way it was and made adjustments for the lack of GTM$CI frame a different way that won't impact the values at loop exit. Were that actually the source of a bug (which seems likely but no idea how to trip it at this time), it would have been hard to find..

@@ -28,15 +33,15 @@ int dollar_zlevel()
if (!(fp->type & SFT_COUNT))
continue;
if (NULL == fpprev)
{ /* Next frame is some sort of base frame */
{ /* This frame is some sort of base frame */
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"This frame" -> "fp" would make it more clear. This or Next would be ambiguous in terms of whether they are speaking about fpprev or fp.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed the comment further to clarify:
/* If previous frame pointer is null, the current frame is some sort of base frame */

@@ -350,7 +332,6 @@ CONDITION_HANDLER(mdb_condition_handler)
lcl_error_frame = mvst->mv_st_cont.mvs_zintr.error_frame_save;
}
if (NULL == lcl_error_frame)
# endif
{
VMS_ONLY(assert(FALSE == tp_restart_fail_sig_used));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can remove this line and any other VMS_ONLY usages.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pulled all the VMSisms out of mdb_condition_handler.c but I won't promise I didn't break it. :-D

@@ -955,7 +900,7 @@ CONDITION_HANDLER(mdb_condition_handler)
GTMTRIG_ONLY(assert(0 == gtm_trigger_depth)); /* Should never happen in a trigger */
DBGEHND((stderr, "mdb_condition_handler: ztrap_explicit_null set - unwinding till find handler\n"));
assert(0 == dollar_etrap.str.len);
for (level = dollar_zlevel() - 1; level > 0; level--)
for (level = dollar_zlevel() - 1; 0 <= level; level--)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The commit comment has this to say about the 0 <= level change.

 4. Fix loop looking for $ETRAP defined when $ZTRAP set to explicit NULL to stop at frame 0
     instead of frame 1. This allows it to find call-in base frame (now frame 0 instead of 1)
     and also the mumps -run base frame.

While I agree the change is correct for call-in frames, I am still not sure if the change is innocuous for other invocations. That is because we will now be doing one more GOLEVEL than done before, i.e. a GOLEVEL(0) call, and it is not clear to me that it is safe to do in the various scenarios that can reach this codepath. Do you have an explanation that helps towards a better understanding for those scenarios?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A $STACK() level of 0 is the level of the first counted frame on the M stack in all instances:

  1. Call-ins (with the caveat that call-ins resets the observed level so the actual first frame is always behind it somewhere on the stack but that's irrelevant for this conversation).
  2. The mumps -run case.
  3. The direct mode case (where $STACK frame 0 is GTM$DMOD).

So long as we unwind UP TO frame 0 and don't try to actually unwind frame 0, we're fine.

I've written a small routine t.m that runs identically in V63002 and a freshly built V9994:
write "Starting off in main routine - $ZLEVEL=",$zlevel," and $STACK=",$stack,!
set $etrap="write ""made it to the trap"",! zshow ""*"""
do c
write "leaving main routine",!
quit
b
new $ztrap
set $ztrap=""
write "made it to b",!
set x=1/0
write "Didn't create an error dagnabbit!",!
quit

Note under circumstances other than setting $ztrap to null (explicit null situation), when setting $ztrap when $etrap is active, we would auto-NEW $ztrap and restore $etrap when that level pops. For reasons unclear to me, when we set $ZTRAP to null explicitly, this auto-NEW operation is bypassed so an explicit NEW is required to restore $etrap when routine "b" pops off the stack.

Unless there's a specific reason (that I certainly do not know) for this I would suggest we treat that as a bug.

@@ -876,14 +817,12 @@ STATICFNDEF void get_entryref_information(boolean_t line, trace_entry *tmp_trc_t
line_reset = FALSE;
for (fp = frame_pointer; fp; fp = fp->old_frame_pointer)
{
# ifdef GTM_TRIGGER
if (fp->type & SFT_TRIGR)
if (fp->type & (GTMTRIG_ONLY(SFT_TRIGR) | SFT_CI))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously we were not picking up the continuation frame_pointer for SFT_CI. Was that a bug? If so, does that need a release note and test case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it is a bug from reading the code - have not yet tried it. The result would be an inability to get entryref information on a routine that is on the stack but on the far side of a call-in base frame. Looking in the test system, I don't see any uses of call-ins in the mprof suite so this is likely a bug waiting to be found. I'll add a test for it to see if I can actually make something pop - even if only in debug mode.

sr_unix/gtmci.c Outdated
@@ -755,24 +777,31 @@ int gtm_ci_exec(const char *c_rtn_name, void *callin_handle, int populate_handle
assertpro(FALSE); /* should have been caught by citab_parse */
}
}
param_blk.args[i] = push_lvval(&arg_mval);
param_blk.args[i] = retval = push_lvval(&arg_mval);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need the assignment to retval?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't. retval was removed due to comment above. Artifact of a previous algorithm.

sr_unix/gtmci.c Outdated
mstr label, routine;
int has_return, i, len;
rhdtyp *base_addr;
uint4 inp_mask, out_mask, mask;
char *xfer_addr;
uint4 inp_mask, out_mask, mask, omask;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need the omask variable in both gtm_cij & gtm_ci_exec functions?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps I don't understand your question. The omask variable is used in both functions in the migration of the parms out of varargs and into the parm_blk structure. gtm_cij() is the Java interface while gtm_ci_exec() deals with the gtm_ci() and gtm_cip() calls. While there is functional overlap (at a high level) between these routines, they are actually quite different. Can you otherwise clarify or restate your question?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Below is a grep of omask in gtmci.c. I see it is being set and updated. But I don't see it ever USED. What am I missing?

grep omask sr_unix/gtmci.c
uint4 inp_mask, out_mask, mask, omask;
for (i = 0, mask = inp_mask, omask = out_mask;
++i, mask >>= 1, omask >>= 1, java_arg_type++, arg_blob_ptr += GTM64_ONLY(1) NON_GTM64_ONLY(2))
uint4 inp_mask, out_mask, mask, omask;
for (i = 0, mask = ~inp_mask, omask = out_mask; i < entry->argcnt; ++i, mask >>= 1, omask >>= 1)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now I see what you mean. Looked at it too fast before. Artifacts of an earlier revision. Now removed.

@@ -39,13 +41,9 @@
*/
#define FGNCAL_STACK ((NULL == TREF(temp_fgncal_stack)) ? fgncal_stack : TREF(temp_fgncal_stack))

void ci_restart(void);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After a discussion with you, I better understand the reason for removal of GTM$CI. And think it is best captured in the commit message. GTM$CI used to invoke make_cimode() which in turn invoked ci_restart() that did some parameter massaging. All that is now done as part of gtmci() itself and so no need of the extra GTM$CI frame. Some such description would help. Thanks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is/was discussed under the description of gtm_cip. If you think it better to pull it out and have a major point on GTM$CI and discuss what it did and why it was removed (tiz all there but a bit scattered), I can do that if you think it helps. Let me know.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It definitely would help have it in the commit comment. Thanks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

GEN_CALL(dmode->func_ptr1); /* line 0,1 */
#endif /* __ia64 */

#ifdef _AIX
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. in the commit message below needs to be edited : ia64 -> ia64,AIX,etc.
    • sr_unix/make_mode.c
      1. Removed all capability to create GTM$CI as this "program" is no longer needed.
      2. Removed all #ifdef of ia64.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added hpux and AIX to the ifdef removals.

sr_unix/rtnhdr.h Outdated
@@ -237,7 +240,7 @@ typedef struct
* references to this routine header. If not, it is ripe for cleanup as it has been replaced so drive zr_unlink_rtn() \
* on it. \
*/ \
if ((RTNHDR)->rtn_relinked) \
if ((NULL != (RTNHDR)) && (RTNHDR)->rtn_relinked) \
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is a null routine header address in a base frame possible? I thought we always fill it with some non-NULL base address even if not completely setup yet? Is this a test that failed? Just for my understanding.
- sr_unix/rtnhdr.h
1. Modify CLEANUP_COPIED_RECURSIVE_RTN() macro such that a null routine header address in a
base frame doesn't screw things up and cause that NULL to be dereferenced.

Copy link
Contributor Author

@estess estess Oct 7, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously, the call-in base frame had the program address of GTM$CI set. Now there's nothing to put there. I thought about putting GTM$DMOD there but (a) I might have to build it when it's not otherwise necessary, (b) that could be confusing since it really doesn't belong there and (c) there were just a couple places that needed this check to keep from going bang if the routine hdr addr field was NULL.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So do we now have a NULL rtnhdr for a call-in base frame? If so, adding a comment that (NULL != RTNHDR) is needed for a call-in base frame would help.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment added.

{
fp = SKIP_BASE_FRAME(fp);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is another place where one of those bugs likely exists (not tested yet but it does need one as I suspect it is an exploder.

This routine is called when we are replacing a routine and need to know if it is active on the stack somewhere. Previously our search stopped at a callin base frame so if a routine was in use prior to the callin (and on the stack prior to the callin) this routine won't find and signal it is ok to replace the routine when it is not. Flagged as needing a test

Copy link
Contributor Author

@estess estess left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hit the wrong button at some point so it changed this to a review instead.

I still owe an answer for gtm_trigger.c. Work on that one later.

@@ -39,13 +41,9 @@
*/
#define FGNCAL_STACK ((NULL == TREF(temp_fgncal_stack)) ? fgncal_stack : TREF(temp_fgncal_stack))

void ci_restart(void);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is/was discussed under the description of gtm_cip. If you think it better to pull it out and have a major point on GTM$CI and discuss what it did and why it was removed (tiz all there but a bit scattered), I can do that if you think it helps. Let me know.

GEN_CALL(dmode->func_ptr1); /* line 0,1 */
#endif /* __ia64 */

#ifdef _AIX
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added hpux and AIX to the ifdef removals.

sr_unix/rtnhdr.h Outdated
@@ -237,7 +240,7 @@ typedef struct
* references to this routine header. If not, it is ripe for cleanup as it has been replaced so drive zr_unlink_rtn() \
* on it. \
*/ \
if ((RTNHDR)->rtn_relinked) \
if ((NULL != (RTNHDR)) && (RTNHDR)->rtn_relinked) \
Copy link
Contributor Author

@estess estess Oct 7, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously, the call-in base frame had the program address of GTM$CI set. Now there's nothing to put there. I thought about putting GTM$DMOD there but (a) I might have to build it when it's not otherwise necessary, (b) that could be confusing since it really doesn't belong there and (c) there were just a couple places that needed this check to keep from going bang if the routine hdr addr field was NULL.

@nars1
Copy link
Collaborator

nars1 commented Oct 9, 2017

@estess : There were 3 comments I made on Oct 2nd (7 days ago). This was regarding the commit comment since commenting on that as part of the review was not possible. Those were not responded to.

@estess estess force-pushed the callinperf branch 2 times, most recently from 9663587 to 1893b12 Compare October 12, 2017 19:15
Copy link
Collaborator

@nars1 nars1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the enhanced comments in the commit message. I understand there are some todos noted in the release note but think there are no more code changes expected due to that, only doc notes. I am fine with the code changes.

```
Files changes in this commit:

- sr_i386/g_msf.si
  1. Remove SFF_CI flag and resequence flags

- sr_port/alias_funcs.c
  1. Change SFF_CI flag usage to SFT_CI type usage & restructure checks due to GTM$CI
     frame no longer being there.

  A review question asked why this change removed an "fp = fp->old-frame_pointer" statement. The answer is that it
  has to do with the elimination of the GTM$CI frame. The frame chain used to look like this:

    CI-base-frame <- GTM$CI <- M routine

  Now it looks like:

    CI-base-frame <- M routine

  In the previous scheme the GTM$CI frame had a "flag value" SFF_CI that indicated the M call was a call-in frame.
  In the new scheme the the base frame has a "type value" with SFT_CI set that indicates this is a call-in frame.
  The removed statement was because once we have noted the frame with the type set for a call-in frame, we had to
  back one more frame to get to its base frame. We no longer have to do that as the frame with the type flag IS
  the base frame.

- sr_port/dollar_zlevel.c
  1. Spruce up some comments to make more sense.

- sr_port/f_text.c
  1. Remove GTM$CI from the list of supressed values (no longer exists).

- sr_port/fgncal.h
  1. Remove declaration for fgncal_lookup.c (no longer used).

- sr_port/fgncal_lookup.c
  1. Deleted - no longer used

- sr_port/gbldefs.c
  1. Remove param_list global (no longer used).

- sr_port/mdb_condition_handler.c
  1. Cleanups - remove #ifdef for UNIX (leaving code in place since all is UNIX now).
  2. Cleanups - remove VMS conditional code.
  3. Change SFF_CI flag usage to SFT_CI type usage.
  4. Fix loop looking for $ETRAP defined when $ZTRAP set to explicit NULL to stop at frame 0
     instead of frame 1. This allows it to find call-in base frame (now frame 0 instead of 1)
     and also the mumps -run base frame.

- sr_port/mdef.h
  1. Move DBGALS macro here from alias.h so can define DEBUG_ALIAS right here instead of
    requiring it to be a compiler option.

- sr_port/mprof_funcs.c
  1. Cleanups - remove #ifdef for UNIX (leaving code in place since all is UNIX now).
  2. Cleanups - remove VMS conditional code.
  3. When running the stack, skip over both TRIGGER and CALLIN base frames.

- sr_port/op_bindparm.c
  1. Minor cosmetic changes and comment clarifications.

- sr_port/op_clralsvals.c
  1. Change SFF_CI flag usage to SFT_CI type usage.
  2. Modify the loop to no longer assume a GTM$CI frame when call-ins are in use (no longer
     need to unwind one more frame to the base frame as the flagged frame IS the base frame).

- sr_port/op_halt.c
  2. If call-in mode, do NOT just exit - make sure to return to the call-in caller.

- sr_port/op_unwind.c
  2. Add include of gtmio.h for debugging macro(s).

- sr_port/op_zg1.c
  2. If call-in mode, restart the frame we unwind to. This returns to the caller.

- sr_port/op_zgoto.c
  1. Cleanups - remove #ifdef for UNIX (leaving code in place since all is UNIX now).
  2. Cleanups - remove VMS conditional code.
  3. Change SFF_CI flag usage to SFT_CI type usage.

- sr_port/op_zhalt.c
  1. If call-in mode, do NOT just exit - make sure to return to the call-in caller.

- sr_port/parm_pool.c
  1. Move the push_parm() routine out to sr_port/push_parm_src.h so it can be include in
     two forms:
     a. Form that gets its arguments from the varargs parameter list.
     b. Form that gets its arguments from a parmblk_struct.

- sr_port/stack_frame.h
  1. Remove SFF_CI flag and resequence flags to fill the gap.
  2. Add SFT_CI type id.
  3. Modify SKIP_BASE_FRAME() macro to skip both TRIGGER and CALL-IN base frames.

- sr_port/tp_unwind.c
  1.  Remove #ifdef DEBUG wrapper on debugging includes so they work with a pro build too.

- sr_port/unw_mv_ent.c
  1. Change debugging output to be more clear which path is being taken.

- sr_port/zlput_rname.c
  1. Since SKIP_BASE_FRAME macro now handles call-in base frames and because there's one fewer
     frames than there used to be when GTM$CI was used, reorganize the loop(s) traversing the
     M stack so it is more effectively utilized.

- sr_port/zshow_stack.c
  1. Change SFF_CI flag usage to SFT_CI type usage.
  2. Add marker for when call-in base frame shows up in stack.

- sr_unix/ci_ret_code.c
  1. Change SFF_CI flag usage to SFT_CI type usage.
  2. Remove the ci_ret_code() and ci_ret_code_exit() routines as not needed.  The ci_ret_code()
     routine did a longjmp() to "return" from the call-in - now replaced by a normal unwind
     procedure with no system calls. The ci_ret_code_exit() routine was set into the call-in
     base-frame and driven when ZGOTO 0 was done. It is removed because it too did a longjmp()
     which we are trying to avoid. That condition is now special cased in ZGOTO.
  3. The ci_ret_code_quit() routine (called to unwind the CALL-IN base frame) had comments
     added to better describe the dependencies

- sr_unix/error_return.c
  1. Change SFF_CI flag usage to SFT_CI type usage.
  2. Don't let CALL-IN mode do an EXIT here - force return to caller.

- sr_unix/fgncalsp.h
  1. Get rid of no longer needed pointer fields (now passed directly)

- sr_unix/gtm_startup.c
  1. Add comments on odd happenings.

- sr_unix/gtm_trigger.c
  1. Change an M stack running loop to work without GTM$CI frame.

- sr_unix/gtm_unlink_all.c
  1. Remove check for GTM$CI executable.

- sr_unix/gtmci.c
  1. Removed use of longjmp() to effect return from call-ins.
  2. Change SFF_CI flag usage to SFT_CI type usage.
  3. Changed parameter handling to eliminate need for GTM$CI to reformat parameters
     appropriately for call-in to op_bindparm in generated M code.
  4. Eliminated need to drive op_extcall/op_extexfun to start M routine. We now call
     push_parm to set up called-routine parameters ourselves.

- sr_unix/gtmci.h
  1. Removed GTM$CI as an intermediate call-in frame.
  2. Modified SET_CI_ENV() macro to setup actual base frame instead of intermediate frame.

- sr_unix/gtmci_isv.c
  1. Change SFF_CI flag usage to SFT_CI type usage.
  2. Some comment cleanup.

- sr_unix/gtmci_signals.c
  1. Remove reference to invocation flag MUMPS_GTMCI which was never being set anywhere.

- sr_unix/invocation_mode.h
  1. Remove MUMPS_GTMCI_OFF macro as unused.
  2. Remove MUMPS_GTMCI mode and resequence modes to fill gap (MUMPS_GTMCI never set).

- sr_unix/jobchild_init.c
  1. Don't create GTM$CI anymore (make_cimode() not needed and removed).
  2. Create single base CI frame instead of base + GTM$CI frame.

- sr_unix/make_cimode.c
  1. Removed - no longer needed.

- sr_unix/make_mode.c
  1. Removed all capability to create GTM$CI as this "program" is no longer needed.
  2. Removed all #ifdef of __ia64, __hpux, and _AIX.

- sr_unix/ojchildparms.c
  1. Change the "dummy" base frame generated from GTM$CI frame to GTM$DMOD frame. It doesn't
     matter what type of frame this is so long as there is one that can be used to return.

- sr_unix/op_fnfgncal.c
  1. Change SFF_CI flag usage to SFT_CI type usage.

- sr_unix/relinkctl.c
  1. Change M stack frame loop to compensate for the lack of the GTM$CI frame and which frame
     now has the CI marker (was flag field in GTM$CI frame, now type field in CI base frame).

- sr_unix/rtnhdr.h
  1. Modify CLEANUP_COPIED_RECURSIVE_RTN() macro such that a null routine header address in a
     base frame doesn't screw things up and cause that NULL to be dereferenced.

- sr_x86_64/ci_restart.s
  1. Remove this parameter massaging routine as being unnecessary anymore.

- sr_x86_64/g_msf.si
  1. Remove SFF_CI (no longer used) and SFF_ETRAP_ERR (still defined but not used in assemble).

sr_port/op_fnview.c
  1. Add "ENVIRONMENT" returns possible environment tokens "MUMPS", "MUPIP", "CALLIN", or "TRIGGER".
     Multiple comma separated tokens may be returned.

sr_port/viewtab.h
  1. Add "ENVIRONMENT" option.
  2. Remove non-UNIX environment #ifdef code.

- sr_linux/release_name.h - Change to R1.10

sr_port/alias.h:
  1. Move DBGALS related macros to mdef.h so we can put the define for DEBUG_ALIAS right there and not
     have to specify a compiler option.

sr_port/alias_funcs.c:
  1. The LVMON* view was overloaded. There was the LVMON command that monitors given/specified local variables
     for when they get changed and there were the LVMON* view commands that help to debug aliases and are active
     only when DEBUG_ALIAS is defined. When this define was enabled, the LVMON command was giving a VIEWAMBIG
     error since the view commands that followed it in the table were LVMON*. This update changes the name of the
     VIEW commands, the related routines and fields to be lvamon* instead of lvmon*. There are no tests of LVMON*
     in the test system due to it requiring a compiler flag to be incorporated so no tests were affected.

sr_port/gbldefs.c:
sr_port/lv_val.h:
sr_port/op_view.h
sr_port/tp_unwind.c
sr_port/viewtab.h
  1. Rename lvmon* for lvamon*.

sr_port/mdef.h
  1. Comment out define for DEBUG_ALIAS

-----------------------------------

This project significantly changes how call-ins works. To properly describe it, we need to first describe the
current (original) operation of callins. The following write-up between the ------ lines below describes the
current (r100 and previous versions) call-in operation:

-----------------------------------

How call-ins work:

gtm_init() - Initializes YDB runtime
  1.   - image_type set to GTM_IMAGE
  2.   - invocation mode set to MUMPS_CALLIN
  3.   - init_gtm()
  4.     - gtm_startup()
  5.       - allocate M stack
  6.       - create (dummy) base frame with return addr of gtm_ret_code()
  7.       - jobchild_init()
  8.         - make_cimode() with returned addr of created routine set into "base_addr". This dynamically created routine
               is called GTM$CI.
  9.         - gtm_init_env(base_addr, transfer_addr)
 10.           - base_frame(base_addr transfer_addr) - creates "another" base frame with an initial execution addr
                 of gtm_ret_code() but with rvector set to addr of GTM$CI.
 11.           - new_stack_frame(base_addr, PTEXT_ADR(base_addr), transfer_addr) - creates an executable frame for the
                 GTM$CI routine.
 12.         - Runs SET_CI_ENV() which changes the executable frame created so it has the SF_CI call-in flag set, and
               and changes the base frame's execution address to ci_ret_code_exit(). This base frame is only used when
               there's a ZGOTO 0.

Stack after gtm_init():
  0. BaseFrame (created by gtm_startup()) - probably just to have *something* on the stack in case later initialization fails.
     - This baseframe has a zeroed rvector with an execution address of gtm_ret_code(). There is no previous pointer nor is there
       an unwind frame pointer hiding behind the frame like most base frames have
     - This is the true bottom of the stack in terms of stack frames though an mv_stent is on the stack before this.
     - In practise, we won't see this frame as it isn't used past initialization.
  1. BaseFrame (created by base_frame() as called by gtm_init_env()).
     - rvector is for GTM$CI routine.
     - Execution address is ci_ret_code_exit() (again, only used for ZGOTO 0 aka process exit call from M).
     - The address field on the stack immediately prior to this base frame pointer is an actual unwind address to base frame 0.
  2. Executable frame (created by new_stack_frame() as called by gtm_init_env()).
     - rvector pointer is set to GTM$CI routine
     - Frame is flagged as as call-in

gtm_ci[p]() - Perform call-in:
  1. Executable frame set up in step YottaDB#11 of gtm_init() above has its execution field modified to point to the GTM$CI routine.
  2. Creates parameter block including address of call-in routine and address of either op_extcall() or op_extexfun() which
     will add a new stack frame and have the call-in parameters set up.
  3. Drives dm_start() to invoke the callin. The actual call path goes like this:
       a. dm_start() - establishes mdb_condition_handler() and sets up to drive M code. Invokes top stack frame (GTM$CI).
       b. ci_restart() - this is the first thing done in GTM$CI. This takes the parmblk_struct passed in the global var
          param_list and restructures the inputs for a call to op_extcall() or op_extexfun() and jumps to it.
       c. op_extcall() or op_extexfun() allocate a new stack frame and "return" to it such that the actual routine we are
          calling into is invoked.
       d. When the M code returns, it returns to GTM$CI and runs ci_ret_code() which drives a longjmp back to where
          mdb_condition_handler was setup in dm_start().
       e. dm_start() then returns to gtm_ci[p] for return value processing.

The old parameter flow is:
  1. Parms are passed in as part of the call to gtm_ci[p]().
  2. gtm_ci[p]() reformats the parameters into a "parm_blk".
  3. When ci_restart is driven, parms are converted from parm_blk to register/stack parms.
  4. When op_extexfun() is driven, parms are shifted around for call to push_parm().
  5. When push_parm() is driven from op_extexfun(), register/stack parms are converted to parm_blk parms.
  6. When called-in routine is driven, its parms are picked up from parm_blk by op_bindparm() and bound to local
     vars in the called routine.

Note, the entire purpose of the GTM$CI routine is to create the stack frame the call-in will run in and to set up its
parameters in a fashion that op_bindparm() can read them. Since the op_bindparm of the day expected a varargs list (albeit
in a different fashion from the varargs list passed into gtm_ci[p]()), at that time, the GTM$CI was written to provide the
parameter conversion necessary at the time. The new support described next has made some changes that make this extra
conversion step unnecessary.

-----------------------------------

Following (between the *********** lines) is a description of how call-ins work with this new support in place:

***********************************

How the new call-ins functionality works:

gtm_init() - Initializes YDB runtime
  1.   - image_type set to GTM_IMAGE
  2.   - invocation mode set to MUMPS_CALLIN
  3.   - init_gtm()
  4.     - gtm_startup()
  5.       - allocate M stack
  6.       - create (dummy) base frame with return addr of gtm_ret_code()
  7.       - jobchild_init()
  8.         - base_frame(base_addr transfer_addr) - creates "another" base frame with an initial execution addr
               of gtm_ret_code() but with rvector set to addr of GTM$CI.
  9.         - Runs SET_CI_ENV() which changes the executable frame created so it has the SF_CI call-in flag set, and
               and changes the base frame's execution address to ci_ret_code_exit(). This base frame is only used when
               there's a ZGOTO 0.

Stack after gtm_init():
  0. BaseFrame (created by gtm_startup()) - probably just to have *something* on the stack in case later initialization fails.
     - This baseframe has a zeroed rvector with an execution address of gtm_ret_code(). There is no previous pointer nor is there
       an unwind frame pointer hiding behind the frame like most base frames have
     - This is the true bottom of the stack in terms of stack frames though an mv_stent is on the stack before this.
     - In practise, we won't see this frame as it isn't used past initialization.
  1. BaseFrame (created by base_frame() as called by jobchild_init()).
     - rvector is NULL as this frame is just for return.
     - Execution address is gtm_levl_ret_code() - an entry point in dm_start() used to return without unwinding the frame.
     - The address field on the stack immediately prior to this base frame pointer is an actual unwind address to base frame 0.

gtm_ci[p]() - Perform call-in:
  1. Create new executable frame (new_stack_frame()) for the called M routine to run in.
  2. Creates parameter block which contains a newly constructed lv_val for each incoming parameter.
  3. Cal push_parm_ci() which takes the parameter block built by gtm_ci[p]() and moves the parms to the the parameter area used by
     op_bindparm() which gets called at the top of the called target routine. See parameter flow below.
  4. Drives dm_start() to invoke the callin. Since the M stack frame for the called routine is on top, it gets immediately driven.
  5. When the M code returns, it returns to gtm_levl_ret_code which does a simple return to dm_start() without using longjmp().
  6. dm_start() then returns to gtm_ci[p] for return value processing.

The parameter flow is:
  1. Parms are passed in as part of the call to gtm_ci[p]().
  2. gtm_ci[p]() reformats the parameters into a "parm_blk".
  3. gtm_ci[p]() drives push_parm_ci() to setup the parms for processing by op_bindparm() (moves them into a specific area used
     for buffering parameters).
  4. When called-in routine is driven (op_bindparm() gets driven at all entryrefs with parameter lists defined - even if empty)
     op_bindparm pulls the parms from the parm pool parameter space created by push_parm_ci() and binds them to local vars in
     the called routine then releases the parameter space for reuse.

***********************************

-----------------------------------

Operational changes with this project in the execution of a call-in (note steps are from original code):

- No longer call make_cimode() to generate the internal GTM$CI routine (step gtm_init YottaDB#8).
- Call base_frame() instead of gtm_init_env() to create ONLY the call-in base frame instead of the base frame plus
  the GTM$CI frame (step gtm_init YottaDB#9).
- No longer setup GTM$CI as part of gtm_ci[p](). Instead, we allocate the frame using new_stack_frame() called
  directly (gtm_ci[p]() YottaDB#1).
- After the call-in execution frame is created by gtm_ci[p](), a variant of the push_parm routine is called that takes
  the parameters from the parm block that gtm_ci[p] created from the incoming arguments and converts them so they can
  be processed by op_bindparm() which is a routine called by generated code anytime a label with arguments occurs in
  the M source code (gtm_ci[p]() YottaDB#2 and YottaDB#3).
- The execution flow used to look like:
    * C [possibly main] routine
    * call gtm_ci[p]()
      * drive dm_start() to kick off transition to M mode execution
        * drive GTM$CI "build" routine.
          * op_extexfun()
	  * Drive target M routine
	  * Target routine returns to GTM$CI which then drives ci_ret_code (longjmp back to dm_start).
        * return to gtm_ci[p] via longjmp() in ci_ret_code() which returns to dm_start() and returns from there.
    * Return to C caller
- The flow has similar steps but one less layer and returns without the use of a (longjmp) system call:
    * C [possibly main] routine
    * call gtm_ci[p]()
      * drive dm_start() to kick off transition to M mode execution
        * drive target M routine
	* M returns to gtm_levl_ret_code which returns from an M routine without unwinding it which unwinds to dm_start.
        * return to gtm_ci[p]
      * return to C caller.

-----------------------------------

Notes on the removal of GTM$CI.

- A call-in previously did not return "normally" by a return statement that unwound the stack. Instead it invoked
  a system call (longjmp()) to unwind the stack and return to dm_start() which then returned "normally" to the caller.
- A much faster way to return would be to just - return but GTM$CI complicated that. The purpose of the GTM$CI "routine"
  was to set up arguments and call an assembler "glue code" routine to put the parms where they needed to be before
  calling the call-in routine.
- Specifically, GTM$CI was a "constructed" routine (much like GTM$DMOD) in that it was not built from M source but
  was created by the make_mode() routine. Here's the operation:
    a. gtm_ci[p]() builds a parm_blk that contains the name of the glue routine to call (op_extexfun or op_extcall
       depending on whether has args or return value). The parm_blk also contains the parameters and other things
       needed to effect a call.
    b. gtm_ci[p]() drives dm_start() which enters M mode and drives the top routine on the M stack which happens to
       be GTM$CI.
    c. GTM$CI's first function is to drive ci_restart() which is an assembler routine sort of like a specialized
       version of callg() that takes the routine to call (e.g. op_extexfun), the routine/label to call and all the
       input and output parameters for the call-in routine and drives it to create the stack.
    d. When op_extexfun() completes, it drives the top routine on the M stack which is now the call-in frame.
    e. When the callin frame returns to GTM$CI, it drives ci_ret_code() which does a longjmp() to return.
- So GTM$CI's primary purpose was to allocate the stack frame, setup the arguments and drive the glue code that
  made the actual call. This means the arguments were reformatted at least twice. Wanted to avoid that. Also, the
  system call flavor of return is unnecessary. Wanted to avoid that too.
- Two changes allowed us to be rid of the extra GTM$CI overhead:
    1. With the routine gtm_levl_ret_code added to GT.M for triggers, it became possible for a stackframe to return
       without being unwound. We make use of that so we can do the simple unwind of call-in levels.
    2. By changing the push_parm() routine (in parm pool) so we it can pull the arguments for the call-in routine
       directly out of the parm block created by gtm_ci[p] instead of having to push them on the stack and pull them
       back off in the glue routine. This avoided one of the argument restructurings that were happening.
- Because GTM$CI went away, the SFF_CI flag was changed to a type flag instead (since that is what it actually is) and
  it was moved to the base frame itself which is how triggers also does it.

-----------------------------------

User visible changes in this project (for specification in release note):

1. There is no GTM$CI level anymore (no longer needed). So this name no longer shows up in stack listings.
2. Because there is no GTM$CI level anymore, the $STACK and $ZLEVEL SVNs show one less than they used to in a call-in
   environment. These SVNs now mirror the levels one would get by using mumps -run. The first level executing routine
   is $ZLEVEL=1 and $STACK=0 instead of the previous $ZLEVEL=2 and $STACK=1.
3. ZSHOW "S" shows the entire stack. It used to stop at the first call-in frame and not report any further back. A
   stack marker shows where call-in base frames are located in the stack. Where a call-in frame is detected, the text
   "(Call-In Level Entry)" appears in the stack list.
4. When replacing a routine that is active on the stack, we ran the stack backwards to verify the routine was not being used.
   You can still replace an active routine but it is a special case. Unfortunately the loop was again stopping at the first
   call-in frame and not looking further back. If a routine being replaced was on the stack further back than that call-in
   base-frame, ugly stuff was likely to occur when we unwound back to that earlier frame.
5. Similar issue when M-Profiling is looking up an entry ref on the stack. Need to figure out what happens when it doesn't
   find it.
6. $VIEW(ENVIRONMENT) description.
7. [Z]HALT in a call-in do not halt but return to the caller as they should but did not previously.

```
@estess estess merged commit ee9f501 into YottaDB:master Oct 12, 2017
@estess estess deleted the callinperf branch October 12, 2017 20:43
chathaway-codes pushed a commit that referenced this pull request Nov 27, 2018
…ops while a V4 block is dirty and SimpleThreadAPI is active

The simplethreadapi/lockst subtest failed as follows when the test framework randomly chose V4 format blocks.

%YDB-F-MAXRTSERRDEPTH Error loop detected - aborting image with core

And produced a core with the following C-stack

 #0  __pthread_kill (threadid=<optimized out>, signo=3) at ../sysdeps/unix/sysv/linux/pthread_kill.c:57
 #1  gtm_dump_core () at sr_unix/gtm_dump_core.c:72
 #2  send_msg_va (csa=0x0, arg_count=9, var=0x7f25a0f7ff40) at sr_unix/send_msg.c:123
 #3  send_msg (arg_count=9) at sr_unix/send_msg.c:74
 #4  gtm_assert2 (condlen=17, condtext=0x7f25a62dc0d6 "!timer_in_handler", file_name_len=45, file_name=0x7f25a62dc0a8 "sr_unix/send_msg.c", line_no=125) at sr_port/gtm_assert2.c:34
 #5  send_msg_va (csa=0x0, arg_count=9, var=0x7f25a0f80570) at sr_unix/send_msg.c:125
 #6  send_msg (arg_count=9) at sr_unix/send_msg.c:74
 #7  gtm_assert2 (condlen=17, condtext=0x7f25a62dc0d6 "!timer_in_handler", file_name_len=45, file_name=0x7f25a62dc0a8 "sr_unix/send_msg.c", line_no=125) at sr_port/gtm_assert2.c:34
 .
 .
 #32 send_msg_va (csa=0x0, arg_count=9, var=0x7f25a0f83d20) at sr_unix/send_msg.c:125
 #33 send_msg (arg_count=9) at sr_unix/send_msg.c:74
 #34 gtm_assert2 (condlen=17, condtext=0x7f25a6305658 "!timer_in_handler", file_name_len=51, file_name=0x7f25a6305320 "sr_port/gtm_malloc_src.h", line_no=685) at sr_port/gtm_assert2.c:34
 #35 gtm_malloc (size=4096) at sr_port/gtm_malloc_src.h:685
 #36 wcs_wtstart (region=0x5558a88e1170, writes=0, cr_list_ptr=0x0, cr2flush=0x0) at sr_unix/wcs_wtstart.c:538
 #37 wcs_stale (tid=93839273365872, hd_len=8, region=0x5558a8979628) at sr_port/t_end_sysops.c:1387
 #38 timer_handler (why=14) at sr_unix/gt_timers.c:815
 #39 <signal handler called>
 #40 futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x5558a88e2ea8) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
 #41 __pthread_cond_wait_common (abstime=0x0, mutex=0x5558a88e2e40, cond=0x5558a88e2e80) at pthread_cond_wait.c:502
 #42 __pthread_cond_wait (cond=0x5558a88e2e80, mutex=0x5558a88e2e40) at pthread_cond_wait.c:655
 #43 ydb_stm_thread (parm=0x0) at sr_unix/ydb_stm_thread.c:92
 #44 start_thread (arg=0x7f25a0f85700) at pthread_create.c:463
 #45 clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

The primary issue is failure in the assert "assert(!timer_in_handler)" inside the PTHREAD_MUTEX_LOCK_IF_NEEDED
macro at frame # 35 (gtm_malloc). This is because we are about to make a pthread_mutex_lock() call which should
not be done while inside a signal handler (according to the man pages). To fix this issue, we skip flushing
that particular cache-record in wcs_wtstart() thereby avoiding the need to call gtm_malloc() (and in turn
the pthread_mutex_lock() call) while inside the timer handler.
nars1 added a commit that referenced this pull request Aug 7, 2020
…that can fail in rare cases

* The below assert failed once in 1500 r126/ydb464 subtest runs across a dozen boxes.

  ```c
  assert((1 != forced_exit) || GET_DEFERRED_EXIT_CHECK_NEEDED);

  #27 rts_error_csa (csa=0x0, argcnt=7) at sr_unix/rts_error.c:99
  #28 wcs_wtstart (region=0x7f6970, writes=0, cr_list_ptr=0x0, cr2flush=0x0) at sr_unix/wcs_wtstart.c:830
  #29 wcs_stale (tid=8350064, hd_len=8, region=0x872fe8) at sr_port/t_end_sysops.c:1386
  #30 timer_handler (why=14, info=0x7fadda7caea8 <stapi_signal_handler_oscontext+10728>, context=0x7fadda7caf28 <stapi_signal_handler_oscontext+10856>) at sr_unix/gt_timers.c:773
  #31 ydb_os_signal_handler (sig=14, info=0x7ffdee359eb0, context=0x7ffdee359d80) at sr_unix/ydb_os_signal_handler.c:66
  #32 <signal handler called>
  #33 generic_signal_handler (sig=2, info=0x7fadda7c9588 <stapi_signal_handler_oscontext+4296>, context=0x7fadda7c9608 <stapi_signal_handler_oscontext+4424>) at sr_unix/generic_signal_handler.c:244
  #34 ydb_os_signal_handler (sig=2, info=0x7ffdee35a870, context=0x7ffdee35a740) at sr_unix/ydb_os_signal_handler.c:88
  #35 <signal handler called>
  #36 clock_nanosleep () from /usr/lib64/libc.so.6
  #37 m_usleep (useconds=1000) at sr_unix/sleep.c:28
  #38 wcs_sleep (sleepfactor=1) at sr_port/wcs_sleep.c:28
  #39 wcs_phase2_commit_wait (csa=0x7f4840, cr=0x7fadd7b88a50) at sr_port/wcs_phase2_commit_wait.c:268
  #40 t_qread (blk=6, cycle=0x7ffdee35c568, cr_out=0x7ffdee35c558) at sr_port/t_qread.c:668
  #41 gvcst_search (pKey=0x7f6040, pHist=0x0) at sr_port/gvcst_search.c:438
  #42 gvcst_put2 (val=0x7fadda7d3940 <increment_delta_mval>, parms=0x7ffdee35f8b0) at sr_port/gvcst_put.c:980
  #43 gvcst_put (val=0x7fadda7d3940 <increment_delta_mval>) at sr_port/gvcst_put.c:299
  #44 gvcst_incr (increment=0x7ffdee362030, result=0x7ffdee362050) at sr_port/gvcst_incr.c:60
  #45 op_gvincr (increment=0x7ffdee362030, result=0x7ffdee362050) at sr_port/op_gvincr.c:60
  #46 ydb_incr_s (varname=0x7ffdee362b10, subs_used=0, subsarray=0x7ffdee3629d0, increment=0x7ffdee362aa0, ret_value=0x7ffdee362a90) at sr_unix/ydb_incr_s.c:156
  #47 runProc (settings=0x7ffdee363490, curDepth=0) at /usr/library/gtm_test/T998/simpleapi/inref/randomWalk.c:447
  #48 runProc_driver (settings=0x7ffdee363490) at /usr/library/gtm_test/T998/simpleapi/inref/randomWalk.c:145
  #49 main () at /usr/library/gtm_test/T998/simpleapi/inref/randomWalk.c:93

  (gdb) f 28
  #28 wcs_wtstart (region=0x7f6970, writes=0, cr_list_ptr=0x0, cr2flush=0x0) at sr_unix/wcs_wtstart.c:830
  830             ENABLE_INTERRUPTS(INTRPT_IN_WCS_WTSTART, prev_intrpt_state);

  (gdb) p forced_exit
  $5 = 1

  (gdb) p deferred_signal_handling_needed
  $6 = 0

  ```

* This is because the signal handler for SIGALRM (sig=14) happened after line 183 but before line 193.

    ```c
    sr_port/have_crit.h
    -------------------
    183         forced_exit = 1;                                                                        \
    184         if (in_os_signal_handler)                                                               \
    185         {       /* If we are inside an OS signal handler and therefore had to defer exit        \
    186                  * handling, set "outofband" also to TRUE as this is checked by lots of         \
    187                  * potentially long-running commands in the runtime (e.g. HANG etc.) and we     \
    188                  * want all of those to automatically trigger process exit handling.            \
    189                  */                                                                             \
    190                 outofband = deferred_signal;                                                    \
    191         }                                                                                       \
    192         /* Whenever "forced_exit" gets set to 1, set the corresponding deferred event too */    \
    193         SET_DEFERRED_EXIT_CHECK_NEEDED;                                                         \
    ```

* We cannot easily protect against this situation and so it is better to remove the assert.
nars1 added a commit that referenced this pull request Jan 19, 2022
… failure in a Simple Threaded API application

Background
----------
* The ideminter_rolrec/interrupted_rollback_or_recover subtest failed on a RHEL8 x86_64 in-house system
  as well as an AARCH64 system with the following symptom.

  ```diff
  107a108,239
  > ideminter_rolrec_0/interrupted_rollback_or_recover/savedjoboutput_0_20220107_233728/impjob_imptp0.mje4
  > %YDB-F-MAXRTSERRDEPTH Error loop detected - aborting image with core
  ```

* The C-stack of the core file had the following (frames with repeating traces not shown below)

  ```c
  #0  pthread_kill () from /usr/lib64/libpthread.so.0
  #1  gtm_dump_core () at sr_unix/gtm_dump_core.c:74
  #2  rts_error_va (csa=0x0, argcnt=7, var=0x7ffc809d7070) at sr_unix/rts_error.c:144
  #3  rts_error_csa (csa=0x0, argcnt=7) at sr_unix/rts_error.c:99
  #4  ydb_stm_invoke_deferred_signal_handler () at sr_unix/ydb_stm_invoke_deferred_signal_handler.c:95
  .
  .
  #38 rts_error_va (csa=0x0, argcnt=7, var=0x7ffc809dadb0) at sr_unix/rts_error.c:163
  #39 rts_error_csa (csa=0x0, argcnt=7) at sr_unix/rts_error.c:99
  #40 ydb_stm_invoke_deferred_signal_handler () at sr_unix/ydb_stm_invoke_deferred_signal_handler.c:95
  #41 deferred_signal_handler () at sr_port/deferred_signal_handler.c:64
  #42 eintr_handling_check () at sr_port/eintr_handling_check.c:29
  #43 gtm_fprintf (stream=0x7f7f6c14b600 <_IO_2_1_stderr_>, format=0x7f7f6d7e1005 "%s") at sr_unix/gtm_stdio.c:75
  #44 util_out_print_vaparm (message=0x0, flush=1, var=0x7ffc809dbb20, faocnt=2147483647) at sr_unix/util_output.c:873
  #45 util_out_print (message=0x0, flush=1) at sr_unix/util_output.c:911
  #46 util_cond_flush () at sr_unix/util_output.c:952
  #47 ch_cond_core () at sr_unix/ch_cond_core.c:76
  #48 rts_error_va (csa=0x0, argcnt=7, var=0x7ffc809dbd20) at sr_unix/rts_error.c:198
  #49 rts_error_csa (csa=0x0, argcnt=7) at sr_unix/rts_error.c:99
  #50 ydb_stm_invoke_deferred_signal_handler () at sr_unix/ydb_stm_invoke_deferred_signal_handler.c:95
  #51 deferred_signal_handler () at sr_port/deferred_signal_handler.c:64
  #52 grab_lock (reg=0xe8e960, is_blocking_wait=1, onln_rlbk_action=2) at sr_unix/grab_lock.c:111
  #53 tp_tend () at sr_port/tp_tend.c:1261
  #54 op_tcommit () at sr_port/op_tcommit.c:494
  #55 stkok3 () at sr_x86_64/opp_tcommit.s:30

  (gdb) f 50
  #50 0x00007f7f6d51efb2 in ydb_stm_invoke_deferred_signal_handler () at sr_unix/ydb_stm_invoke_deferred_signal_handler.c:95
  95                      assert((SIGINT == stapi_signal_handler_oscontext[sig_hndlr_ctrlc_handler].sig_num) || USING_ALTERNATE_SIGHANDLING);

  (gdb) list
  93       if (STAPI_IS_SIGNAL_HANDLER_DEFERRED(sig_hndlr_ctrlc_handler))
  94       {
  95   -->         assert((SIGINT == stapi_signal_handler_oscontext[sig_hndlr_ctrlc_handler].sig_num) || USING_ALTERNATE_SIGHANDLING);
  96               ydb_stm_invoke_deferred_signal_handler_type = sig_hndlr_continue_handler;
  97               ctrlc_handler(DUMMY_SIG_NUM, NULL, NULL);
  98               ydb_stm_invoke_deferred_signal_handler_type = sig_hndlr_none;
  99       }

  (gdb) p/x stapi_signal_handler_deferred
  $15 = 0x404
  (gdb) p stapi_signal_handler_oscontext[sig_hndlr_none].sig_num
  $3 = 0
  (gdb) p stapi_signal_handler_oscontext[sig_hndlr_continue_handler].sig_num
  $4 = 18
  (gdb) p stapi_signal_handler_oscontext[sig_hndlr_ctrlc_handler].sig_num
  $5 = 0
  (gdb) p stapi_signal_handler_oscontext[sig_hndlr_dbcertify_signal_handler].sig_num
  $6 = 0
  (gdb) p stapi_signal_handler_oscontext[sig_hndlr_generic_signal_handler].sig_num
  $7 = 0
  (gdb) p stapi_signal_handler_oscontext[sig_hndlr_jobexam_signal_handler].sig_num
  $8 = 0
  (gdb) p stapi_signal_handler_oscontext[sig_hndlr_jobinterrupt_event].sig_num
  $9 = 0
  (gdb) p stapi_signal_handler_oscontext[sig_hndlr_job_term_handler].sig_num
  $10 = 0
  (gdb) p stapi_signal_handler_oscontext[sig_hndlr_op_fnzpeek_signal_handler].sig_num
  $11 = 0
  (gdb) p stapi_signal_handler_oscontext[sig_hndlr_suspsigs_handler].sig_num
  $12 = 0
  (gdb) p stapi_signal_handler_oscontext[sig_hndlr_timer_handler].sig_num
  $13 = 14
  (gdb) p simpleThreadAPI_active
  $1 = 1
  ```

* As can be seen from the gdb analysis of frame 50, the global variable `stapi_signal_handler_deferred` has 2 bits set.
  The ones corresponding to `sig_hndlr_ctrlc_handler` and `sig_hndlr_timer_handler`.

  But the `stapi_signal_handler_oscontext[]` global variable indicates that the 2 signals that were deferred corresponded to
  `sig_hndlr_continue_handler` and `sig_hndlr_timer_handler`.

  This disconnect between `sig_hndlr_ctrlc_handler` and `sig_hndlr_continue_handler` across the 2 global variables
  contributed to the assert failure.

Issue
-----
* After a code review, I have a theory on the cause of the issue. And that is the below macro got invoked
  by multiple threads at the same time for 2 SIGCONT signals (multiple threads possible since this is
  a Simple Thread API application as seen by the non-zero value of `simpleThreadAPI_active` in gdb).

  **sr_unix/sig_init.h**
  ```c
     80 #define STAPI_SET_SIGNAL_HANDLER_DEFERRED(SIGHNDLRTYPE, SIG_NUM, INFO, CONTEXT)         \
     81 {                                                                                       \
     82         SAVE_OS_SIGNAL_HANDLER_SIGNUM(SIGHNDLRTYPE, SIG_NUM);                           \
     83         SAVE_OS_SIGNAL_HANDLER_INFO(SIGHNDLRTYPE, INFO);                                \
     84         SAVE_OS_SIGNAL_HANDLER_CONTEXT(SIGHNDLRTYPE, CONTEXT);                          \
     85         SHM_WRITE_MEMORY_BARRIER;                                                       \
     86         stapi_signal_handler_oscontext[SIGHNDLRTYPE].sig_forwarded = TRUE;              \
     87         SHM_WRITE_MEMORY_BARRIER;                                                       \
     88         /* Need interlocked add below since multiple threads can execute this code      \
     89          * at the same time (e.g. if SIGALRM and SIGTERM happen at same time).          \
     90          * Just like SET_DEFERRED_TIMERS_CHECK_NEEDED macro in "have_crit.h",           \
     91          * we need to check the value before doing the interlocked add.                 \
     92          */                                                                             \
     93         if (!STAPI_IS_SIGNAL_HANDLER_DEFERRED(SIGHNDLRTYPE))                            \
     94                 INTERLOCK_ADD(&stapi_signal_handler_deferred, (1 << SIGHNDLRTYPE));     \
     95         SHM_WRITE_MEMORY_BARRIER;                                                       \
     96         SET_DEFERRED_STAPI_CHECK_NEEDED;                                                \
     97 }
  ```

* Threads 1 and 2 both went to line 93 and found `STAPI_IS_SIGNAL_HANDLER_DEFERRED(sig_hndlr_continue_handler)`
  FALSE. And then both went to proceed with the `INTERLOCK_ADD` call resulting in TWO increments for the
  same SIGCONT signal. This caused the value of the global variable `stapi_signal_handler_deferred` to
  have the bit following `sig_hndlr_continue_handler` set i.e. the `sig_hndlr_ctrlc_handler` bit (in the
  enum sequence). Thus we ended up with `stapi_signal_handler_deferred` having the least significant 4
  in its `0x404` value even though this process never had a Ctrl-C sent by the test (but SIGCONT could be sent
  by processes in the test at any time and not just one but more than one is also possible and is likely what
  happened in the test).

* While `INTERLOCK_ADD` is an interlocked operation, the decision to do it is based on checking the
  value of the global variable and that happens outside of the interlock and so the two together do
  not constitute an atomic operation like they should be.

Fix
---
* As part of e175d54, the global variables `deferred_signal_handling_needed`
  and `stapi_signal_handler_deferred` were updated to use `INTERLOCK_ADD` to prevent out-of-design values
  with multiple threads updating the same global variable at the same time.

* As part of a later commit 98e8638, `deferred_signal_handling_needed` was
  updated to use `COMPSWAP_LOCK` instead of `INTERLOCK_ADD`.

* But `stapi_signal_handler_deferred` continued to use `INTERLOCK_ADD`.

* The fix is simple and is to make `stapi_signal_handler_deferred` use `COMPSWAP_LOCK` instead of `INTERLOCK_ADD`.

* Towards this, the global variable now has a type of `global_latch_t`.

* A new `BIT_SET_INTERLOCKED` macro has been introduced in `sr_unix/sig_init.h`. This is based largely on
  the pre-existing `SET_DEFERRED_CONDITION` macro except that the new macro allows for the latch variable
  to be passed as a parameter.

  The following macros have been changed to use the new `BIT_SET_INTERLOCKED` macro.
  - STAPI_SET_SIGNAL_HANDLER_DEFERRED macro in sr_unix/sig_init.h
  - STAPI_CLEAR_SIGNAL_HANDLER_DEFERRED macro in sr_unix/sig_init.h
  - SET_DEFERRED_CONDITION macro in sr_port/have_crit.h
  - CLEAR_DEFERRED_CONDITION macro in sr_port/have_crit.h

* A new `BIT_GET_INTERLOCKED` macro gets the latch value. This is based on the pre-existing
  `GET_DEFERRED_CONDITION` macro in sr_port/have_crit.h.

  All places that need to check the value of the `stapi_signal_handler_deferred` global variable now use this
  new macro. the pre-existing `GET_DEFERRED_CONDITION` macro has also been changed to use this new macro.

* Unnecessary `GBLREF`s of `stapi_signal_handler_deferred` in a few files have been removed.

* Note that these changes are done based on a theory of what could have caused the failure. It is not yet
  known for sure if that is indeed the cause as the failure happens very rarely (it is not reproducible
  on demand even after running dozens of tests on lots of systems). Only time will tell if this fix is correct.

Misc changes
------------
* Noticed a longstanding typo in `sr_unix/ydb_stm_invoke_deferred_signal_handler.c` and fixed it.
  Not yet sure though if there is a user visible issue due to this.
nars1 added a commit that referenced this pull request Nov 17, 2023
…Simple Thread API application

Background
----------
* The `r126/ydb464` subtest failed in one rare run with the following failure symptom.

  ```diff
  > %YDB-F-ASSERT, Assert failed in sr_port/deferred_events_queue.c line 48 for expression (INTRPT_IN_EVENT_HANDLING == intrpt_ok_state)
  ```

* When this specific test was rerun around 10000 times, we saw around a dozen failures (with differing assert
  failures but all pointing to the same underlying issue) so this failure was reproducible but not easily.

Issue
-----
* Relevant details from the core file analysis is pasted below.

  ```c
  (gdb) thread apply all bt

  Thread 6 (Thread 0x7fa62a6c0640 (LWP 99885)):
    .
  #6  rts_error_csa (csa=0x0, argcnt=7) at sr_unix/rts_error.c:99
  #7  set_events_from_signals (prev_intrpt_state=INTRPT_OK_TO_INTERRUPT) at sr_port/deferred_events_queue.c:37
  #8  xfer_set_handlers (event_type=11, param_val=939582496, popped_entry=0) at sr_port/deferred_events.c:190
  #9  generic_signal_handler (sig=2, info=0x7fa63a1b6fd8 <stapi_signal_handler_oscontext+3320>, context=0x7fa63a1b7058 <stapi_signal_handler_oscontext+3448>, is_os_signal_handler=1) at sr_unix/generic_signal_handler.c:305
  #10 ydb_os_signal_handler (sig=2, info=0x7fa625096bf0, context=0x7fa625096ac0) at sr_unix/ydb_os_signal_handler.c:88
  #11 <signal handler called>
  #12 __pthread_create_2_1 (newthread=<optimized out>, attr=<optimized out>, start_routine=<optimized out>, arg=<optimized out>) at ./nptl/pthread_create.c:835
  #13 pthread_create ()
  #14 runProc (tptoken=..., errstr=0x0, settings=..., curDepth=6) at simplethreadapi/inref/randomWalk.c:662
  #15 threadHelper (args=0x7fa62a6ba880) at simplethreadapi/inref/randomWalk.c:723
  #16 tpHelper (tptoken=..., errstr=0x7fa62a6ba850, tpfnparm=0x7fa62a6ba880) at simplethreadapi/inref/randomWalk.c:712
  #17 ydb_tp_st (tptoken=..., errstr=0x7fa62a6ba850, tpfn=0x55fa406e2d20 <tpHelper>, tpfnparm=0x7fa62a6ba880, transid=0x55fa406f69ea "BATCH", namecount=0, varnames=0x0) at sr_unix/ydb_tp_st.c:100
  #18 runProc (tptoken=..., errstr=0x0, settings=..., curDepth=5) at simplethreadapi/inref/randomWalk.c:642
  #19 threadHelper (args=0x7fa62a6bb7e0) at simplethreadapi/inref/randomWalk.c:723
    .
  #41 clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

  Thread 1 (Thread 0x7fa61ef2e640 (LWP 7158)):
    .
  #6  rts_error_csa (csa=0x0, argcnt=7) at sr_unix/rts_error.c:99
  #7  xfer_reset_handlers (event_type=11) at sr_port/deferred_events.c:235
  #8  outofband_clear () at sr_port/outofband_clear.c:41
  #9  outofband_action (lnfetch_or_start=0) at sr_port/outofband_action.c:55
  #10 ydb_zwr2str_s (zwr=0x7fa61ef2d550, str=0x7fa61ef2d560) at sr_unix/ydb_zwr2str_s.c:55
  #11 ydb_zwr2str_st (tptoken=..., errstr=0x7fa61ef2d530, zwr=0x7fa61ef2d550, str=0x7fa61ef2d560) at sr_unix/ydb_zwr2str_st.c:40
  #12 runProc (tptoken=..., errstr=0x0, settings=..., curDepth=7) at simplethreadapi/inref/randomWalk.c:545
  #13 threadHelper (args=0x7fa62a6b9940) at simplethreadapi/inref/randomWalk.c:723
  #14 start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
  #15 clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

  (gdb) p dollar_tlevel
  $4 = 6

  (gdb) p/x ydb_engine_threadsafe_mutex_holder[0]
  $14 = 0x7fa62a6c0640
  (gdb) p/x ydb_engine_threadsafe_mutex_holder[1]
  $15 = 0x7fa62a6c0640
  (gdb) p/x ydb_engine_threadsafe_mutex_holder[2]
  $16 = 0x7fa62a6c0640
  (gdb) p/x ydb_engine_threadsafe_mutex_holder[3]
  $17 = 0x7fa62a6c0640
  (gdb) p/x ydb_engine_threadsafe_mutex_holder[4]
  $18 = 0x7fa62a6c0640
  (gdb) p/x ydb_engine_threadsafe_mutex_holder[5]
  $19 = 0x7fa62a6c0640
  (gdb) p/x ydb_engine_threadsafe_mutex_holder[6]
  $20 = 0x7fa61ef2e640
  ```

* This is a case when signals are sent (SIGINT aka SIG-2 in this case) to a Simple Thread API process
  and one thread (`Thread 1` below) is running under the YottaDB engine lock already but the signal
  gets delivered to another thread (`Thread 6` below) and that incorrectly starts executing the signal
  handler which in turn invokes `xfer_set_handlers()` etc.. And so at the same time, 2 threads are
  executing YottaDB engine/runtime code although only one holds the lock. This is a no-no since YottaDB
  runtime logic is not multi-thread safe.

* From the above analysis, it is clear that the process was executing a TP transaction with `dollar_tlevel`
  equal to `6`.

  `Thread 6` had invoked `ydb_tp_st()` (in frame 17) which in turn invoked a callback function that created
  a new thread `Thread 1`.

  `Thread 6` held the YottaDB engine multi-thread lock for tlevels 0, 1, 2, 3, 4, 5.

  For tlevel 6, `Thread 1` held the YottaDB engine multi-thread lock.

* But the `SIGINT` signal (sent by the test) got sent to `Thread 6`. Therefore, it should have realized,
  while in `generic_signal_handler()`, that `dollar_tlevel` is 6 and it does not own the tlevel=6 lock
  (`Thread 1` owns it) and therefore should have done a `return` at line 404 below.

  ```c
     315 #define FORWARD_SIG_TO_MAIN_THREAD_IF_NEEDED(SIGHNDLRTYPE, SIG, IS_EXI_SIGNAL, INFO, CONTEXT)                                   \
       .
     332     if (simpleThreadAPI_active)                                                                                     \
     333     {                                                                                                               \
       .
     355             thisThreadId = pthread_self();                                                                          \
     356             assert(thisThreadId);                                                                                   \
     357             SET_YDB_ENGINE_MUTEX_HOLDER_THREAD_ID(mutexHolderThreadId, tLevel);                                     \
       .
     374             thisThreadIsMutexHolder = pthread_equal(mutexHolderThreadId, thisThreadId);                             \
       .
     386             if (!thisThreadIsMutexHolder                                                                            \
     387                             || (!IS_EXI_SIGNAL && (tLevel && (!isSigThreadDirected || signalForwarded))))           \
     388             {       /* Two possibilities.                                                                           \
       .
  -> 404                     return;                                                                                         \
     405             } else                                                                                                  \
  ```

* But clearly that did not happen (from the core file). Therefore, `thisThreadIsMutexHolder` (set at line
  374 above) should have been `TRUE`.

* How that happened can be seen in line 286 below inside the macro (invoked from line 357 above).

  ```c
    268 #define SET_YDB_ENGINE_MUTEX_HOLDER_THREAD_ID(HOLDER_THREAD_ID, TLEVEL)                                         \
    269 {                                                                                                               \
    270    GBLREF  uint4           dollar_tlevel;                                                                  \
    271    GBLREF  pthread_t       ydb_engine_threadsafe_mutex_holder[];                                           \
    272                                                                                                            \
    273    /* If not in TP, the YottaDB engine lock index is 0 (i.e. ydb_engine_threadsafe_mutex_holder[0] is      \
    274     * current lock holder thread if it is non-zero). But if we are in TP, then lock index could be         \
    275     * "dollar_tlevel"     : e.g. if a "ydb_get_st" call occurs inside of the "ydb_tp_st" call OR           \
    276     * "dollar_tlevel - 1" : if control is in the TP callback function inside "ydb_tp_st" but not a         \
    277     *      SimpleThreadAPI call like "ydb_get_st" etc.                                                     \
    278     */                                                                                                     \
    279    TLEVEL = dollar_tlevel; /* take a local copy of global variable as it could be concurrently changing */ \
    280    if (!TLEVEL)                                                                                            \
    281            HOLDER_THREAD_ID = ydb_engine_threadsafe_mutex_holder[0];                                       \
    282    else                                                                                                    \
    283    {                                                                                                       \
    284            HOLDER_THREAD_ID = ydb_engine_threadsafe_mutex_holder[TLEVEL];                                  \
    285            if (!HOLDER_THREAD_ID)                                                                          \
    286                    HOLDER_THREAD_ID = ydb_engine_threadsafe_mutex_holder[TLEVEL - 1];                      \
    287    }                                                                                                       \
    288 }
  ```

* Line 284 must have returned a value of 0 for `HOLDER_THREAD_ID` and so we went to line 286 and
  used the thread owner of tlevel=5 which was `Thread 6`.

* In the core file, we see that tlevel=6 lock owned is `Thread 1`. But at the time line 284 got executed,
  `Thread 1` was not owning the lock.

* That can be explained if `Thread 1` had not yet done the `ydb_zwr2str_st()` call when line 284 got
  executed.

* The issue then is that when we found no one holding the tlevel=6 lock, we went to see who holds the
  tlevel=5 lock and returned that thread is as the current YottaDB engine multi-thread lock holder.

* This is where the issue is. `Thread 1` even though it had not yet attempted to get the lock, owns
  the lock at this point since `Thread 6` has invoked the callback function and has no control of
  what calls the callback function can invoke (including creating new threads that in turn do
  Simple Thread API calls on their own like happened with `Thread 1`).

* Treating `Thread 6` as owning the lock ended up with a situation where 2 threads think they each
  own the engine lock and run YottaDB code at the same time causing the assert failures.

* This issue is long standing (started in 2afcbd2, which was committed 2019/03/25) but it manifests
  as assert failures only after the GT.M V7.0-001 code merge. That is because the deferred event queue
  handling got reworked in V7.0-001 making it possible for more logic to execute while in the
  signal handler thereby exposing this long standing issue.

* Note that even then it has taken a few months of testing to show this one failure in a C program that
  invokes multiple threads. So it is really a rare issue.

Fix
---
* The fix is thankfully simple and is to remove lines 285-286 above. That is, check the lock
  holder for the tlevel which `dollar_tlevel` global currently points to. Do not go one before that
  if we find the top level not being held by any thread.

* With this change, `Thread 6` will not incorrectly conclude it is the owner. This is because it will
  find that the owner of the YottaDB engine lock is no thread in this case and since that does not
  match its own thread id, it will `return` if it gets delivered the SIGINT (after noting down the
  fact that this signal handling was deferred) and the next thread that runs YottaDB runtime logic
  will notice this happened and handle the signal while it holds the engine lock.

Notes
-----
* Since this issue is very unlikely to be seen in practice (needs a Simple Thread API application that
  creates threads while inside a `ydb_tp_st()` call and also sends SIGINT signals), no YDB issue is
  created for this.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants