%HD works on hexadecimal numbers with a 0x or 0X prefix #44

nars1 · 2017-10-04T17:04:51Z

Final Release Note

The %HD utility program accepts hexadecimal numbers with a 0x or 0X prefix. (YDB#44)

Description

%HD does not work on hexadecimal numbers which have a 0x prefix e.g. $$FUNC^%HD("A") returns 10, but $$FUNC^%HD("0xA") returns 0. It would be nice to have it return 10 even in the second case. Particularly because hexadecimal numbers returned for example by $$^%PEEKBYNAME have a 0x prefix in them. It is easier to pass them directly to %HD instead of having an intermediate step to strip out the 0x prefix.

Draft Release Note

%HD works even on input hexadecimal numbers that have a 0x or 0X prefix.

…a 0x prefix YottaDB#44

…a 0x or 0X prefix (#44)

… as it can instead of returning 0 This ensures mpt_extra/conv subtest passes for cases added as part of YottaDB#44. As part of GTM-5574 changes in GT.M V6.3-005, the conversion function was changed to return 0 in case only a substring of the input number was valid. Whereas YottaDB had previously enhanced all the numeric conversion routines to convert as much of the input number as possible. Continuing in that trend, the new %CONVBASEUTIL M program is now enhanced to convert as much as it can.

… as it can instead of returning 0 This ensures mpt_extra/conv subtest passes for cases added as part of #44. As part of GTM-5574 changes in GT.M V6.3-005, the conversion function was changed to return 0 in case only a substring of the input number was valid. Whereas YottaDB had previously enhanced all the numeric conversion routines to convert as much of the input number as possible. Continuing in that trend, the new %CONVBASEUTIL M program is now enhanced to convert as much as it can.

…ops while a V4 block is dirty and SimpleThreadAPI is active The simplethreadapi/lockst subtest failed as follows when the test framework randomly chose V4 format blocks. %YDB-F-MAXRTSERRDEPTH Error loop detected - aborting image with core And produced a core with the following C-stack #0 __pthread_kill (threadid=<optimized out>, signo=3) at ../sysdeps/unix/sysv/linux/pthread_kill.c:57 #1 gtm_dump_core () at sr_unix/gtm_dump_core.c:72 #2 send_msg_va (csa=0x0, arg_count=9, var=0x7f25a0f7ff40) at sr_unix/send_msg.c:123 #3 send_msg (arg_count=9) at sr_unix/send_msg.c:74 #4 gtm_assert2 (condlen=17, condtext=0x7f25a62dc0d6 "!timer_in_handler", file_name_len=45, file_name=0x7f25a62dc0a8 "sr_unix/send_msg.c", line_no=125) at sr_port/gtm_assert2.c:34 #5 send_msg_va (csa=0x0, arg_count=9, var=0x7f25a0f80570) at sr_unix/send_msg.c:125 #6 send_msg (arg_count=9) at sr_unix/send_msg.c:74 #7 gtm_assert2 (condlen=17, condtext=0x7f25a62dc0d6 "!timer_in_handler", file_name_len=45, file_name=0x7f25a62dc0a8 "sr_unix/send_msg.c", line_no=125) at sr_port/gtm_assert2.c:34 . . #32 send_msg_va (csa=0x0, arg_count=9, var=0x7f25a0f83d20) at sr_unix/send_msg.c:125 #33 send_msg (arg_count=9) at sr_unix/send_msg.c:74 #34 gtm_assert2 (condlen=17, condtext=0x7f25a6305658 "!timer_in_handler", file_name_len=51, file_name=0x7f25a6305320 "sr_port/gtm_malloc_src.h", line_no=685) at sr_port/gtm_assert2.c:34 #35 gtm_malloc (size=4096) at sr_port/gtm_malloc_src.h:685 #36 wcs_wtstart (region=0x5558a88e1170, writes=0, cr_list_ptr=0x0, cr2flush=0x0) at sr_unix/wcs_wtstart.c:538 #37 wcs_stale (tid=93839273365872, hd_len=8, region=0x5558a8979628) at sr_port/t_end_sysops.c:1387 #38 timer_handler (why=14) at sr_unix/gt_timers.c:815 #39 <signal handler called> #40 futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x5558a88e2ea8) at ../sysdeps/unix/sysv/linux/futex-internal.h:88 #41 __pthread_cond_wait_common (abstime=0x0, mutex=0x5558a88e2e40, cond=0x5558a88e2e80) at pthread_cond_wait.c:502 #42 __pthread_cond_wait (cond=0x5558a88e2e80, mutex=0x5558a88e2e40) at pthread_cond_wait.c:655 #43 ydb_stm_thread (parm=0x0) at sr_unix/ydb_stm_thread.c:92 #44 start_thread (arg=0x7f25a0f85700) at pthread_create.c:463 #45 clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95 The primary issue is failure in the assert "assert(!timer_in_handler)" inside the PTHREAD_MUTEX_LOCK_IF_NEEDED macro at frame # 35 (gtm_malloc). This is because we are about to make a pthread_mutex_lock() call which should not be done while inside a signal handler (according to the man pages). To fix this issue, we skip flushing that particular cache-record in wcs_wtstart() thereby avoiding the need to call gtm_malloc() (and in turn the pthread_mutex_lock() call) while inside the timer handler.

…that can fail in rare cases * The below assert failed once in 1500 r126/ydb464 subtest runs across a dozen boxes. ```c assert((1 != forced_exit) || GET_DEFERRED_EXIT_CHECK_NEEDED); #27 rts_error_csa (csa=0x0, argcnt=7) at sr_unix/rts_error.c:99 #28 wcs_wtstart (region=0x7f6970, writes=0, cr_list_ptr=0x0, cr2flush=0x0) at sr_unix/wcs_wtstart.c:830 #29 wcs_stale (tid=8350064, hd_len=8, region=0x872fe8) at sr_port/t_end_sysops.c:1386 #30 timer_handler (why=14, info=0x7fadda7caea8 <stapi_signal_handler_oscontext+10728>, context=0x7fadda7caf28 <stapi_signal_handler_oscontext+10856>) at sr_unix/gt_timers.c:773 #31 ydb_os_signal_handler (sig=14, info=0x7ffdee359eb0, context=0x7ffdee359d80) at sr_unix/ydb_os_signal_handler.c:66 #32 <signal handler called> #33 generic_signal_handler (sig=2, info=0x7fadda7c9588 <stapi_signal_handler_oscontext+4296>, context=0x7fadda7c9608 <stapi_signal_handler_oscontext+4424>) at sr_unix/generic_signal_handler.c:244 #34 ydb_os_signal_handler (sig=2, info=0x7ffdee35a870, context=0x7ffdee35a740) at sr_unix/ydb_os_signal_handler.c:88 #35 <signal handler called> #36 clock_nanosleep () from /usr/lib64/libc.so.6 #37 m_usleep (useconds=1000) at sr_unix/sleep.c:28 #38 wcs_sleep (sleepfactor=1) at sr_port/wcs_sleep.c:28 #39 wcs_phase2_commit_wait (csa=0x7f4840, cr=0x7fadd7b88a50) at sr_port/wcs_phase2_commit_wait.c:268 #40 t_qread (blk=6, cycle=0x7ffdee35c568, cr_out=0x7ffdee35c558) at sr_port/t_qread.c:668 #41 gvcst_search (pKey=0x7f6040, pHist=0x0) at sr_port/gvcst_search.c:438 #42 gvcst_put2 (val=0x7fadda7d3940 <increment_delta_mval>, parms=0x7ffdee35f8b0) at sr_port/gvcst_put.c:980 #43 gvcst_put (val=0x7fadda7d3940 <increment_delta_mval>) at sr_port/gvcst_put.c:299 #44 gvcst_incr (increment=0x7ffdee362030, result=0x7ffdee362050) at sr_port/gvcst_incr.c:60 #45 op_gvincr (increment=0x7ffdee362030, result=0x7ffdee362050) at sr_port/op_gvincr.c:60 #46 ydb_incr_s (varname=0x7ffdee362b10, subs_used=0, subsarray=0x7ffdee3629d0, increment=0x7ffdee362aa0, ret_value=0x7ffdee362a90) at sr_unix/ydb_incr_s.c:156 #47 runProc (settings=0x7ffdee363490, curDepth=0) at /usr/library/gtm_test/T998/simpleapi/inref/randomWalk.c:447 #48 runProc_driver (settings=0x7ffdee363490) at /usr/library/gtm_test/T998/simpleapi/inref/randomWalk.c:145 #49 main () at /usr/library/gtm_test/T998/simpleapi/inref/randomWalk.c:93 (gdb) f 28 #28 wcs_wtstart (region=0x7f6970, writes=0, cr_list_ptr=0x0, cr2flush=0x0) at sr_unix/wcs_wtstart.c:830 830 ENABLE_INTERRUPTS(INTRPT_IN_WCS_WTSTART, prev_intrpt_state); (gdb) p forced_exit $5 = 1 (gdb) p deferred_signal_handling_needed $6 = 0 ``` * This is because the signal handler for SIGALRM (sig=14) happened after line 183 but before line 193. ```c sr_port/have_crit.h ------------------- 183 forced_exit = 1; \ 184 if (in_os_signal_handler) \ 185 { /* If we are inside an OS signal handler and therefore had to defer exit \ 186 * handling, set "outofband" also to TRUE as this is checked by lots of \ 187 * potentially long-running commands in the runtime (e.g. HANG etc.) and we \ 188 * want all of those to automatically trigger process exit handling. \ 189 */ \ 190 outofband = deferred_signal; \ 191 } \ 192 /* Whenever "forced_exit" gets set to 1, set the corresponding deferred event too */ \ 193 SET_DEFERRED_EXIT_CHECK_NEEDED; \ ``` * We cannot easily protect against this situation and so it is better to remove the assert.

… failure in a Simple Threaded API application Background ---------- * The ideminter_rolrec/interrupted_rollback_or_recover subtest failed on a RHEL8 x86_64 in-house system as well as an AARCH64 system with the following symptom. ```diff 107a108,239 > ideminter_rolrec_0/interrupted_rollback_or_recover/savedjoboutput_0_20220107_233728/impjob_imptp0.mje4 > %YDB-F-MAXRTSERRDEPTH Error loop detected - aborting image with core ``` * The C-stack of the core file had the following (frames with repeating traces not shown below) ```c #0 pthread_kill () from /usr/lib64/libpthread.so.0 #1 gtm_dump_core () at sr_unix/gtm_dump_core.c:74 #2 rts_error_va (csa=0x0, argcnt=7, var=0x7ffc809d7070) at sr_unix/rts_error.c:144 #3 rts_error_csa (csa=0x0, argcnt=7) at sr_unix/rts_error.c:99 #4 ydb_stm_invoke_deferred_signal_handler () at sr_unix/ydb_stm_invoke_deferred_signal_handler.c:95 . . #38 rts_error_va (csa=0x0, argcnt=7, var=0x7ffc809dadb0) at sr_unix/rts_error.c:163 #39 rts_error_csa (csa=0x0, argcnt=7) at sr_unix/rts_error.c:99 #40 ydb_stm_invoke_deferred_signal_handler () at sr_unix/ydb_stm_invoke_deferred_signal_handler.c:95 #41 deferred_signal_handler () at sr_port/deferred_signal_handler.c:64 #42 eintr_handling_check () at sr_port/eintr_handling_check.c:29 #43 gtm_fprintf (stream=0x7f7f6c14b600 <_IO_2_1_stderr_>, format=0x7f7f6d7e1005 "%s") at sr_unix/gtm_stdio.c:75 #44 util_out_print_vaparm (message=0x0, flush=1, var=0x7ffc809dbb20, faocnt=2147483647) at sr_unix/util_output.c:873 #45 util_out_print (message=0x0, flush=1) at sr_unix/util_output.c:911 #46 util_cond_flush () at sr_unix/util_output.c:952 #47 ch_cond_core () at sr_unix/ch_cond_core.c:76 #48 rts_error_va (csa=0x0, argcnt=7, var=0x7ffc809dbd20) at sr_unix/rts_error.c:198 #49 rts_error_csa (csa=0x0, argcnt=7) at sr_unix/rts_error.c:99 #50 ydb_stm_invoke_deferred_signal_handler () at sr_unix/ydb_stm_invoke_deferred_signal_handler.c:95 #51 deferred_signal_handler () at sr_port/deferred_signal_handler.c:64 #52 grab_lock (reg=0xe8e960, is_blocking_wait=1, onln_rlbk_action=2) at sr_unix/grab_lock.c:111 #53 tp_tend () at sr_port/tp_tend.c:1261 #54 op_tcommit () at sr_port/op_tcommit.c:494 #55 stkok3 () at sr_x86_64/opp_tcommit.s:30 (gdb) f 50 #50 0x00007f7f6d51efb2 in ydb_stm_invoke_deferred_signal_handler () at sr_unix/ydb_stm_invoke_deferred_signal_handler.c:95 95 assert((SIGINT == stapi_signal_handler_oscontext[sig_hndlr_ctrlc_handler].sig_num) || USING_ALTERNATE_SIGHANDLING); (gdb) list 93 if (STAPI_IS_SIGNAL_HANDLER_DEFERRED(sig_hndlr_ctrlc_handler)) 94 { 95 --> assert((SIGINT == stapi_signal_handler_oscontext[sig_hndlr_ctrlc_handler].sig_num) || USING_ALTERNATE_SIGHANDLING); 96 ydb_stm_invoke_deferred_signal_handler_type = sig_hndlr_continue_handler; 97 ctrlc_handler(DUMMY_SIG_NUM, NULL, NULL); 98 ydb_stm_invoke_deferred_signal_handler_type = sig_hndlr_none; 99 } (gdb) p/x stapi_signal_handler_deferred $15 = 0x404 (gdb) p stapi_signal_handler_oscontext[sig_hndlr_none].sig_num $3 = 0 (gdb) p stapi_signal_handler_oscontext[sig_hndlr_continue_handler].sig_num $4 = 18 (gdb) p stapi_signal_handler_oscontext[sig_hndlr_ctrlc_handler].sig_num $5 = 0 (gdb) p stapi_signal_handler_oscontext[sig_hndlr_dbcertify_signal_handler].sig_num $6 = 0 (gdb) p stapi_signal_handler_oscontext[sig_hndlr_generic_signal_handler].sig_num $7 = 0 (gdb) p stapi_signal_handler_oscontext[sig_hndlr_jobexam_signal_handler].sig_num $8 = 0 (gdb) p stapi_signal_handler_oscontext[sig_hndlr_jobinterrupt_event].sig_num $9 = 0 (gdb) p stapi_signal_handler_oscontext[sig_hndlr_job_term_handler].sig_num $10 = 0 (gdb) p stapi_signal_handler_oscontext[sig_hndlr_op_fnzpeek_signal_handler].sig_num $11 = 0 (gdb) p stapi_signal_handler_oscontext[sig_hndlr_suspsigs_handler].sig_num $12 = 0 (gdb) p stapi_signal_handler_oscontext[sig_hndlr_timer_handler].sig_num $13 = 14 (gdb) p simpleThreadAPI_active $1 = 1 ``` * As can be seen from the gdb analysis of frame 50, the global variable `stapi_signal_handler_deferred` has 2 bits set. The ones corresponding to `sig_hndlr_ctrlc_handler` and `sig_hndlr_timer_handler`. But the `stapi_signal_handler_oscontext[]` global variable indicates that the 2 signals that were deferred corresponded to `sig_hndlr_continue_handler` and `sig_hndlr_timer_handler`. This disconnect between `sig_hndlr_ctrlc_handler` and `sig_hndlr_continue_handler` across the 2 global variables contributed to the assert failure. Issue ----- * After a code review, I have a theory on the cause of the issue. And that is the below macro got invoked by multiple threads at the same time for 2 SIGCONT signals (multiple threads possible since this is a Simple Thread API application as seen by the non-zero value of `simpleThreadAPI_active` in gdb). **sr_unix/sig_init.h** ```c 80 #define STAPI_SET_SIGNAL_HANDLER_DEFERRED(SIGHNDLRTYPE, SIG_NUM, INFO, CONTEXT) \ 81 { \ 82 SAVE_OS_SIGNAL_HANDLER_SIGNUM(SIGHNDLRTYPE, SIG_NUM); \ 83 SAVE_OS_SIGNAL_HANDLER_INFO(SIGHNDLRTYPE, INFO); \ 84 SAVE_OS_SIGNAL_HANDLER_CONTEXT(SIGHNDLRTYPE, CONTEXT); \ 85 SHM_WRITE_MEMORY_BARRIER; \ 86 stapi_signal_handler_oscontext[SIGHNDLRTYPE].sig_forwarded = TRUE; \ 87 SHM_WRITE_MEMORY_BARRIER; \ 88 /* Need interlocked add below since multiple threads can execute this code \ 89 * at the same time (e.g. if SIGALRM and SIGTERM happen at same time). \ 90 * Just like SET_DEFERRED_TIMERS_CHECK_NEEDED macro in "have_crit.h", \ 91 * we need to check the value before doing the interlocked add. \ 92 */ \ 93 if (!STAPI_IS_SIGNAL_HANDLER_DEFERRED(SIGHNDLRTYPE)) \ 94 INTERLOCK_ADD(&stapi_signal_handler_deferred, (1 << SIGHNDLRTYPE)); \ 95 SHM_WRITE_MEMORY_BARRIER; \ 96 SET_DEFERRED_STAPI_CHECK_NEEDED; \ 97 } ``` * Threads 1 and 2 both went to line 93 and found `STAPI_IS_SIGNAL_HANDLER_DEFERRED(sig_hndlr_continue_handler)` FALSE. And then both went to proceed with the `INTERLOCK_ADD` call resulting in TWO increments for the same SIGCONT signal. This caused the value of the global variable `stapi_signal_handler_deferred` to have the bit following `sig_hndlr_continue_handler` set i.e. the `sig_hndlr_ctrlc_handler` bit (in the enum sequence). Thus we ended up with `stapi_signal_handler_deferred` having the least significant 4 in its `0x404` value even though this process never had a Ctrl-C sent by the test (but SIGCONT could be sent by processes in the test at any time and not just one but more than one is also possible and is likely what happened in the test). * While `INTERLOCK_ADD` is an interlocked operation, the decision to do it is based on checking the value of the global variable and that happens outside of the interlock and so the two together do not constitute an atomic operation like they should be. Fix --- * As part of e175d54, the global variables `deferred_signal_handling_needed` and `stapi_signal_handler_deferred` were updated to use `INTERLOCK_ADD` to prevent out-of-design values with multiple threads updating the same global variable at the same time. * As part of a later commit 98e8638, `deferred_signal_handling_needed` was updated to use `COMPSWAP_LOCK` instead of `INTERLOCK_ADD`. * But `stapi_signal_handler_deferred` continued to use `INTERLOCK_ADD`. * The fix is simple and is to make `stapi_signal_handler_deferred` use `COMPSWAP_LOCK` instead of `INTERLOCK_ADD`. * Towards this, the global variable now has a type of `global_latch_t`. * A new `BIT_SET_INTERLOCKED` macro has been introduced in `sr_unix/sig_init.h`. This is based largely on the pre-existing `SET_DEFERRED_CONDITION` macro except that the new macro allows for the latch variable to be passed as a parameter. The following macros have been changed to use the new `BIT_SET_INTERLOCKED` macro. - STAPI_SET_SIGNAL_HANDLER_DEFERRED macro in sr_unix/sig_init.h - STAPI_CLEAR_SIGNAL_HANDLER_DEFERRED macro in sr_unix/sig_init.h - SET_DEFERRED_CONDITION macro in sr_port/have_crit.h - CLEAR_DEFERRED_CONDITION macro in sr_port/have_crit.h * A new `BIT_GET_INTERLOCKED` macro gets the latch value. This is based on the pre-existing `GET_DEFERRED_CONDITION` macro in sr_port/have_crit.h. All places that need to check the value of the `stapi_signal_handler_deferred` global variable now use this new macro. the pre-existing `GET_DEFERRED_CONDITION` macro has also been changed to use this new macro. * Unnecessary `GBLREF`s of `stapi_signal_handler_deferred` in a few files have been removed. * Note that these changes are done based on a theory of what could have caused the failure. It is not yet known for sure if that is indeed the cause as the failure happens very rarely (it is not reproducible on demand even after running dozens of tests on lots of systems). Only time will tell if this fix is correct. Misc changes ------------ * Noticed a longstanding typo in `sr_unix/ydb_stm_invoke_deferred_signal_handler.c` and fixed it. Not yet sure though if there is a user visible issue due to this.

nars1 added this to the r110 milestone Oct 4, 2017

nars1 self-assigned this Oct 4, 2017

nars1 changed the title ~~%HD (percent utility to do decimal to hexadecimal conversion) should even work on hexadecimal numbers with a 0x prefix~~ %HD should also work on hexadecimal numbers with a 0x prefix Oct 4, 2017

nars1 added a commit to nars1/YottaDB that referenced this issue Oct 4, 2017

[NARS1] [ksbhaskar] %HD should also work on hexadecimal numbers with …

f8eecbd

…a 0x prefix YottaDB#44

nars1 added a commit that referenced this issue Oct 4, 2017

[NARS1] [ksbhaskar] %HD should also work on hexadecimal numbers with …

d167deb

…a 0x or 0X prefix (#44)

nars1 closed this as completed Oct 5, 2017

ksbhaskar added the enhancement label Oct 25, 2017

ksbhaskar changed the title ~~%HD should also work on hexadecimal numbers with a 0x prefix~~ %HD works on hexadecimal numbers with a 0x prefix Oct 25, 2017

ksbhaskar changed the title ~~%HD works on hexadecimal numbers with a 0x prefix~~ %HD works on hexadecimal numbers with a 0x or 0X prefix Oct 25, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

%HD works on hexadecimal numbers with a 0x or 0X prefix #44

%HD works on hexadecimal numbers with a 0x or 0X prefix #44

nars1 commented Oct 4, 2017 •

edited by ksbhaskar

Loading

%HD works on hexadecimal numbers with a 0x or 0X prefix #44

%HD works on hexadecimal numbers with a 0x or 0X prefix #44

Comments

nars1 commented Oct 4, 2017 • edited by ksbhaskar Loading

Final Release Note

Description

Draft Release Note

nars1 commented Oct 4, 2017 •

edited by ksbhaskar

Loading