[#679] Do not start source server heartbeat timer if process has already started exiting

* As part of a prior commit (SHA 723688c), various functions that start a timer
  (`wcs_clean_dbsync()`, `wcs_stale()` etc.) were fixed to not start one once exit
  processing has begun.

* One such timer function that should also have been fixed but was left out is `gtmsource_heartbeat_timer()`.
  An in-house test failed an assert in `start_timer()` because `gtmsource_heartbeat_timer()` tried to
  restart its timer after exit processing had already begun. Below is the C-stack of the failure for the record.

  ```c
  #0  __pthread_kill () at ../sysdeps/unix/sysv/linux/pthread_kill.c:56
  #1  gtm_dump_core () at sr_unix/gtm_dump_core.c:74
  #2  gtm_fork_n_core () at sr_unix/gtm_fork_n_core.c:163
  #3  ch_cond_core () at sr_unix/ch_cond_core.c:80
  #4  rts_error_va () at sr_unix/rts_error.c:192
  #5  rts_error_csa () at sr_unix/rts_error.c:99
  #6  start_timer () at sr_unix/gt_timers.c:433
  #7  gtmsource_heartbeat_timer () at sr_unix/gtmsource_heartbeat.c:74
  #8  timer_handler () at sr_unix/gt_timers.c:889
  #9  ydb_os_signal_handler () at sr_unix/ydb_os_signal_handler.c:63
  #10 <signal handler called>
  #11 __GI___libc_write () at ../sysdeps/unix/sysv/linux/write.c:26
  #12 _IO_new_file_write () at fileops.c:1181
  #13 new_do_write () at libioP.h:948
  #14 _IO_new_file_xsputn () at fileops.c:1255
  #15 _IO_new_file_xsputn () at fileops.c:1197
  #16 __GI__IO_fwrite () at libioP.h:948
  #17 gtm_fwrite () at sr_port/eintr_wrappers.h:334
  #18 gtm_fprintf () at tdio.c:82
  #19 util_out_print_vaparm () at sr_unix/util_output.c:876
  #20 util_out_print () at sr_unix/util_output.c:914
  #21 gtm_putmsg_csa () at sr_unix/gtm_putmsg.c:73
  #22 gds_rundown () at sr_unix/gds_rundown.c:1060
  #23 gv_rundown () at sr_port/gv_rundown.c:122
  #24 mupip_exit_handler () at sr_unix/mupip_exit_handler.c:144
  #25 __run_exit_handlers () at exit.c:108
  #26 __GI_exit () at exit.c:139
  #27 gtm_image_exit () at sr_unix/gtm_image_exit.c:27
  #28 util_base_ch () at sr_port/util_base_ch.c:124
  #29 gtmsource_ch () at sr_port/gtmsource_ch.c:96
  #30 gtmsource_readfiles () at aDB/V999_R131/sr_unix/gtmsource_readfiles.c:2023
  #31 gtmsource_get_jnlrecs () attaDB/V999_R131/sr_unix/gtmsource_process_ops.c:980
  #32 gtmsource_process () at sr_unix/gtmsource_process.c:1546
  #33 gtmsource () at sr_unix/gtmsource.c:525
  #34 mupip_main () at sr_unix/mupip_main.
  #35 dlopen_libyottadb () at /Distri9_R131/sr_unix/dlopen_libyottadb.c:151
  #36 main () at sr_unix/mupip.c:22
  ```

* This failure is now fixed by checking `exit_handler_active`; if it is `TRUE`, the timer is not started.
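
The guard described above can be sketched in isolation. This is a minimal illustration only, not the YottaDB implementation: `exit_in_progress`, `arm_timer()`, `heartbeat_tick()` and `begin_exit()` are hypothetical stand-ins for `exit_handler_active`, `start_timer()`, `gtmsource_heartbeat_timer()` and the start of exit handling, and the "timer" is simulated with a counter rather than a real signal-driven timer.

```c
#include <signal.h>

/* Hypothetical stand-in for the exit_handler_active global.
 * volatile sig_atomic_t because a real flag may be read from a signal handler. */
static volatile sig_atomic_t exit_in_progress = 0;

/* Counts how many times the simulated timer has been armed. */
static int timers_armed = 0;

/* Hypothetical stand-in for start_timer(); a real one would schedule a callback. */
static void arm_timer(void)
{
	timers_armed++;
}

/* Periodic callback that re-arms itself, mirroring the shape of a heartbeat timer. */
static void heartbeat_tick(void)
{
	/* ... periodic work would go here ... */
	if (!exit_in_progress)
		arm_timer();
	/* else: exit processing has begun, so the timer chain ends here */
}

/* Called once when exit processing starts. */
static void begin_exit(void)
{
	exit_in_progress = 1;
}
```

Calling `heartbeat_tick()` before `begin_exit()` arms the timer once more; calling it afterwards leaves `timers_armed` unchanged, which is the behavior the fix relies on when the timer fires in the middle of the exit handler.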
nars1 committed Mar 12, 2021
1 parent 4e9ed08 commit a37022e
Showing 1 changed file with 9 additions and 3 deletions.
12 changes: 9 additions & 3 deletions sr_unix/gtmsource_heartbeat.c
@@ -3,7 +3,7 @@
* Copyright (c) 2006-2017 Fidelity National Information *
* Services, Inc. and/or its subsidiaries. All rights reserved. *
* *
-* Copyright (c) 2018-2019 YottaDB LLC and/or its subsidiaries. *
+* Copyright (c) 2018-2021 YottaDB LLC and/or its subsidiaries. *
* All rights reserved. *
* *
* This source code contains the intellectual property *
@@ -52,6 +52,7 @@ GBLREF boolean_t gtmsource_logstats;
GBLREF int gtmsource_log_fd;
GBLREF FILE *gtmsource_log_fp;
GBLREF gtmsource_state_t gtmsource_state;
+GBLREF boolean_t exit_handler_active;

GBLDEF boolean_t heartbeat_stalled = TRUE;
GBLDEF repl_heartbeat_que_entry_t *repl_heartbeat_que_head = NULL;
@@ -108,8 +109,13 @@ int gtmsource_init_heartbeat(void)
* this code may have to be revisited. Also, modify the check in gtmsource_process (prev_now != (save_now = gtmsource_now))
* to be something like (hearbeat_period < difftime((save_now = gtmsource_now), prev_now)). Vinaya 2003, Sep 08
*/
-start_timer((TID)gtmsource_heartbeat_timer, heartbeat_period * (uint8)NANOSECS_IN_SEC, gtmsource_heartbeat_timer, SIZEOF(heartbeat_period),
-&heartbeat_period); /* start_timer expects time interval in nanoseconds, heartbeat_period is in seconds */
+if (!exit_handler_active)
+{
+	start_timer((TID)gtmsource_heartbeat_timer, heartbeat_period * (uint8)NANOSECS_IN_SEC,
+		gtmsource_heartbeat_timer, SIZEOF(heartbeat_period), &heartbeat_period);
+	/* start_timer expects time interval in nanoseconds, heartbeat_period is in seconds */
+}
+/* else: We are already in exit processing. Do not start timers as it is unsafe (YDB#679). */
REPL_DPRINT4("Started heartbeat timer with %d s\tSource now is %ld\tTime now is %ld\n",
heartbeat_period, gtmsource_now, time(NULL));
heartbeat_stalled = FALSE;