Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TPRESTART syslog messages identify correct global name and message type #207

Closed
nars1 opened this issue Apr 24, 2018 · 0 comments
Closed
Assignees
Milestone

Comments

@nars1
Copy link
Collaborator

nars1 commented Apr 24, 2018

Final Release Note

Type 0 and type 3 TPRESTART messages in the syslog (enabled by turning on TP restart logging) correctly report the global variable causing the restart. Furthermore, type 3 messages correctly identify themselves as type 3 messages. Previously, they could report an incorrect global variable reference, and type 3 messages were sometimes incorrectly identifed as type 2. (#207)

Description

Below is a test case that demonstrates the issue.

> setenv ydb_gbldir mumps.gld
> rm -f mumps.gld
> $ydb_dist/mumps -run GDE @tmp.com
> rm *.dat
> $ydb_dist/mupip create
> setenv gtm_tprestart_log_delta 1
> $ydb_dist/mumps -run x

If you watch the syslog while the above command runs, you will see good and bad messages like demonstrated below.

Good message --> ^a is global name and database is a.dat which match
Apr 24 11:01:57 xxxxx YDB-MUMPS[2634]: %YDB-I-TPRESTART, Database tmp/a.dat; code: L; blk: 0x00000003 in glbl: ^a(1979); pvtmods: 0, blkmods: 1, blklvl: 0, type: 4, readset: 1, writeset: 0, local_tn: 0x0000000000000294, zpos: child+4^x

Bad message --> ^b is global name and database is a.dat which do not match
Apr 24 11:01:57 xxxxx YDB-MUMPS[2638]: %YDB-I-TPRESTART, Database tmp/a.dat; code: LL; blk: 0x00000003 in glbl: ^b(1981); pvtmods: 1, blkmods: 1, blklvl: 0, type: 3, readset: 2, writeset: 1, local_tn: 0x000000000000029A, zpos: child+5^x

Below are the files needed to run the above test

> cat tmp.com
change -segment DEFAULT -file=mumps.dat
add -name a -region=areg
add -region areg -dyn=aseg
add -segment aseg -file=a
add -name b -region=breg
add -region breg -dyn=bseg
add -segment bseg -file=b
add -name c -region=creg
add -region creg -dyn=cseg
add -segment cseg -file=c

> cat x.m
x
 set ^stop=0,njobs=5
        for i=1:1:njobs do
        . set jobstr="job child^x:(output=""child_x.mjo"_i_""":error=""child_x.mje"_i_""")"
        . xecute jobstr
        . set job(i)=$zjob
 ;
 ; let child run for 10 seconds
 hang 10
 ;
 ; signal child processes to stop
 set ^stop=1
 ;
 ; wait for child processes to die
        for i=1:1:njobs set pid=job(i) for  quit:'$zgetjpi(pid,"ISPROCALIVE")  hang 0.1
        quit
child  ;
 for i=1:1:1000 do
 . tstart ():serial
 . set x=$incr(^c)
 . set ^a(x)=1,^b(x)=2
 . tcommit
 quit

Draft Release Note

The TPRESTART message (which is recorded in the syslog when restarts happen in a TP transaction and the environment variable ydb_tprestart_log_delta/gtm_tprestart_log_delta is set to a non-zero value) indicates the correct global name that was involved in the restart. In prior versions, this could be incorrect some times (only possible when the TPRESTART syslog message had a "type:" of 0 or 3).

@nars1 nars1 added this to the r130 milestone Apr 24, 2018
@nars1 nars1 self-assigned this Apr 24, 2018
nars1 added a commit to nars1/YottaDB that referenced this issue Apr 24, 2018
… restart in tp_tend (where gv_currkey is not relevant)
nars1 added a commit to nars1/YottaDBtest that referenced this issue Apr 24, 2018
nars1 added a commit that referenced this issue Apr 25, 2018
…t in tp_tend (where gv_currkey is not relevant)
nars1 added a commit to YottaDB/YDBTest that referenced this issue Apr 25, 2018
@nars1 nars1 closed this as completed Apr 25, 2018
nars1 added a commit to nars1/YottaDB that referenced this issue May 3, 2018
* The TPRESTART syslog message relies on all cdb_sc_blkmod occurrences invoking the
  TP_TRACE_HIST_MOD macro. One case in gvcst_expand_key.h was missed out. Now fixed.
  The user-visible impact of this is that some L type of restarts in TP could
  print a potentially incorrect "type:" field because the type was not set as part of
  the current restart (but are using a leftover value of TREF(blkmod_fail_type) from a
  prior restart in the same process).

* In case a cdb_sc_blkmod happens in op_tcommit, it used to be classified as a
  tp_blkmod_t_qread which is misleading since this is not the usual invocation of
  "t_qread" before the TCOMMIT command. So make this a new type "tp_blkmod_op_tcommit".
  This is now recognized by tp_restart() to decide whether to print unsubscripted
  gvn or not in the TPRESTART syslog message. This would make the TPRESTART syslog
  message more fine-grained in terms of the type of restart.

* In tp_hist & tp_tend, the global variable TREF(tprestart_syslog_delta) was used in
  various places. This is now speeded up by copying this into a local variable.
  Additionally, a check of "1 != n_blkmods" is now made debug-only. The if block would
  have always been skipped for a pro build.

* Comments in tp_restart have been enhanced.
nars1 added a commit that referenced this issue May 3, 2018
* The TPRESTART syslog message relies on all cdb_sc_blkmod occurrences invoking the
  TP_TRACE_HIST_MOD macro. One case in gvcst_expand_key.h was missed out. Now fixed.
  The user-visible impact of this is that some L type of restarts in TP could
  print a potentially incorrect "type:" field because the type was not set as part of
  the current restart (but are using a leftover value of TREF(blkmod_fail_type) from a
  prior restart in the same process).

* In case a cdb_sc_blkmod happens in op_tcommit, it used to be classified as a
  tp_blkmod_t_qread which is misleading since this is not the usual invocation of
  "t_qread" before the TCOMMIT command. So make this a new type "tp_blkmod_op_tcommit".
  This is now recognized by tp_restart() to decide whether to print unsubscripted
  gvn or not in the TPRESTART syslog message. This would make the TPRESTART syslog
  message more fine-grained in terms of the type of restart.

* In tp_hist & tp_tend, the global variable TREF(tprestart_syslog_delta) was used in
  various places. This is now speeded up by copying this into a local variable.
  Additionally, a check of "1 != n_blkmods" is now made debug-only. The if block would
  have always been skipped for a pro build.

* Comments in tp_restart have been enhanced.
@ksbhaskar ksbhaskar changed the title TPRESTART syslog messages do not identify the correct global name TPRESTART syslog messages identify correct global name May 8, 2018
@ksbhaskar ksbhaskar changed the title TPRESTART syslog messages identify correct global name TPRESTART syslog messages identify correct global name and message type May 8, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant