-
Notifications
You must be signed in to change notification settings - Fork 37
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Source server spin loops sometimes (instead of issuing FILTERNOTALIVE error) when external filter program terminates #264
Comments
nars1
added a commit
to nars1/YottaDB
that referenced
this issue
May 29, 2018
Not doing so could cause the source server to loop indefinitely in case the filter program terminates prematurely.
nars1
added a commit
that referenced
this issue
May 30, 2018
Not doing so could cause the source server to loop indefinitely in case the filter program terminates prematurely.
nars1
added a commit
to nars1/YottaDBtest
that referenced
this issue
May 31, 2018
The source server loops indefinitely with r1.22 x86_64 and arm linux with this revised testcase. But passes with the code fix for YottaDB#264. Whereas the previous test case used to fail rarely only on arm linux and pass always on x86_64 linux.
nars1
added a commit
to nars1/YottaDBtest
that referenced
this issue
May 31, 2018
The source server loops indefinitely with r1.22 x86_64 and arm linux with this revised testcase. But passes with the code fix for YottaDB#264. Whereas the previous test case used to fail rarely only on arm linux and pass always on x86_64 linux. The enhanced test case (which includes a hang in notalive.m) is done randomly so we exercise the previous test codepath too.
nars1
added a commit
to nars1/YottaDBtest
that referenced
this issue
May 31, 2018
The source server loops indefinitely with r1.22 x86_64 and arm linux with this revised testcase. But passes with the code fix for YottaDB#264. Whereas the previous test case used to fail rarely only on arm linux and pass always on x86_64 linux. The enhanced test case (which includes a hang in notalive.m) is done randomly so we exercise the previous test codepath too.
nars1
added a commit
to YottaDB/YDBTest
that referenced
this issue
May 31, 2018
The source server loops indefinitely with r1.22 x86_64 and arm linux with this revised testcase. But passes with the code fix for #264. Whereas the previous test case used to fail rarely only on arm linux and pass always on x86_64 linux. The enhanced test case (which includes a hang in notalive.m) is done randomly so we exercise the previous test codepath too.
nars1
added a commit
to nars1/YottaDB
that referenced
this issue
May 31, 2018
…BRT/SIG-6 errors The check for whether the temporary file name is longer than the allocated buffer can hold was incorrect. It was assuming 4 bytes (SIZEOF(uint4)) for the string that followed the hexadecimal process-id but the actual name had _XXXXXX following it which accounts for 7 bytes. For example, the actual temporary file name was DEFAULT_17fa_XXXXXX (where XXXXXX was replaced with a temporary string by "mkstemp") but the check was accounting only for DEFAULT_17fa_XXX. So there was a potential for a 3-byte overflow to go unnoticed. This logic is now reworked to use SNPRINTF to let us accurately know if the allocated buffer is not enough and if so issue a FILENAMETOOLONG error. In addition, the FILENAMETOOLONG error is now prefixed with a FILEPARSE error message which prints the entire path name that exceeded the 255 byte limit. Previously this only printed the absolute directory path without including the temporary file name. This could be potentially confusing to the user since they would see the directory path printed in the FILEPARSE error message is less than 255 characters but still an error is being issued saying it is longer than 255 characters.
nars1
added a commit
that referenced
this issue
May 31, 2018
…-6 errors The check for whether the temporary file name is longer than the allocated buffer can hold was incorrect. It was assuming 4 bytes (SIZEOF(uint4)) for the string that followed the hexadecimal process-id but the actual name had _XXXXXX following it which accounts for 7 bytes. For example, the actual temporary file name was DEFAULT_17fa_XXXXXX (where XXXXXX was replaced with a temporary string by "mkstemp") but the check was accounting only for DEFAULT_17fa_XXX. So there was a potential for a 3-byte overflow to go unnoticed. This logic is now reworked to use SNPRINTF to let us accurately know if the allocated buffer is not enough and if so issue a FILENAMETOOLONG error. In addition, the FILENAMETOOLONG error is now prefixed with a FILEPARSE error message which prints the entire path name that exceeded the 255 byte limit. Previously this only printed the absolute directory path without including the temporary file name. This could be potentially confusing to the user since they would see the directory path printed in the FILEPARSE error message is less than 255 characters but still an error is being issued saying it is longer than 255 characters.
nars1
added a commit
that referenced
this issue
May 31, 2018
1) / is not guaranteed to be last byte in tempdir_full.addr in various cases. So the / that was removed from the prior #267 commit is inserted back. 2) Previously, tempdir_full.addr held the fully constructed name but it no longer does after the prior #267 commit. So tempdir_full.addr usages after tempfilename has been constructed need to be replaced with tempfilename. 3) One use of orig_buff_size was replaced with SIZEOF(tempfilename) since the latter is what got used as the maximum size in the SNPRINTF call. Both the values hold the same value so there is no issue but still it is better/safer to be consistent.
nars1
added a commit
to nars1/YottaDB
that referenced
this issue
Jun 1, 2018
…le name to keep its length at 4-bytes even if the process_id is less than 2**24
nars1
added a commit
to nars1/YottaDB
that referenced
this issue
Jun 1, 2018
nars1
added a commit
that referenced
this issue
Jun 1, 2018
…; pid is a 4-byte number so need 8x (not 4x) in SNPRINTF format string * Zero pad the hexadecimal pid in the backup temporary file name to keep its length at 4-bytes even if the process_id is less than 2**24 * 4-byte pid translates to 8 hexadecimal digits so need 8x instead of 4x in the SNPRINTF format string. * Remove duplication of the format string by moving it to a macro TEMP_FILE_FMT_STRING. This fixes a few differences in the format string used in the two places.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Final Release Note
Description
When the replication source server is started with an external filter M program and that filter terminates prematurely, it is possible for the source server to go into a spin loop where it invokes the read() system call in repl_filter_recv_line(). In this case, the read() would return a length of 0 (to indicate EOF in the pipe since the filter program that writes to the pipe is gone) but that is not currently handled correctly so we loop back to do the read(). In some cases (if we have sent a full transaction), we start a timer for $ydb_repl_filter_timeout seconds (default of 64 seconds) and that fortunately terminates the loop but if we have not yet sent a full transaction (e.g. the pipe is full and the filter needs to read some data for us to send more), the source server will loop eternally.
Draft Release Note
The replication source server correctly issues a FILTERNOTALIVE error in case it was started with an external filter (using the -FILTER qualifier) that terminates abnormally. Previously, it was possible for the source server to spin-loop waiting for a response from the long-dead filter program.
The text was updated successfully, but these errors were encountered: