YottaDB Darwin Port: passed sockets between jobbed processes don't work #292

Open
shabiel opened this issue Jun 21, 2018 · 9 comments

@shabiel
Contributor

shabiel commented Jun 21, 2018

For example, the M-Web-Server won't work.

It previously worked on the last port to Darwin, in V6.2-002A.

Confirmed on two different Macs.

I will debug as time permits.

@nars1
Collaborator

nars1 commented Jun 21, 2018

Not sure if it helps but maybe #275 is related.

@shabiel
Contributor Author

shabiel commented Jun 21, 2018

Interesting. I want to compile the latest source code for YottaDB on my Linux machine and see if we have the same issue.

@shabiel shabiel closed this as completed Jun 21, 2018
@shabiel shabiel reopened this Jun 21, 2018
@shabiel
Contributor Author

shabiel commented Jun 25, 2018

Shouldn't be a surprise, but I at least confirmed it's not an issue on Linux.

@nars1, any advice on debugging this? The multiple forks make it difficult. The way I debugged gtmshrsec on Cygwin was to put in sleeps and then run and attach to the process while it's sleeping.
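
The sleep-and-attach approach described above can be sketched roughly as follows. This is a minimal illustration only, not code from YottaDB or gtmshrsec: the names `wait_for_debugger` and `debugger_attached` are hypothetical. The idea is to call the helper in the forked child right after `fork()` returns 0, attach to the printed pid, and then set the flag from the debugger to let the process continue.

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical flag: set it to 1 from the debugger to release the process,
 * e.g. "expr debugger_attached = 1" in lldb or
 * "set var debugger_attached = 1" in gdb. */
volatile int debugger_attached = 0;

/* Sleep in one-second steps until the flag is set or max_seconds elapse.
 * Returns the number of seconds actually waited. */
int wait_for_debugger(int max_seconds)
{
    int waited = 0;

    /* Print the pid so the debugger can be attached to the right process. */
    fprintf(stderr, "pid %ld waiting for debugger\n", (long)getpid());
    while (!debugger_attached && waited < max_seconds) {
        sleep(1);
        waited++;
    }
    return waited;
}
```

The timeout keeps a forgotten call from hanging the process forever; a value such as 60 leaves enough time to attach by hand.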

@shabiel
Contributor Author

shabiel commented Jun 25, 2018

Found the crash.

Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0   libsystem_kernel.dylib        	0x00007fff687cd4aa __kill + 10
1   libyottadb.dylib              	0x0000000105b68714 gtm_dump_core + 1332 (gtm_dump_core.c:69)
2   libyottadb.dylib              	0x0000000105b6d981 gtm_fork_n_core + 2241
3   libyottadb.dylib              	0x0000000105ae9ebb ch_cond_core + 475
4   libyottadb.dylib              	0x0000000105ea9d45 rts_error_va + 3333
5   libyottadb.dylib              	0x0000000105eaa307 rts_error_csa + 359
6   libyottadb.dylib              	0x0000000105e455d0 middle_child + 1168 (ojstartchild.c:187)
7   libyottadb.dylib              	0x0000000105eaa0b9 rts_error_va + 4217 (rts_error.c:160)
8   libyottadb.dylib              	0x0000000105eaa307 rts_error_csa + 359
9   libyottadb.dylib              	0x0000000105e3fd88 ojstartchild + 19000 (ojstartchild.c:612)
10  libyottadb.dylib              	0x0000000105e64c17 op_job + 4279 (op_job.c:190)
11  ???                           	0x000000010b9575b0 0 + 4489311664

Crashes here:

             SEND(setup_fds[0], &params, SIZEOF(params), 0, rc);
             if (rc < 0)
                 SETUP_DATA_FAIL();

Previous SENDs are apparently successful.

@shabiel
Contributor Author

shabiel commented Jun 25, 2018

Okay. After an hour of debugging, it turns out the crash happens at a different send on each run. That means the grandchild process is crashing right at startup, and the sends that succeed do so only by accident.
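
To illustrate why a send can fail once the receiving process is gone, here is a small standalone sketch. It assumes the failure mode is simply that the peer end of a socket pair has been closed by a dead process; that is an assumption for illustration, not something confirmed for ojstartchild.c. The function name `send_to_dead_peer` is hypothetical. The `waitpid()` call makes the outcome deterministic; without it, whether a given send succeeds depends on whether the child has already exited, which would match the "random sends" behavior.

```c
#include <errno.h>
#include <signal.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 0 if the send succeeded, the errno value if it failed,
 * or -1 if the socket pair or fork could not be set up. */
int send_to_dead_peer(void)
{
    int fds[2];
    char msg[] = "params";
    pid_t pid;
    ssize_t rc;
    int err;

    /* Ignore SIGPIPE so a failed send reports EPIPE instead of killing us.
     * Note: this changes the signal disposition for the whole process. */
    signal(SIGPIPE, SIG_IGN);
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) < 0)
        return -1;
    pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0)
        _exit(1);           /* child dies at once, standing in for the crash */
    close(fds[1]);          /* parent keeps only its own end */
    waitpid(pid, NULL, 0);  /* make sure the peer end is really closed */
    rc = send(fds[0], msg, sizeof(msg), 0);
    err = (rc < 0) ? errno : 0;
    close(fds[0]);
    return err;
}
```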

@shabiel
Contributor Author

shabiel commented Jun 25, 2018

I think I finally found the problem. I am single-stepping with si at the instruction level so that I can catch the fault at the right moment.

(lldb) process attach -n mumps -w
Process 61571 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGTRAP
    frame #0: 0x00007fff686dc666 libsystem_c.dylib`fork + 18
libsystem_c.dylib`fork:
->  0x7fff686dc666 <+18>: retq   
    0x7fff686dc667 <+19>: testl  %ebx, %ebx
    0x7fff686dc669 <+21>: je     0x7fff686dc67d            ; <+41>
    0x7fff686dc66b <+23>: cmpl   $-0x1, %ebx
Target 0: (mumps) stopped.

Executable module set to "/Users/sam/Documents/repos/YottaDB/build/./mumps".
Architecture set to: x86_64-apple-macosx.
(lldb) si
Process 61571 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
    frame #0: 0x00007fff686dc666 libsystem_c.dylib`fork + 18
libsystem_c.dylib`fork:
->  0x7fff686dc666 <+18>: retq   
    0x7fff686dc667 <+19>: testl  %ebx, %ebx
    0x7fff686dc669 <+21>: je     0x7fff686dc67d            ; <+41>
    0x7fff686dc66b <+23>: cmpl   $-0x1, %ebx
Target 0: (mumps) stopped.
(lldb)  
Process 61571 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = instruction step into
    frame #0: 0x000000010e43d0a0 libyottadb.dylib`intrpt_ok_state
libyottadb.dylib`intrpt_ok_state:
->  0x10e43d0a0 <+0>: sbbb   %al, (%rax)
    0x10e43d0a2 <+2>: addb   %al, (%rax)

libyottadb.dylib`mumps_status:
    0x10e43d0a4 <+0>: addl   %eax, (%rax)
    0x10e43d0a6 <+2>: addb   %al, (%rax)
Target 0: (mumps) stopped.
(lldb) si
Process 61571 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x10e43d0a0)
    frame #0: 0x000000010e43d0a0 libyottadb.dylib`intrpt_ok_state
libyottadb.dylib`intrpt_ok_state:
->  0x10e43d0a0 <+0>: sbbb   %al, (%rax)
    0x10e43d0a2 <+2>: addb   %al, (%rax)

libyottadb.dylib`mumps_status:
    0x10e43d0a4 <+0>: addl   %eax, (%rax)
    0x10e43d0a6 <+2>: addb   %al, (%rax)
Target 0: (mumps) stopped.

@shabiel
Contributor Author

shabiel commented Jun 25, 2018

More output from the same stack. I am actually puzzled by this; none of it makes sense.

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x10e43d0a0)
    frame #0: 0x000000010e43d0a0 libyottadb.dylib`intrpt_ok_state
    frame #1: 0x000000010e43c1f0 libyottadb.dylib`xfer_name + 2336
  * frame #2: 0x000000010db8779e libyottadb.dylib`ojstartchild(jparms=0x00007ffee2adfd00, argcnt=1, non_exit_return=0x00007ffee2adfdbc, pipe_fds=0x00007ffee2adfe68) at ojstartchild.c:389
    frame #3: 0x000000010dbb0c17 libyottadb.dylib`op_job(argcnt=1) at op_job.c:190
    frame #4: 0x000000010ed361a2
(lldb) f 0
frame #0: 0x000000010e43d0a0 libyottadb.dylib`intrpt_ok_state
libyottadb.dylib`intrpt_ok_state:
->  0x10e43d0a0 <+0>: sbbb   %al, (%rax)
    0x10e43d0a2 <+2>: addb   %al, (%rax)

libyottadb.dylib`mumps_status:
    0x10e43d0a4 <+0>: addl   %eax, (%rax)
    0x10e43d0a6 <+2>: addb   %al, (%rax)
(lldb) p prev_intrpt_state
error: use of undeclared identifier 'prev_intrpt_state'
(lldb) f 1
frame #1: 0x000000010e43c1f0 libyottadb.dylib`xfer_name + 2336
libyottadb.dylib`xfer_table:
    0x10e43c1f0 <+0>: addb   %dh, %al
    0x10e43c1f2 <+2>: pushq  %rdi
    0x10e43c1f3 <+3>: orl    $0x1, %eax
    0x10e43c1f8 <+8>: xorb   %dh, 0x12(%rbp)
(lldb) p prev_intrpt_state
error: use of undeclared identifier 'prev_intrpt_state'
(lldb) f 2
frame #2: 0x000000010db8779e libyottadb.dylib`ojstartchild(jparms=0x00007ffee2adfd00, argcnt=1, non_exit_return=0x00007ffee2adfdbc, pipe_fds=0x00007ffee2adfe68) at ojstartchild.c:389
   386 			rts_error_csa(CSA_ARG(NULL) VARLSTCNT(6) ERR_YDBDISTUNVERIF, 4, STRLEN(ydb_dist), ydb_dist,
   387 					gtmImageNames[image_type].imageNameLen, gtmImageNames[image_type].imageName);
   388 		FFLUSH(NULL);
-> 389 		FORK_RETRY(child_pid);
   390 		if (child_pid == 0)
   391 		{
   392 	        /* DEBUG */
(lldb) p prev_intrpt_state
(intrpt_state_t) $7 = INTRPT_OK_TO_INTERRUPT

@shabiel
Contributor Author

shabiel commented Jun 25, 2018

One last thing, before I go to bed... I have had enough of this...

$rax is 0; $al is 0. So the error happens when dereferencing $rax.

@nars1
Collaborator

nars1 commented Jun 25, 2018

@shabiel: For debugging these multi-process scenarios with gdb, the following settings are very useful. Each one takes one of the two values listed below, which lets you choose whether gdb follows the child or the parent after a fork/exec, and whether the other process is suspended or detached (runs concurrently). Hope this helps.

  1. set follow-fork-mode child OR set follow-fork-mode parent
  2. set follow-exec-mode new OR set follow-exec-mode same
  3. set detach-on-fork off OR set detach-on-fork on
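
A small fork/exec target can be handy for trying these settings out before using them on the real JOB code path. The sketch below is only an illustration and has nothing to do with the YottaDB sources; the function name `run_child` is hypothetical, and it assumes `/bin/sh` exists on the system. Setting a breakpoint on `run_child` and stepping over the `fork()` and `execl()` calls shows which process gdb ends up following under each setting.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that execs "/bin/sh -c 'exit N'" and return the child's
 * exit status, or -1 if fork/waitpid failed or the child did not exit
 * normally. */
int run_child(int exit_code)
{
    char cmd[32];
    int status;
    pid_t pid;

    snprintf(cmd, sizeof(cmd), "exit %d", exit_code);
    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: replace the process image; only reached again on failure. */
        execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
        _exit(127);
    }
    /* Parent: wait for the child and report how it exited. */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```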
