Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AIX: parallel/test-stdio-closed #8375

Closed
mhdawson opened this issue Sep 2, 2016 · 53 comments
Closed

AIX: parallel/test-stdio-closed #8375

mhdawson opened this issue Sep 2, 2016 · 53 comments
Labels
process Issues and PRs related to the process subsystem. test Issues and PRs related to the tests.

Comments

@mhdawson
Copy link
Member

mhdawson commented Sep 2, 2016

Version: master
Platform: AIX
Subsystem: io

parallel/test-stdio-closed fails on AIX, pulling this out of #7973 to avoid confusion since issues are not related.

not ok 855 parallel/test-stdio-closed # TODO : Fix flaky test
# 
# assert.js:85
#   throw new assert.AssertionError({
#   ^
# AssertionError: 1 == 42
#     at ChildProcess.<anonymous> (/home/iojs/build/workspace/node-test-commit-aix/nodes/aix61-ppc64/test/parallel/test-stdio-closed.js:25:10)
#     at ChildProcess.<anonymous> (/home/iojs/build/workspace/node-test-commit-aix/nodes/aix61-ppc64/test/common.js:416:15)
#     at emitTwo (events.js:106:13)
#     at ChildProcess.emit (events.js:191:7)
#     at Process.ChildProcess._handle.onexit (internal/child_process.js:215:12)
  ---
  duration_ms: 0.378
  ...
@mhdawson mhdawson added the test Issues and PRs related to the tests. label Sep 2, 2016
@mscdex mscdex added the process Issues and PRs related to the process subsystem. label Sep 2, 2016
@Trott
Copy link
Member

Trott commented Sep 2, 2016

@nodejs/testing @Fishrock123

@Trott
Copy link
Member

Trott commented Sep 7, 2016

If it helps anyone troubleshoot or if it gives anyone an idea as to what's going on, the process.stdout.write() is failing (on AIX only) with:

EBADF: bad file descriptor, write

/cc gireeshpunathil who might have more information to add since their last comment.

@Trott
Copy link
Member

Trott commented Sep 7, 2016

Also, if it helps anyone, here's the modified version of the test I used to capture that info:

'use strict';
const common = require('../common');
const assert = require('assert');
const spawn = require('child_process').spawn;

if (common.isWindows) {
  common.skip('platform not supported.');
  return;
}

if (process.argv[2] === 'child') {
  try {
    process.stdout.write('stdout', function() {
      try {
        process.stderr.write('stderr', function() {
          process.exit(42);
        });
      } catch (e) {
        process.exit(200);
      }
    });
  } catch (e) {
    process.send(e.code);
    process.send(e.message);
    process.exit(20);
  }
  return;
}

// Run the script in a shell but close stdout and stderr.
const cmd = `"${process.execPath}" "${__filename}" child 1>&- 2>&-`;
const proc = spawn(
  '/bin/sh', ['-c', cmd], { stdio: ['ipc', 'inherit', 'inherit'] }
);

proc.on('message', (msg) => {
  console.log(msg);
});

proc.on('exit', common.mustCall(function(exitCode) {
  assert.strictEqual(exitCode, 42);
}));

Here's the results. I believe the EBADF is at the end because the test runner prints stderr first, then stdout. Probably should have used console.error() instead of console.log() but whatever.

# 
# assert.js:85
#   throw new assert.AssertionError({
#   ^
# AssertionError: 20 === 42
#     at ChildProcess.<anonymous> (/home/iojs/build/workspace/node-test-commit-aix/nodes/aix61-ppc64/test/parallel/test-stdio-closed.js:41:10)
#     at ChildProcess.<anonymous> (/home/iojs/build/workspace/node-test-commit-aix/nodes/aix61-ppc64/test/common.js:416:15)
#     at emitTwo (events.js:106:13)
#     at ChildProcess.emit (events.js:191:7)
#     at Process.ChildProcess._handle.onexit (internal/child_process.js:215:12)
# EBADF
# EBADF: bad file descriptor, write

@gireeshpunathil
Copy link
Member

Thanks @Trott - this info will be really useful, I will investigate.

@gireeshpunathil
Copy link
Member

gireeshpunathil commented Sep 8, 2016

ok, putting things into perspective:

Three processes: a node parent spawns a non-node child (sh) which closes its streams (1 and 2) and then spawns a node child. The grandchild attempts to write to its stream, and then returns an error code.

In Linux, the grandchild node apparently succeeded in writing into stdout, while AIX failed with EBADF (fd closed | invalid)

@gireeshpunathil
Copy link
Member

So the question is, who closed the grandchild node's stream in AIX.

resorting to truss (AIX) and strace (Linux), I see that the parent shell does not perform fork and exec, instead only exec - means the closed streams of sh is inherited to the node grandchild:

Linux:

pid 18948] close(1)                    = 0
[pid 18948] execve("./node", ["./node", "k.js", "child"], [/* 30 vars */]) = 0

AIX:

3932586:        8913259: close(1)                               = 0
3932586:        8913259: execve("./node", 0x20010EF8, 0x20011428)  argc: 3

This is also visible if I add a timeout in the grandchild and examine the processes:

Linux:

bash-4.1$ ps
  PID TTY          TIME CMD
16440 pts/0    00:00:00 node
16446 pts/0    00:00:00 node
16457 pts/0    00:00:00 ps
17838 pts/0    00:00:00 csh
17855 pts/0    00:00:00 bash

AIX:

bash-4.3$ ps
     PID    TTY  TIME CMD
 5177436  pts/0  0:00 ps 
 5570730  pts/0  0:00 -ksh 
 3211596  pts/0  0:00 bash 
 2097736  pts/0  0:00 ./node k.js 
 2425576  pts/0  0:00 ./node k.js child 

As we can see, there is no /bin/sh in the list. That process itself exec'ed into node.

So, for the grandchild node, fd 1 is inherently closed.

@gireeshpunathil
Copy link
Member

I wrote a C grandchild and used the same test case to invoke it to see how it behaves:

bash-4.3$ cat foo.c
#include <stdio.h>
#include  <stdlib.h>
int main() {
  fprintf(stdout, "hello\n");
  exit(42);
}

Linux:

fstat(1, 0x7fff2453d760)                = -1 EBADF (Bad file descriptor)
write(1, "hello\n", 6)                  = -1 EBADF (Bad file descriptor)
exit_group(42)                          = ?

AIX:

3735848:        11272557: kioctl(1, 22528, 0x00000000, 0x00000000)              = 0 Err#9  EBADF
3735848:        11272557: kwrite(1, " h e l l o\n", 6)          Err#9  EBADF
23735848:  11272557: _exit(42)

Here Linux and AIX behaved in the same manner, and the reason is well understood - the grandchild's fd is closed at inception.

So, Linux node seem to have special method to handle closed standard streams: either re-incarnate them through /dev/null device, or suppress the error code from propagation. AIX may be missing this.

At this point, I want to question the Linux behavior: Functionally, a node process spawned with standard streams suppressed, should behave in a console-less manner, and attempts to write to them should err. Any thoughts?

@Trott
Copy link
Member

Trott commented Sep 8, 2016

@gireeshpunathil node.cc has code in it designed to detect if any of the stdio streams are closed and (if so) open them on /dev/null.

This test is designed specifically to test that code. And it seems that either that code is not running on AIX (maybe the fstat() call works differently) or else the open() call in that code is returning the correct file descriptor but (for some reason) that file descriptor is not writable.

Any ideas?

@gireeshpunathil
Copy link
Member

Thanks @Trott ! I was looking for such a location. I will now figure out what is happening out there in AIX.

@veshboo
Copy link

veshboo commented Sep 8, 2016

@gireeshpunathil What is the result without spawn on aix?

$ node -e "process.stdout.write('aaa')" 1>&-

@Trott (This is just guess) The code with fstat() is surrounded by #ifdef __POSIX__. Not defined in the aix build?

EDIT: I also found g++ ... '-D__POSIX__' ../src/node.cc line from CI console output. So forget about my guess.

@addaleax
Copy link
Member

addaleax commented Sep 8, 2016

The code with fstat() is surrounded by #ifdef __POSIX__. Not defined in the aix build?

__POSIX__ should be defined for all non-Windows platforms, according to node.gyp.

@gireeshpunathil
Copy link
Member

gireeshpunathil commented Sep 8, 2016

I tested that at first, but it succeeded all the time:
bash-4.3$ ./node -e "process.stdout.write('aaa')" 1>&-
bash-4.3$ echo $?
0
bash-4.3$

__POSIX__ is indeed defined in AIX, and I am live debugging into it at the moment. Will get to the bottom shortly, hopefully!

@gireeshpunathil
Copy link
Member

I guess I found the infliction point:

bash-4.3$ cat foo.c
#include <stdio.h>
#include <fcntl.h>

int main() {
struct stat ignored;
 int ret = fstat(1, &ignored);
fprintf(stderr, "%d\n", ret);
}

Linux:

bash-4.1$ sh -c ./a.out >&-
-1
bash-4.1$ 

AIX:

bash-4.3$ sh -c ./a.out 1>&-
0
bash-4.3$ 

Need to map it back into the problem context, and figure out how this led to the issue. Also need to dig into fstat manuals to understand the disparity.

@veshboo
Copy link

veshboo commented Sep 8, 2016

seems fstat() is there to check stdio fds open.
on aix it return 0, so the code will not open stdio on /dev/null. Then we write at still closed fd. results in EBADF.

@gireeshpunathil
Copy link
Member

ok, sounds reasonable explanation, thanks @veshboo!

@veshboo
Copy link

veshboo commented Sep 8, 2016

I recommend to have #include <sys/stat.h> for sure.

@addaleax
Copy link
Member

addaleax commented Sep 8, 2016

on aix it return 0

Sounds like that, but why?

@gireeshpunathil
Copy link
Member

will continue debugging tomorrow.

@Trott
Copy link
Member

Trott commented Sep 8, 2016

Regarding this working on AIX even though the test here is failing:

./node -e "process.stdout.write('aaa')" 1>&-

What about this instead?:

./node -e "process.stdout.write('aaa', () => { console.error('write was successful-ish'); })" 1>&- 

@Trott
Copy link
Member

Trott commented Sep 8, 2016

I wade into C++ observations with trepidation as I feel like a complete imposter on this stuff, but HERE WE GO:

  • Is it possible that behavior of fstat() on a closed stream is undefined, so AIX and Linux are both right to do what they do?
  • If so, perhaps the more robust thing to do is try out lseek() or fseek() or ftell() or something like that and see if we can make it work across POSIX platforms?

/cc @bnoordhuis who introduced both the node.cc code and the test in #875 as a means to prevent information leaks.

@Trott
Copy link
Member

Trott commented Sep 8, 2016

Also:

If the consensus turns out to be that this is a bug in AIX, then inserting code to skip the test on AIX (similar to the code already there that skips it for Windows) is a sufficient fix from my perspective.

EDIT: Upon further reflection, skipping might not be OK if it means the potential information leak is there. On the other hand, perhaps the information leak is there too on Windows so...
¯_(ツ)_/¯

@addaleax
Copy link
Member

addaleax commented Sep 8, 2016

Is it possible that behavior of fstat() on a closed stream is undefined, so AIX and Linux are both right to do what they do?

man 3posix fstat (aka http://pubs.opengroup.org/onlinepubs/009695399/functions/fstat.html, which I think is authoritative) says:

ERRORS
       The fstat() function shall fail if:

       EBADF  The fildes argument is not a valid file descriptor.

So, yes, that may be a bug in AIX then. I could also imagine that something in the background opens another FD before the fstat() call is executed? Like, the pthread operations or the setvbuf() calls in node_main.cc?

@bnoordhuis
Copy link
Member

@gireeshpunathil What kind of file descriptor does struct stat ignored say it is? It would be good to test that with both your C snippet and the code in node.cc.

@Trott
Copy link
Member

Trott commented Sep 9, 2016

@bnoordhuis I'm seeing AIX report a regular file (S_ISREG(ignored.st_mode) is true) when the fstat() unexpectedly succeeds on a file descriptor for a closed stderr.

@gireeshpunathil
Copy link
Member

I have machine access issues right now, hoping to resume my work soon.

@bnoordhuis
Copy link
Member

Well, I thought I was, but now, uh, yeah, not so much...

Perl does it for stdin but not stdout:

$ strace -e close,fstat,ioctl,open,read,write perl -e 'print "."' 0>&- 1>&-
# <shared library loading omitted>
open("/dev/urandom", O_RDONLY)          = 0
read(0, "\232~|\341", 4)                = 4
close(0)                                = 0
ioctl(0, TCGETS, 0x7ffe92faf220)        = -1 EBADF (Bad file descriptor)
ioctl(1, TCGETS, 0x7ffe92faf220)        = -1 EBADF (Bad file descriptor)
ioctl(2, TCGETS, {B9600 opost isig icanon echo ...}) = 0
open("/dev/null", O_RDONLY)             = 0
ioctl(0, TCGETS, 0x7ffe92faf340)        = -1 ENOTTY (Inappropriate ioctl for device)
fstat(0, {st_mode=S_IFCHR|0666, st_rdev=makedev(1, 3), ...}) = 0
write(1, ".", 1)                        = -1 EBADF (Bad file descriptor)
+++ exited with 0 +++

@Trott
Copy link
Member

Trott commented Sep 16, 2016

@bnoordhuis Do you have an opinion as to which of these is the right way to approach solving this issue?

  • Skip the test on AIX. This basically says: "Re-opening a closed stdio handle on /dev/null is a good feature, we like it, we want to keep it, but we're OK that there are platforms where the behavior may make this not possible."
  • Remove this code and the test and consider closed stdio streams to be the user's problem to solve. This basically says: "Eh, it was a nice thing we did, but it's not that important, other languages don't seem to do this, and we probably can't it working reliably across all platforms, so best to just not do it." That would probably be a semver-major change.
  • Power through. Get this working on AIX. I assume this is choice deps: update openssl to 1.0.1j #1, but what if it's not possible? Do you have an opinion on which of the above is the better approach?

@bnoordhuis
Copy link
Member

I think it's a good feature to have; I wrote the code for a reason. :-)

The reason to make sure fds 0-2 refer to something is that it's a potential data leak otherwise. Imagine an attacker manages to open a socket with fd 1 - they will receive everything that the application prints through console.log().

@Trott
Copy link
Member

Trott commented Sep 16, 2016

@bnoordhuis Would I be correct to conclude that if it turns out it can't be reliably implemented on AIX, then skipping the test on AIX would be your preference and leave the feature alone on other POSIX instances?

@bnoordhuis
Copy link
Member

Correct.

@Trott
Copy link
Member

Trott commented Sep 16, 2016

I suppose another option is to alter the test to confirm that if a stdio stream is closed on startup, that it can't be re-opened. It does mean accepting a behavior difference: Linux (for example) will silently discard writes to the "closed" stream (which will really be re-opened but pointed at /dev/null rather than closed) while AIX will throw EBADF if you try to write to the closed stream. But the main thing is that trying to attach to fd's 0, 1, and 2 all fail.

@veshboo
Copy link

veshboo commented Sep 17, 2016

fcntl(fd, F_GETFD, 0) instead of fstat. Anyone already tried?

@Trott
Copy link
Member

Trott commented Sep 17, 2016

fcntl(fd, F_GETFD, 0) instead of fstat. Anyone already tried?

Switched the fstat to this: if (fcntl(fd, F_GETFD, 0) != -1)

make test still passes on my OS X machine.

Still fails on AIX: https://ci.nodejs.org/job/node-stress-single-test/918/nodes=aix61-ppc64/console

Given the results from a couple days ago, I don't think any tweak to the fstat() call has a chance of working by itself. Maybe it successfully detects the closed stdio stream and maybe it doesn't, but it won't fix it. The subsequent open() call won't have the desired result, so we'd still have to fix that somehow as well...

@veshboo
Copy link

veshboo commented Sep 19, 2016

There is ksh process in AIX ps result in @gireeshpunathil's comment.
On AIX, /bin/sh seems to be a link to ksh.
How about changing test-stdio-closed.js to use bash?

@Trott
Copy link
Member

Trott commented Sep 19, 2016

How about changing test-stdio-closed.js to use bash?

I'm afraid that appears to have no effect on the test outcome.

https://ci.nodejs.org/job/node-stress-single-test/920/nodes=aix61-ppc64/console

not ok 1 parallel/test-stdio-closed
# 
# assert.js:85
#   throw new assert.AssertionError({
#   ^
# AssertionError: 1 == 42
#     at ChildProcess.<anonymous> (/home/iojs/build/workspace/node-stress-single-test/nodes/aix61-ppc64/test/parallel/test-stdio-closed.js:25:10)
#     at ChildProcess.<anonymous> (/home/iojs/build/workspace/node-stress-single-test/nodes/aix61-ppc64/test/common.js:416:15)
#     at emitTwo (events.js:106:13)
#     at ChildProcess.emit (events.js:191:7)
#     at Process.ChildProcess._handle.onexit (internal/child_process.js:215:12)
  ---
  duration_ms: 0.279

@mhdawson
Copy link
Member Author

@Trott while I still want us to better figure out if there is a bug on AIX, I think that:

  • it sounds like the issue we see is an issue testing the intended behaviour as opposed to the intended behaviour not being honored. ie the main thing is that trying to attach to fd's 0, 1, and 2 all fail.
  • This is harder to resolve than I'd hoped so will likely take longer to get to the bottom of what is going on.

Given that I'm ok with your suggestion of allowing "AIX throwing EBADF" as a pass for the test on AIX only or disabling the test for AIX with my preference being the former. If we do that we'd still investigate to see if we can come to a more complete understanding/fix.

@gireeshpunathil @bnoordhuis thoughts ?

Trott added a commit to Trott/io.js that referenced this issue Sep 24, 2016
AIX handles closed stdio differently (but still compliant with spec as
far as I can tell) than other POSIX variants we test. Test results are
different than Linux and others because AIX takes measures to not re-use
the file descriptors for stdio if one of the stdio streams is closed.

Fixes: nodejs#8375
MylesBorins pushed a commit that referenced this issue Sep 28, 2016
AIX handles closed stdio differently (but still compliant with spec as
far as I can tell) than other POSIX variants we test. Test results are
different than Linux and others because AIX takes measures to not re-use
the file descriptors for stdio if one of the stdio streams is closed.

Fixes: #8375
PR-URL: #8755
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Gibson Fahnestock <[email protected]>
Reviewed-By: Ilkka Myller <[email protected]>
jasnell pushed a commit that referenced this issue Sep 29, 2016
AIX handles closed stdio differently (but still compliant with spec as
far as I can tell) than other POSIX variants we test. Test results are
different than Linux and others because AIX takes measures to not re-use
the file descriptors for stdio if one of the stdio streams is closed.

Fixes: #8375
PR-URL: #8755
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Gibson Fahnestock <[email protected]>
Reviewed-By: Ilkka Myller <[email protected]>
geek pushed a commit to geek/node that referenced this issue Sep 30, 2016
AIX handles closed stdio differently (but still compliant with spec as
far as I can tell) than other POSIX variants we test. Test results are
different than Linux and others because AIX takes measures to not re-use
the file descriptors for stdio if one of the stdio streams is closed.

Fixes: nodejs#8375
PR-URL: nodejs#8755
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Gibson Fahnestock <[email protected]>
Reviewed-By: Ilkka Myller <[email protected]>
Fishrock123 pushed a commit that referenced this issue Oct 11, 2016
AIX handles closed stdio differently (but still compliant with spec as
far as I can tell) than other POSIX variants we test. Test results are
different than Linux and others because AIX takes measures to not re-use
the file descriptors for stdio if one of the stdio streams is closed.

Fixes: #8375
PR-URL: #8755
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Gibson Fahnestock <[email protected]>
Reviewed-By: Ilkka Myller <[email protected]>

 Conflicts:
	test/parallel/parallel.status
rvagg pushed a commit that referenced this issue Oct 18, 2016
AIX handles closed stdio differently (but still compliant with spec as
far as I can tell) than other POSIX variants we test. Test results are
different than Linux and others because AIX takes measures to not re-use
the file descriptors for stdio if one of the stdio streams is closed.

Fixes: #8375
PR-URL: #8755
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Gibson Fahnestock <[email protected]>
Reviewed-By: Ilkka Myller <[email protected]>
MylesBorins pushed a commit that referenced this issue Oct 26, 2016
AIX handles closed stdio differently (but still compliant with spec as
far as I can tell) than other POSIX variants we test. Test results are
different than Linux and others because AIX takes measures to not re-use
the file descriptors for stdio if one of the stdio streams is closed.

Fixes: #8375
PR-URL: #8755
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Gibson Fahnestock <[email protected]>
Reviewed-By: Ilkka Myller <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
process Issues and PRs related to the process subsystem. test Issues and PRs related to the tests.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants