Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

aix: enable runtime linking #15286

Closed
wants to merge 1 commit into from
Closed

aix: enable runtime linking #15286

wants to merge 1 commit into from

Conversation

jBarz
Copy link
Contributor

@jBarz jBarz commented Sep 8, 2017

Enable runtime linking of shared objects.

Fixes: #15243

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • commit message follows commit guidelines
Affected core subsystem(s)

Enable runtime linking of shared objects. This will
allow loading of symbols using the RTLD_GLOBAL flag.
@nodejs-github-bot nodejs-github-bot added the build Issues and PRs related to build files or the CI. label Sep 8, 2017
@jBarz
Copy link
Contributor Author

jBarz commented Sep 8, 2017

@@ -13,7 +13,7 @@
# Pass erok flag to the linker, to prevent unresolved symbols
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you get rid of the comment too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Member

@gibfahn gibfahn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM if tests pass

Copy link
Contributor

@ezequielgarcia ezequielgarcia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gibfahn
Copy link
Member

gibfahn commented Sep 8, 2017

CI: https://ci.nodejs.org/job/node-test-commit/12260/

EDIT: CI is green

Also cc/ @shellberg @nodejs/platform-aix

@mscdex mscdex added the aix Issues and PRs related to the AIX platform. label Sep 8, 2017
@richardlau
Copy link
Member

I'm not objecting to this but I have a vague recollection that when we did the original port of Node.js to AIX enabling run-time linking was considered but the AIX folks advised that using it on AIX was slower. Might be worth running some benchmarks to check.

Anecdotally, a quick web search brings up this from the ICU mailing list: https://sourceforge.net/p/icu/mailman/message/7934586/

@richardlau
Copy link
Member

Oh, and this is as good a place as any to reference AIX Linking and Loading Mechanisms.

@gireeshpunathil
Copy link
Member

Based on my understanding of AIX runtime linking:
if B loads C (B and C are shared objects),
and if B wishes to re-bind a symbol that was defined by C
then both B and C needs to be built with runtime-linking (-brtl)

if A is the main executable that loads B, but does not wish to take part in symbol re-binding, A does not need to build with -brtl

In the current context, A is node, B is binding.so, C is ping.so

So should we be linking node with -brtl, as opposed to binding.so / ping.so? Building node with -brtl may have far reaching implications, so verifying this with small test case would be great.

The original intent of PR #12794 is to bring ability to enforce symbols from a given library to resolutions in subsequent loads. It is not an objective to rebind already resoloved symbols from already loaded modules. @ezequielgarcia - can you please clarify?

@jBarz
Copy link
Contributor Author

jBarz commented Sep 11, 2017

I found the following quote in : AIX Linking and Loading Mechanisms.:
"Note that it is the main application that has to be built to enable run-time linking."

@ezequielgarcia
Copy link
Contributor

@gireeshpunathil AFAIU, there is no re-bind involved in use cases targeted by #12794.

The ability to pass dlopen flags is explained by the dlopen man itself:

RTLD_GLOBAL
The symbols defined by this library will be made available for symbol resolution of subsequently loaded libraries.

So, taking your example where: node -> binding.node -> ping.so. If binding.node is dlopen'ed with RTLD_GLOBAL, then dlopen_pong symbol is available (in binding.node) for ping.so to runtime-link them. Note that ping.so does not bind the symbol at linker-time. If you inspect the ping.so file, the symbol will be unresolved. It is resolved at runtime-link time, if available.

In other words (and hoping this clarifies the issue a little bit), when you say:

It is not an objective to rebind already resoloved symbols from already loaded modules

The dlopen_pong symbol not already resolved, until the library is dlopen-loaded by binding.node.

Makes sense?

@gireeshpunathil
Copy link
Member

@jBarz - thanks, I have seen that, and my interpretation of it is that the main module needs runtime linking because it is the main module that wishes to re-bind / avail rebound symbols.

@ezequielgarcia - thanks, in the test case - yes, there is no re-binding, as there was no original binding in the first place (the symbol remained deffered and unresolved until it was made globally available). But is it the same / only way you foresee the use case for the PR? I thought you also want to have (i) visibility of symbols outside of the module, (ii) override / pre-set symbols over future symbols. Please confirm.

@ezequielgarcia
Copy link
Contributor

@gireeshpunathil Yes, the test shows the use-case I had in mind for the PR. It is the case of libao plugins, and also seems the use cases people were targetting:

nodejs/node-v0.x-archive#436
https://groups.google.com/forum/#!topic/nodejs/L2wNb_azzyU

I.e. symbols not available at link-time, and remaining unresolved until a previousy loaded library exports them through proper dlopen flags.

@gireeshpunathil
Copy link
Member

@ezequielgarcia - thanks for the clarification. So the question is: Do we need -brtl for node executable itself, or only for the modules that wish to avail runtime symbol resolution? Let me also try to get an answer to this.

@gireeshpunathil
Copy link
Member

Did some testing and found that the main module requires the runtime linking option, without which the deferred symbols never get resolved.

bash$ cat ping.c

extern const char* dlopen_pong();
const char* dlopen_ping(void) {
  return (const char*) dlopen_pong();
}

bash$ cat binding.c

#include <dlfcn.h>
#include <stdio.h>
const char* dlopen_pong(void) {
  return "pong";
}
typedef const char* (*ping)(void);
void run() {
  ping ping_func;
  void *handle = dlopen("./ping.so", RTLD_LAZY);
  ping_func = (const char* (*)()) dlsym(handle, "dlopen_ping");
  fprintf(stderr, "result: %s\n", ping_func());
}

bash$ cat node.c

#include <stdio.h>
#include <dlfcn.h>
typedef void (*run_func)();
int main() {
  run_func run;
  void *handle = dlopen("./binding.so", RTLD_GLOBAL | RTLD_NOW);
  run = (void (*)()) dlsym(handle, "run");
  run();
}
bash$  gcc ping.c -g -shared -Wl,-berok -o ping.so
bash$  gcc binding.c -g -shared -Wl,-brtl -o binding.so
bash$  gcc -Wl,-brtl node.c
bash$  ./a.out
result: pong
(gdb)
Breakpoint 1, dlopen_ping () at ping.c:4
(gdb) where
#0  dlopen_ping () at ping.c:4
#1  0xd455753c in run () at binding.c:15
#2  0x10000438 in main () at node.c:10
4         return (const char*) dlopen_pong();
(gdb) x/10i dlopen_ping
   0xd4558380 <dlopen_ping>:    lwz     r12,24(r2)
   0xd4558384 <dlopen_ping+4>:  stw     r2,20(r1)
   0xd4558388 <dlopen_ping+8>:  lwz     r0,0(r12)
   0xd455838c <dlopen_ping+12>: lwz     r2,4(r12)
   0xd4558390 <dlopen_ping+16>: mtctr   r0
=> 0xd4558394 <dlopen_ping+20>: bctr
   0xd4558398 <dlopen_ping+24>: .long 0x0
   0xd455839c <dlopen_ping+28>: .long 0xca000
   0xd45583a0 <dlopen_ping+32>: .long 0x0
   0xd45583a4 <dlopen_ping+36>: .long 0x18
(gdb) i r r0
r0             0xd45574a0       3562370208
(gdb) info symbol 0xd45574a0
dlopen_pong + 40 in section .text of ./binding.so
(gdb) 

Without -brtl flag for the main module:

bash$ gcc -g node.c
bash$  ./a.out
Illegal instruction (core dumped)
bash$
(gdb) r
Starting program: /home/gireesh/dl/a.out 

Breakpoint 1, dlopen_ping () at ping.c:4
4         return (const char*) dlopen_pong();
(gdb) x/10i dlopen_ping
   0xd4558380 <dlopen_ping>:    lwz     r12,24(r2)
   0xd4558384 <dlopen_ping+4>:  stw     r2,20(r1)
   0xd4558388 <dlopen_ping+8>:  lwz     r0,0(r12)
   0xd455838c <dlopen_ping+12>: lwz     r2,4(r12)
   0xd4558390 <dlopen_ping+16>: mtctr   r0
=> 0xd4558394 <dlopen_ping+20>: bctr
   0xd4558398 <dlopen_ping+24>: .long 0x0
   0xd455839c <dlopen_ping+28>: .long 0xca000
   0xd45583a0 <dlopen_ping+32>: .long 0x0
   0xd45583a4 <dlopen_ping+36>: .long 0x18
(gdb) info symbol $r0
No symbol matches $r0.
(gdb) i r r0
r0             0x0      0
(gdb) stepi

Program received signal SIGILL, Illegal instruction.
0x00000000 in ?? ()
(gdb) 

So John you are right, and the changes looks good to me.

@jBarz
Copy link
Contributor Author

jBarz commented Sep 13, 2017

Ahh, thanks for confirming!

@refack
Copy link
Contributor

refack commented Sep 13, 2017

Did some testing and found that the main module requires the runtime linking option, without which the deferred symbols never get resolved.

So should we be linking node with -brtl, as opposed to binding.so / ping.so? Building node with -brtl may have far reaching implications, so verifying this with small test case would be great.

Are we happy with linking the node binary with -brtl?
Any performance/security/stability concerns?

@shellberg
Copy link

This proposal would also either help... or mask #14785!

@gireeshpunathil
Copy link
Member

Performance: Need to test and see, but I don't foresee anything significant, as once the libraries bootstrap, these linkage definitions / resolutions do not come into play often in the run time of the application. The symbol binding effort at runtime is merely a function of the number of (unresolved) symbols itself.

@jBarz - will you please run some benchmarks with -brtl ON?

Stability: As node is not using -berok (deferred symbols), this does not cause any stability issues to node executable. At the same time, addons appearing with deferred symbols where a definition is not available and linked with -berok can cause SIGILL if invoked. But two points:

  1. This is exactly the functional requirement of PR Add process.dlopenFlags #12794 , and any potential destability is not attributed to -brtl.
  2. This behavior should be at par with other platforms as well, nothing specific to AIX.

Security: This is the implication I was talking about. I am testing further, and will get back soon.

@gireeshpunathil
Copy link
Member

Here are couple of observations:

(i) symbol re-definition (re-rlinking) happens based on the order of library linkages.

bash$ cat p.c

#include <stdio.h>

void self() {
  fprintf(stderr, "p\n");
}
int main() {
  void (*ptr)();
  ptr = self;
  ptr();
}

bash$ cat q.c

#include <stdio.h>

void self() {
  fprintf(stderr, "q\n");
}

bash$ cat r.c

#include <stdio.h>

void self() {
  fprintf(stderr, "r\n");
}
bash$  gcc -Wl,-G -c p.c -o p.o
bash$  gcc -Wl,-G -c q.c -o libq.so
bash$  gcc -Wl,-G -c r.c -o libr.so

bash$  gcc -Wl,-G p.o -lr -lq -L.
bash$  ./a.out
p
bash$  
bash$  gcc -Wl,-G -lq -lr -L. p.o
bash$  ./a.out
q
bash$  gcc -Wl,-G -lr -lq -L. p.o
bash$  ./a.out
r

(ii) however, re-linking does not occur to symbols of existing image:

bash$ cat node.c

#include <stdio.h>
#include <dlfcn.h>

void self() {
  fprintf(stderr, "node\n");
}
int main() {
  ((void (*)()) dlsym(dlopen("./binding.so", RTLD_GLOBAL | RTLD_NOW), "run"))(); 
  void (*ptr)();
  ptr = self;
  ptr();
}

bash$ cat binding.c

#include <stdio.h>
#include <dlfcn.h>

const char* dlopen_pong(void) {
  return "pong";
}

void self() {
  fprintf(stderr, "binding\n");
}

void run() {
  ((const char* (*)()) dlsym(dlopen("./ping.so", RTLD_LAZY), "dlopen_ping"))();
  void (*ptr)();
  ptr = self;
  ptr();
}

bash$ cat ping.c

#include <stdio.h>

extern const char* dlopen_pong();

void self() {
  fprintf(stderr, "ping\n");
}
const char* dlopen_ping(void) {
  void (*ptr)();
  ptr = self;
  ptr();
  return (const char*) dlopen_pong();
}
bash$  gcc ping.c -g -shared -Wl,-brtl -Wl,-berok -o ping.so
bash$  gcc binding.c -g -shared -Wl,-brtl -o binding.so
bash$  gcc -g -Wl,-brtl node.c
bash$  ./a.out
ping
binding
node
bash$  

So even if we take the case of untrusted native addons get loaded into the address space, the symbols in the node.exe cannot be over-ridden (with an exception of deferred symbols resident in the image).

My assertion is that this is a safe approach, but would be good to have a tsc contemplation, as this brings in changes to the linkage model.

@refack refack added the semver-major PRs that contain breaking changes and should be released in the next major version. label Sep 15, 2017
@refack
Copy link
Contributor

refack commented Sep 15, 2017

My assertion is that this is a safe approach, but would be good to have a tsc contemplation, as this brings in changes to the linkage model.

This is not worse than DLL hijacking or DLL injection in Windows, so I won't consider this a regression.
I'm marking this semver-major defensively (+ this depends on #12794 which is major)
@nodejs/tsc @nodejs/security PTAL

Copy link
Member

@mhdawson mhdawson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@BridgeAR
Copy link
Member

Since this is now semver-major it needs another LG @nodejs/tsc

BridgeAR pushed a commit to BridgeAR/node that referenced this pull request Sep 20, 2017
Enable runtime linking of shared objects. This will
allow loading of symbols using the RTLD_GLOBAL flag.

PR-URL: nodejs#15286
Fixes: nodejs#15243
Reviewed-By: Gibson Fahnestock <[email protected]>
Reviewed-By: Gireesh Punathil <[email protected]>
Reviewed-By: Michael Dawson <[email protected]>
Reviewed-By: James M Snell <[email protected]>
@BridgeAR
Copy link
Member

Landed in c5eb5bf

@BridgeAR BridgeAR closed this Sep 20, 2017
Qard pushed a commit to Qard/ayo that referenced this pull request Sep 21, 2017
Enable runtime linking of shared objects. This will
allow loading of symbols using the RTLD_GLOBAL flag.

PR-URL: nodejs/node#15286
Fixes: nodejs/node#15243
Reviewed-By: Gibson Fahnestock <[email protected]>
Reviewed-By: Gireesh Punathil <[email protected]>
Reviewed-By: Michael Dawson <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Qard pushed a commit to Qard/ayo that referenced this pull request Sep 21, 2017
Enable runtime linking of shared objects. This will
allow loading of symbols using the RTLD_GLOBAL flag.

PR-URL: nodejs/node#15286
Fixes: nodejs/node#15243
Reviewed-By: Gibson Fahnestock <[email protected]>
Reviewed-By: Gireesh Punathil <[email protected]>
Reviewed-By: Michael Dawson <[email protected]>
Reviewed-By: James M Snell <[email protected]>
@jBarz jBarz deleted the master.aix branch October 4, 2017 14:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
aix Issues and PRs related to the AIX platform. build Issues and PRs related to build files or the CI. semver-major PRs that contain breaking changes and should be released in the next major version.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Fix / enable test/addons/dlopen-ping-pong/test.js for AIX