Attempting Connection to a Scoped IPv6 Address Causes Main Thread to Hang #72

Open
amydevs opened this issue Oct 23, 2023 · 0 comments
Labels
bug (Something isn't working), r&d:polykey:core activity 4 (End to End Networking behind Consumer NAT Devices)

Comments

amydevs commented Oct 23, 2023

Describe the bug

Attempting to connect to a scoped IPv6 address causes the main thread to hang. This was discovered while integrating MDNS into Polykey, where MDNS provides valid IPv6 link-local addresses for the NodeConnectionManager to establish connections with. These addresses look like fe80::e5ab:1462:fb26:79c%enp0s31f6. Such an address is passed into NodeConnectionManager.establishSingleConnection, which calls NodeConnection.createNodeConnection, which in turn calls and awaits QUICClient.createQUICClient. Logging before and after this call shows that the call is made, but the Promise never resolves. The main thread also appears to be frozen: nodesConnectionConnectTimeoutTime is not respected, and the Jest runner does not automatically abort the long-running test.

This behaviour occurs with any scoped IPv6 address, not just link-local ones. For example, ::1%lo triggers it as well. Note that specifying the scope (NIC) ID is optional on non-link-local IPv6 addresses, but still valid.
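To make the address format concrete, here is a minimal sketch that splits a scoped address into its address and scope parts. `splitScopedIPv6` and the `ScopedIPv6` type are hypothetical names used only for illustration and are not part of QUICClient's API. The sketch assumes the scope is everything after the first `%`, and it validates only the address part with Node's `net.isIPv6`.

```typescript
import { isIPv6 } from 'node:net';

// Hypothetical shape for illustration: the address without its scope, plus the scope if present.
type ScopedIPv6 = { address: string; zone?: string };

// Hypothetical helper: splits "fe80::1%eth0" into { address: "fe80::1", zone: "eth0" }.
function splitScopedIPv6(host: string): ScopedIPv6 {
  const idx = host.indexOf('%');
  const address = idx === -1 ? host : host.slice(0, idx);
  const zone = idx === -1 ? undefined : host.slice(idx + 1);
  // Reject a non-IPv6 address part, and reject a dangling "%" with no scope after it.
  if (!isIPv6(address) || zone === '') {
    throw new TypeError(`Not a valid (scoped) IPv6 address: ${host}`);
  }
  return zone === undefined ? { address } : { address, zone };
}

console.log(splitScopedIPv6('fe80::e5ab:1462:fb26:79c%enp0s31f6'));
console.log(splitScopedIPv6('::1%lo'));
console.log(splitScopedIPv6('::1'));
```

Dropping the scope is only safe for addresses such as ::1, where it is optional. For a link-local address the scope selects the interface, so discarding it would make the address ambiguous rather than work around the hang.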

Routers usually assign a global IPv6 address, so IPv6 link-local addresses are mainly meant for local network applications and can be used in the absence of a router. This might therefore be an edge case.

To Reproduce

Base Case:

test('to ipv6 link-local server succeeds', async () => {
      const connectionEventProm = promise<events.EventQUICServerConnection>();
      const tlsConfigServer = await testsUtils.generateTLSConfig(defaultType);
      const server = new QUICServer({
        crypto: {
          key,
          ops: serverCryptoOps,
        },
        logger: logger.getChild(QUICServer.name),
        config: {
          key: tlsConfigServer.leafKeyPairPEM.privateKey,
          cert: tlsConfigServer.leafCertPEM,
          verifyPeer: false,
        },
      });
      socketCleanMethods.extractSocket(server);
      server.addEventListener(
        events.EventQUICServerConnection.name,
        (e: events.EventQUICServerConnection) =>
          connectionEventProm.resolveP(e),
      );
      await server.start({
        host: '::1',
      });
      const client = await QUICClient.createQUICClient({
        host: '::1%lo',
        port: server.port,
        localHost: '::',
        crypto: {
          ops: clientCryptoOps,
        },
        logger: logger.getChild(QUICClient.name),
        config: {
          verifyPeer: false,
        },
      });
      socketCleanMethods.extractSocket(client);
      const conn = (await connectionEventProm.p).detail;
      expect(conn.localHost).toBe('::1');
      expect(conn.localPort).toBe(server.port);
      expect(conn.remoteHost).toBe('::1');
      expect(conn.remotePort).toBe(client.localPort);
      await client.destroy();
      await server.stop();
    });

Proof of thread hanging (the console.log("hi") callback never runs):

test('to ipv6 link-local server succeeds', async () => {
      const connectionEventProm = promise<events.EventQUICServerConnection>();
      const tlsConfigServer = await testsUtils.generateTLSConfig(defaultType);
      const server = new QUICServer({
        crypto: {
          key,
          ops: serverCryptoOps,
        },
        logger: logger.getChild(QUICServer.name),
        config: {
          key: tlsConfigServer.leafKeyPairPEM.privateKey,
          cert: tlsConfigServer.leafCertPEM,
          verifyPeer: false,
        },
      });
      socketCleanMethods.extractSocket(server);
      server.addEventListener(
        events.EventQUICServerConnection.name,
        (e: events.EventQUICServerConnection) =>
          connectionEventProm.resolveP(e),
      );
      await server.start({
        host: '::1',
      });
      setTimeout(() => {console.log("hi")}, 2000)
      const client = await QUICClient.createQUICClient({
        host: '::1%lo',
        port: server.port,
        localHost: '::',
        crypto: {
          ops: clientCryptoOps,
        },
        logger: logger.getChild(QUICClient.name),
        config: {
          verifyPeer: false,
        },
      });
      socketCleanMethods.extractSocket(client);
      const conn = (await connectionEventProm.p).detail;
      expect(conn.localHost).toBe('::1');
      expect(conn.localPort).toBe(server.port);
      expect(conn.remoteHost).toBe('::1');
      expect(conn.remotePort).toBe(client.localPort);
      await client.destroy();
      await server.stop();
    });
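The setTimeout probe in the test above can be turned into a small reusable check that separates "the Promise is merely pending" from "the event loop is blocked". The following is a sketch only. `withDeadline` is a hypothetical helper, not something QUICClient or Polykey provides.

```typescript
// Hypothetical helper: rejects if `p` has not settled within `ms` milliseconds.
// The deadline is an ordinary timer, so it can only fire while the event loop is running.
function withDeadline<T>(p: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`still pending after ${ms} ms`)),
      ms,
    );
    p.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (reason) => {
        clearTimeout(timer);
        reject(reason);
      },
    );
  });
}

// A Promise that never settles stands in for a call that is pending but not blocking.
withDeadline(new Promise<void>(() => {}), 100).catch((e: Error) => {
  console.log(`event loop is alive: ${e.message}`);
});
```

If the createQUICClient call were wrapped this way and it simply never resolved, the deadline would reject. If neither the deadline nor the "hi" timer ever fires, the event loop itself is blocked, which matches the behaviour reported here.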

Expected behavior

QUICClient.createQUICClient should resolve, with a QUICConnection established to the scoped IPv6 address.

Screenshots

In one case, this caused the Polykey test process to abort with an out-of-memory (OOM) error:

[nix-shell:~/git/Polykey2]$ npm run test -- ./tests/nodes/NodeConnectionManager.mdns.test.ts -t 'created'

> [email protected] test
> node --expose-gc ./node_modules/.bin/jest ./tests/nodes/NodeConnectionManager.mdns.test.ts -t created

Determining test suites to run...
GLOBAL SETUP
Global Data Dir: /tmp/polykey-test-global-X7iQkB
(node:326061) [DEP0112] DeprecationWarning: Socket.prototype._handle is deprecated
(Use `node --trace-deprecation ...` to show where the warning was created)

 RUNS  tests/nodes/NodeConnectionManager.mdns.test.ts

<--- Last few GCs --->

[326061:0x37b25c0]    96560 ms: Scavenge 2041.9 (2080.6) -> 2040.1 (2080.6) MB, 3.35 / 0.00 ms  (average mu = 0.253, current mu = 0.200) allocation failure; 
[326061:0x37b25c0]    96565 ms: Scavenge 2042.0 (2080.6) -> 2040.1 (2080.6) MB, 3.34 / 0.00 ms  (average mu = 0.253, current mu = 0.200) allocation failure; 
[326061:0x37b25c0]    96571 ms: Scavenge 2042.0 (2080.6) -> 2040.2 (2084.6) MB, 3.62 / 0.00 ms  (average mu = 0.253, current mu = 0.200) allocation failure; 


<--- JS stacktrace --->

FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
 1: 0xbd40d8 node::Abort() [node]
 2: 0xa9c264  [node]
 3: 0xdf33d0 v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, v8::OOMDetails const&) [node]
 4: 0xdf37a4 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, v8::OOMDetails const&) [node]
 5: 0x100f857  [node]
 6: 0x100f8db  [node]
 7: 0x102515d v8::internal::Heap::PerformGarbageCollection(v8::internal::GarbageCollector, v8::internal::GarbageCollectionReason, char const*) [node]
 8: 0x1025c50 v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [node]
 9: 0x102647c v8::internal::Heap::CollectAllGarbage(int, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [node]
10: 0xf95b78 v8::internal::StackGuard::HandleInterrupts() [node]
11: 0x1408336 v8::internal::NativeRegExpMacroAssembler::CheckStackGuardState(v8::internal::Isolate*, int, v8::internal::RegExp::CallOrigin, unsigned long*, v8::internal::InstructionStream, unsigned long*, unsigned char const**, unsigned char const**) [node]
12: 0x173f7bb v8::internal::RegExpMacroAssemblerX64::CheckStackGuardState(unsigned long*, unsigned long, unsigned long) [node]
13: 0x7f3c3c076396 
Aborted (core dumped)

Platform (please complete the following information)

  • Device: Dell Precision 3470
  • OS: NixOS
  • Version: Node v20.5.1

Additional context

Notify maintainers

@tegefaulkes

@amydevs amydevs added the bug Something isn't working label Oct 23, 2023
@CMCDragonkai CMCDragonkai added the r&d:polykey:core activity 4 End to End Networking behind Consumer NAT Devices label Aug 13, 2024